
A Hierarchical HAZOP-Like Safety Analysis for Learning-Enabled Systems

Yi Qi

Philippa Ryan Conmy

Wei Huang

Xingyu Zhao

Xiaowei Huang

Adelard, part of NCC Group, London, N1 7UX, U.K.
Department of Computer Science, University of Liverpool, Liverpool, L69 3BX, U.K.

Hazard and Operability Analysis (HAZOP) is a powerful safety analysis technique with a long history in the industrial process control domain. With the increasing use of Machine Learning (ML) components in cyber-physical systems, so-called Learning-Enabled Systems (LESs), there is a recent trend of applying HAZOP-like analysis to LESs. While this shows great potential for preserving the capability of sufficient and systematic safety analysis, the novel characteristics of ML raise new technical challenges that require retrofitting the conventional HAZOP technique. In this regard, we present a new Hierarchical HAZOP-Like method for LESs (HILLS). To deal with the complexity of LESs, HILLS first applies "divide and conquer" by stratifying the whole system into three levels, and then performs HAZOP on each level to identify (latent-)hazards, causes, security threats and mitigation (with new nodes and guide words). Finally, HILLS attempts to link and propagate the causal relationships among the identified elements within and across the three levels via both qualitative and quantitative methods. We examine and illustrate the utility of HILLS with a case study on Autonomous Underwater Vehicles, with discussions on assumptions and extensions to real-world applications. HILLS, as a first HAZOP-like attempt on LESs that explicitly considers ML internal behaviours and their interactions with other components, not only uncovers the inherent difficulties of doing safety analysis for LESs, but also demonstrates good potential to tackle them.

Keywords: Safety analysis, HAZOP, learning-enabled system, trustworthy AI, AI safety, hazard identification, autonomous underwater vehicle, machine learning, security, deviation analysis, robotics and autonomous system, cyber physical system

1. Introduction

After being initially developed to support the chemical process industries (by Lawley [1]), Hazard and Operability Analysis (HAZOP) has been successfully and widely applied over the past 50 years. It is generally acknowledged to be an effective yet simple method to systematically identify safety hazards. HAZOP is a prescriptive analysis procedure designed to study system operability by analysing the effects of any deviation from the design intent [2]. A HAZOP study is a semi-formal, systematic, and critical examination of the process and engineering intentions of the process design. The potential for hazards or operability problems is thus assessed, and malfunctions of individual components and their associated consequences for the whole system can be identified [3].

In recent years, increasingly sophisticated mathematical modelling processes from Machine Learning (ML) are being used to analyse complex data and are then embedded into cyber-physical systems, so-called Learning-Enabled Systems (LESs). How to ensure the safety of LESs has become an enormous challenge [4, 5, 6]. As LESs are disruptively novel, they require new and advanced analysis for the complex requirements on their safe and reliable function [7]. Such analysis needs to be tailored to fully evaluate the new characteristics of ML [8, 9], which makes conventional methods obsolete, including HAZOP and HAZOP-like variants (e.g., CHAZOP [10] and PES-HAZOP [11], introduced for computer-based and programmable electronic systems, respectively). Moreover, LESs exhibit unprecedented complexity, while past experience suggests that HAZOP should be continuously retrofitted to accommodate more complex systems [12], considering quantitative analysis frameworks [13, 14] and human factors [15]. To the best of our knowledge, there is no HAZOP-like safety analysis dedicated to LESs that takes ML characteristics into account while preserving the simplicity and effectiveness of HAZOP (compared to other conventional safety analysis methods [16]), which motivates this research.

In this paper, we introduce a new Hierarchical HAZOP-Like method for LESs (HILLS). HILLS first stratifies complex LESs into three levels (System Level, ML-Lifecycle Level and Inner-ML Level), and then applies HAZOP separately on each level to identify the safety elements of interest, namely causes, mitigation, hazards (or latent-hazards for latent levels that cannot directly lead to mishaps) and security threats. When applying HAZOP on the ML-related levels, we revise HAZOP to cope with ML characteristics, e.g., by introducing new ways of defining nodes and new sets of guide words. We also identify causes of hazards from the ML development process (modelled by the ML-Lifecycle level) to reflect its data-driven nature (e.g., how data is collected, processed, etc.). Furthermore, we address the challenge of how to link and propagate the identified safety elements within and across the three levels, and propose both qualitative and quantitative (an initial Bayesian Belief Network (BN) solution) methods to model the causal relationships. To examine the effectiveness and demonstrate the use case of HILLS, we finally conduct a case study on Autonomous Underwater Vehicles (AUVs), with discussions on the assumptions adopted and extensions to real-world applications.

The key contributions of this work include:

a) A first HAZOP-like safety analysis for LESs that explicitly considers ML characteristics (including security threats and the data-driven nature of the development process) and reduces complexity through a hierarchical design.

b) New considerations for dividing nodes in the system representation, and a set of new guide words that adapt traditional HAZOP to the levels concerning ML models.

c) A first attempt at linking/propagating identified causes, mitigation, (latent-)hazards and security threats across ML levels.

d) Key challenges, identified as research questions, that are generic to safety analysis for LESs in future research.

The IJCAI-ECAI-22 Workshop on Artificial Intelligence Safety (AISafety 2022), July 24-25, 2022, Vienna, Austria
* Corresponding author.
Email: yiqi@liverpool.ac.uk (Y. Qi); pmrc@adelard.com (P. R. Conmy); huang23@liverpool.ac.uk (W. Huang); xingyu.zhao@liverpool.ac.uk (X. Zhao); xiaowei.huang@liverpool.ac.uk (X. Huang)
URL: https://github.com/YiQi0318 (Y. Qi); https://www.adelard.com/people/philippa-ryan.html (P. R. Conmy); https://intranet.csc.liv.ac.uk/~wh1923/ (W. Huang); https://www.xzhao.me/ (X. Zhao); https://cgi.csc.liv.ac.uk/~xiaowei/ (X. Huang)
ORCID: 0000-0002-3474-349X (X. Zhao); 0000-0001-6267-0366
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

2. Preliminaries: HAZOP

HAZOP is an inductive hazard assessment method that is conducted by an expert team. It systematically investigates each element in the system with the goal of finding potential situations that could cause the element to pose hazards or limit the system's normal operations.

There are four basic steps to perform HAZOP:
• Define the project scope and aims, and form an expert team.
• Identify system elements and model the system as a system representation.
• Consider possible deviations of operational parameters.
• Identify hazards, causes and mitigation solutions.

Once the four steps are completed, team members may generate additional safety requirements, if necessary, to mitigate or prevent the identified issues, leading to improvement of the system. More details are given for each step of HAZOP as follows.

Form HAZOP team. To perform HAZOP, a team of specialists is formed according to the project scope and aims. These experts have extensive experience and expert knowledge, and understand the overall procedures of the system deeply, such as operations, maintenance and engineering design.

Identify system elements. The HAZOP team formally represents the system under study by identifying its elements. Each element is called a Node, representing an operational function. Nodes and the interactions between nodes (e.g., data/control flows) collectively form the system representation under analysis.

Consider deviations of operational parameters. HAZOP assumes that a problem can only arise when there are Deviations from the design intent, and it searches for such deviations in the system representation. A deviation on a node is expressed as the combination of a Guide Word and a process Attribute.

Each guide word is a short word used to prompt the imagination of a deviation from the design/process intent. The most commonly used guide words are no, more, less, as well as, part of, other than, and so on. Guide words provide a systematic and consistent means of brainstorming potential deviations from normal operations. Each guide word has a specific meaning; e.g., no means the complete negation of the design intention, and early means that something occurred earlier than the intended time. Attributes are closely related to nodes, and are usually the subject of the action being performed. The definition of attributes relies on expert knowledge.

Identify hazards, causes and mitigation. Where the result of a deviation would be a danger to workers or to the production process, a potential problem is found. A Hazard (H) is a source of potential damage, harm or adverse health effects on something/someone, while mishaps are actual damage or harm to something/someone. A Cause (C) is a reason why the deviation could occur; several causes may be identified for one deviation. Mitigation (M) helps to reduce the occurrence frequency of the deviations or to mitigate their consequences. Hazards, causes, and mitigation are usually assigned their respective IDs.

3. Problem Statement

Given that HAZOP was not originally designed for LESs, new problems inevitably arise when attempting to apply HAZOP to LESs. These problems are formalised as a set of research questions (RQs) in this section. We first present the rationale behind those RQs (i.e., a justification of how we arrived at them) and then articulate what the expected solution to each RQ would be.

RQ1: How to reduce the complexity of LESs so that HAZOP can be effectively applied to them? HAZOP is a semi-formalised analytical method used to identify the hazard scenarios of a defined process, and it has been successfully used on relatively simple systems. When facing a complex system, HAZOP often cannot play its role well. LESs exhibit unprecedented complexity, rendering the direct application of HAZOP to LESs infeasible. Therefore, we need to reduce the complexity of the system representation. A simple yet effective solution is "divide and conquer", e.g., stratifying a complex system into multiple levels. In this regard, a promising solution to RQ1 is a hierarchical system representation, so that HAZOP can be effectively applied.

RQ2: How to define nodes in each level, especially for the novel levels regarding ML? We assume that HAZOP can effectively handle a single-level system representation, as we expect to introduce a hierarchical structure in the RQ1 solution. The second step of HAZOP is to divide nodes at each level (presuming we already have a group of experts as the HAZOP team). Past experience shows that the division of nodes can be based on the functionalities of components in the system [17], so we may continue using this traditional method for the non-ML-related levels. However, when there are ML components in the system under analysis, the traditional node-division method is difficult to apply directly. Therefore, RQ2 is raised to explore a novel definition of "functionalities" at ML-related levels.

RQ3: Will there be any new guide words related to ML? A guide word is one of the key components of a deviation. The team of experts is responsible for identifying guide words that fit the scope of their analysis; common guide words include No, Less/More, Slower/Faster, Early/Late, etc. However, the existing set of guide words is unproven for use in ML applications, so this RQ aims at determining the effectiveness and new meanings of known guide words for ML-related levels, and at checking whether there might be missing guide words. Although we expect most of the known guide words to still be applicable, they might miss some deviations given the new characteristics of ML. Thus, prospective new guide words may be introduced.

RQ4: How to establish the relationship between identified safety elements across levels? For simplicity, HAZOP is expected to be applied separately to each level of a hierarchical system representation. Therefore, to obtain the safety analysis of the whole complex system, it is necessary to study the relationship between the identified safety elements (namely causes, mitigation, hazards and latent-hazards) across different levels. Then, based on the nature of the relationship (e.g., causal or not, quantitative or qualitative, probabilistic or deterministic), a proper formalism should be used to establish and express such relationships among the hazard analysis results collected from each level.

4. Running Example

We present a running example from the SOLITUDE project¹, which conducts safety analysis on an AUV that autonomously finds a dock and performs the docking task. The workflow of the scenario is given in Figure 1.

The robot starts when it receives the user's command. Once started, it uses sensors (e.g., cameras) to receive data. Data is transmitted and preprocessed before being fed into the YOLO model for object detection and localisation. The localisation result is further utilised for path planning. In addition, this normal workflow may suffer from external attacks at some stages, including data transmission, data preparation, and path planning. We remark that the scenario in the project is more complex, including the use of deep reinforcement learning for motion planning, but due to space limits, this paper only focuses on the perception component.

¹ https://github.com/Solitude-SAMR/UWV_RAM

5. Proposed Method

In this section, we present the HILLS method and compare it with HAZOP. HILLS inherits from HAZOP
the basic structure, composition and definitions of elements, with extensions that are suitable for LESs. The tables and figures presented in this section are partial and for illustrative purposes only; cf. the complete HILLS analysis results based on the SOLITUDE project at the GitHub repository¹.

5.1. Hierarchical HAZOP

As shown in Figure 2, HILLS has a three-level structure, comprising the system level, the ML-lifecycle level and the inner-ML level. We analyse each level individually in this subsection, and discuss their relations in Section 5.2. Note that the HILLS structure discussed here is generic (for illustration purposes), and may be subject to adaptation when working with specific systems.

Figure 2: The 3-level hierarchical structure of HILLS

Table 1
Nodes in each level in the SOLITUDE example
Level | Node | Description
System level | Node 1 | User
System level | Node 2 | Hardware components
System level | Node 3 | Data transmission
ML-lifecycle level | Node 4 | Data collection
ML-lifecycle level | Node 5 | Labeling
ML-lifecycle level | Node 6 | Data preprocessing
ML-lifecycle level | Node 7 | Hyperparameter setting
ML-lifecycle level | Node 8 | Model deployment
Inner-ML level | Node 9 | Feature Extracting
Inner-ML level | Node 10 | Object Detection
ML-lifecycle level | Node 11 | Localisation

5.1.1. System level

HILLS at the system level largely follows HAZOP. Hardware, software, and ML components of an LES represent different functions, and they are categorized as different Nodes. Consider the running example in Figure 1: the "blue blocks" represent the functional areas of the running example, which means that our nodes can be set according to these blocks. An example of setting nodes is provided in Table 1. We note that the setting of nodes is specific to the system under investigation; e.g., the node "Labeling" was not included in Figure 1. Some guide words originating from, e.g., the chemical industry can still be used in LESs. Attributes related to the LES are used together with the guide words to express deviations.

Example 1. At the system level, we discovered several hazards in the running example; some of them are summarised in Table 2. E.g., one of the hazards is "erratic trajectory", suggesting that the robot moves into an unsafe area. This hazard is associated with the deviation "no action", where "no" is the guide word and "action" is the attribute (when the AUV takes no action in the water, the disturbance of the current makes it difficult for the robot to maintain a stable trajectory). One of the causes of the hazard is "no data from sensor", which can be mitigated by, e.g., the use of an acoustic guidance system as a duplicated perception component based on another sensor.

Example 2. Some hazards, such as "erratic trajectory", may appear in different nodes, which suggests that they may occur more often and thus, after also considering the severity of their consequences, may have a higher priority to be mitigated.

Example 3. One hazard can be mitigated in different ways. For example, we identified several mitigation solutions for "erratic trajectory", most of which focus on early prevention, such as "maximum safe distance maintained if uncertain" and "camera health monitor".

HILLS aims to exhaustively cover all potential hazards. In the running example, the possible causes of crashes, or of failing to turn when facing obstacles, may include "no data from sensors (instantaneous or permanent)" and "misclassification", corresponding to errors in hardware and software components, respectively.

However, the hazards, causes or mitigation may not be fully identifiable at this level. For example, there are other mitigation solutions for the cause "misclassification" that need to consider how the ML component is trained and constructed, yet the system level alone cannot naturally include relevant nodes for this purpose. This motivates us to consider other levels (as discussed below).

5.1.2. ML-Lifecycle level

The key motivation for the ML-lifecycle level is to handle
the complexity arising from the integration of ML components into an LES, considering mainly the human factors and security threats involved in the development process of ML models. Deviations at this level cannot be identified if the analysis is only conducted at the system level. On the other hand, hazards at the system level may be attributed to hazards at the ML-lifecycle level; e.g., the low prediction accuracy of an ML component may be caused by polluted data in data collection or by insufficient epochs of training. For the running example, the analysis at the ML-lifecycle level shows that the low accuracy of the results may be caused by inaccurate labeling. We remark that deviations identified at non-system levels are called Latent-hazards (LH), as they pose indirect hazards from latent levels where no hardware components are interacted with, and thus cannot directly lead to mishaps.

Table 3 presents a set of guide words that are required at this level. These guide words are redefined from the existing guide words in HAZOP, and Table 3 includes both their original meanings (in HAZOP) and their new meanings (in HILLS). "Part of" represents a qualitative modification in its original meaning, whereas in HILLS it may mean the incompleteness of structures, definitions, or settings. For "Less" and "More", since we are concerned with data flow and data values, their new meanings refer to the amount of data rather than, e.g., the water volume.

Table 3
Guide word | Original meaning (HAZOP) | New meaning (HILLS)
Part of | Qualitative modification | Incomplete definition or setting
Less | Too little additive volume added | A smaller amount of data
More | Too much additive volume added | A larger amount of data

Safety analysis at the ML-lifecycle level can exhibit new latent-hazards, as shown in Table 4. Since ML models are subject to security issues, we believe malicious attacking behaviours should also be considered, as security Threats (T). Human factors are considered because ML development is a human-centered process, which makes some human-related errors possible, such as labelling errors, forgetting part of the operations, and omitting data preparation. These mistakes are direct human errors. There are also adversarial attacks that can lead to a significant drop in performance, which are classified as security threats. Some examples are shown in Table 4.

Example 4. On the node "data collection", there is a threat "data poisoning", which occurs because the input data is contaminated. A suggested mitigation is to deploy a detector based on data provenance.

Example 5. For ML components, we identified mitigation such as "classifier reliability for critical objects >X" [18], to reduce misclassifications with safety impacts.

Example 6. For the latent-hazard "low prediction accuracy", its causes include "users make mistakes on labelling", "data itself is missing", and "data itself is incomplete", each of which has its suggested mitigation (cf. Table 4).

Example 7. There is a deviation "attack", whose threats are various attacks, e.g., evasion attacks, backdoor attacks, and data poisoning attacks. Their respective cause is usually that a certain entity in the training or inference of an ML model (e.g., input instance, model structure, training dataset) is perturbed, modified, or contaminated. Their respective mitigation can be very specific (cf. Table 4), e.g., the backdoor detector in [19] for tree ensemble classifiers.

5.1.3. Inner-ML level

ML components such as YOLO are composed of one or more ML models, each of which is formed of a set of functional layers. Even after a thorough analysis of all possible deviations (with mitigation solutions) in the ML development process modelled by our ML-lifecycle level, the ML components may not perform as expected; e.g., the convolutional layers may fail to extract features accurately, and the fully connected layers may fail to make reliable classifications. Thus, safety analysis of the internal structure of an ML component is required. At the inner-ML level, HILLS extracts the basic layers of an ML component to form a model for analysis. To cater for ML components of different complexity, two extraction methods are proposed. The first one deals with simple models with up to 5 layers: it follows the layer structure and considers each layer to represent a separate functionality, so each layer is defined as a node in the system representation. The second one deals with more complex, larger models by abstracting a model into several functional blocks, where every block may contain a number of layers. Our analysis in the running example follows the second method.

We identified several new guide words, as shown in Table 5, which are highly relevant to the setup of the ML component and data flow. It is worth noting that "Perturbed" is a special guide word that is needed when considering the existence of an external attacker.

Table 5
New guide words of ML-lifecycle and inner-ML levels
Guide word | Meaning
Wrong | Wrong setting or data value
Invalid | Invalid data value or data flow, possibly conflicting with other components
Incomplete | Incomplete data value
Perturbed | Data was perturbed by external attackers
Incapable | Part of data cannot be labeled

Example 8. Deviations containing "perturbed" usually correspond to specific attacks; e.g., we record "perturbed dataset" as "attack" and the threat as "data poisoning" (cf. Table 4).

As shown in Table 6, HILLS performs analysis inside an ML model, which in general is closely related to the internal structure of the model.

Example 9. When the ML component produces a wrong output, the inner-ML level analysis may indicate that this is related to the hyperparameter setting. Explainable AI (XAI) methods may help users to, e.g., locate which layer of neurons contributes the most to the wrong ML behaviours [20] and detect backdoors [21].

Example 10. At the inner-ML level, we focus on the ML model structure itself. E.g., unsuitable parameter settings in activation functions or pooling layers also give rise to specific latent-hazards, which may lead to wrong outputs or to losing part of the information in images (cf. Table 6).

5.1.4. Further Considerations on Use Cases of HILLS

HAZOP aims to provide a systematic, critical examination of the process (and engineering intent) of a new or existing facility, and should normally be done before the system is officially put into service [22]. Nevertheless, we believe that HILLS can still be applied after the occurrence of an accident; in particular, recent technologies have enabled the recording of system executions through, e.g., direct observation, recorded video, or snapshot images. HILLS may use the recordings to identify related causes and hazards.

Moreover, we note the following points when using HILLS. First, when dealing with an LES, we focus on the workflow or pipeline diagram of the entire system to identify nodes according to the method explained earlier. The analysis at the system level can help us identify the hazards sourced from the ML components, to enable the analysis at the lower levels. Second, guide words are combined with the attributes of each node to form deviations. This proceeds sequentially following the level structure of HILLS, i.e., the deviations at the system level are identified first, followed by those at the ML-lifecycle level, and then the inner-ML level. Third, before looking for (latent-)hazards, causes, and mitigation at each level, we rely on the reasonable assumption that mitigation solutions at higher levels are easier to implement than those at lower levels. Consequently, HILLS may not need to be conducted at the inner-ML level, and can stop when all hazards are found and mitigated at other levels.

5.2. Relations Between Levels

Up to now, we have identified the nodes, attributes, guide words, (latent-)hazards, threats, causes, and mitigation solutions for individual levels in the HILLS framework. We also notice that the relations between these elements can be very complicated, which calls for a formal analysis of the relations. Formalising the relations between levels is a significant challenge, and there might not be one best way; we propose to study them both qualitatively and quantitatively.
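The bookkeeping behind this step can be illustrated in code. The Python below is a minimal sketch under our own assumptions, not part of HILLS as published: it records nodes per level, forms candidate deviations as guide word × attribute pairs (as in HAZOP), pairs deviations at different levels that share a guide word as candidates for expert review, and assigns level-indexed IDs such as "T2.1" (assumed here to mean the first threat at the ML-lifecycle level). The names `Level`, `Node`, `Deviation`, `candidate_deviations`, `same_guide_word_links` and `Register`, the per-level guide-word sets, and the node attributes are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from itertools import product


class Level(IntEnum):
    """HILLS levels; the integer value is the level index used in element IDs."""
    SYSTEM = 1
    ML_LIFECYCLE = 2
    INNER_ML = 3


@dataclass(frozen=True)
class Node:
    name: str
    level: Level
    attributes: tuple  # hypothetical process attributes of this node


@dataclass(frozen=True)
class Deviation:
    node: str
    level: Level
    guide_word: str
    attribute: str

    def __str__(self) -> str:
        return f"{self.guide_word} {self.attribute}"


# Hypothetical guide-word sets per level (illustrative only).
GUIDE_WORDS = {
    Level.SYSTEM: ("no", "more", "less", "early", "late"),
    Level.ML_LIFECYCLE: ("no", "part of", "less", "more", "wrong",
                         "incomplete", "perturbed", "incapable"),
    Level.INNER_ML: ("wrong", "invalid", "incomplete"),
}


def candidate_deviations(nodes):
    """Combine every guide word of a node's level with every attribute of that node."""
    for node in nodes:
        for guide_word, attribute in product(GUIDE_WORDS[node.level], node.attributes):
            yield Deviation(node.name, node.level, guide_word, attribute)


def same_guide_word_links(deviations):
    """Pair each deviation with deviations at lower levels sharing its guide word.

    The pairs are only candidates for expert review, not established causal links.
    """
    devs = list(deviations)
    return [(hi, lo) for hi in devs for lo in devs
            if hi.level < lo.level and hi.guide_word == lo.guide_word]


@dataclass
class Register:
    """Assigns level-indexed IDs (e.g. "T2.1") to hazards, causes, mitigation, threats."""
    counters: dict = field(default_factory=dict)
    entries: dict = field(default_factory=dict)

    def add(self, kind: str, level: Level, text: str) -> str:
        key = (kind, level)
        self.counters[key] = self.counters.get(key, 0) + 1
        element_id = f"{kind}{int(level)}.{self.counters[key]}"
        self.entries[element_id] = text
        return element_id


if __name__ == "__main__":
    nodes = [
        Node("Hardware components", Level.SYSTEM, ("action",)),  # hypothetical attribute
        Node("Localisation", Level.ML_LIFECYCLE, ("localisation",)),
    ]
    devs = list(candidate_deviations(nodes))
    print(len(devs))  # 13
    print([(str(a), str(b)) for a, b in same_guide_word_links(devs)][0])
    reg = Register()
    print(reg.add("T", Level.ML_LIFECYCLE, "data poisoning"))  # T2.1
```

With these assumed inputs, the first candidate link pairs "no action" with "no localisation". Any pair returned by `same_guide_word_links` is only a prompt for the HAZOP team; whether a causal relation actually exists remains an expert judgement.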

5.2.1. Qualitative Analysis

Qualitative analysis studies the connections between levels, with the guide words as entry points. The guide words and the deviations may have the following connections.

First, the same guide words at one level have strong associations, even if they are combined with different attributes. Second, if a guide word is shared between different levels, the deviation at the higher level may be the main reason for the latent-hazard at the lower level.

Example 11. We use "no" as an example. We obtain the deviation "no action" at the system level and the deviation "no localisation" at the ML-lifecycle level. Given that they share the same guide word, we should consider whether "no localisation" has a causal relation with "no action".

Moreover, we assume that there is an inclusive relationship between some guide words of the higher and lower levels, such as "no" and "part of", or that some guide words have similar meanings, such as "invalid" and "incompatible". The existence of a guide word with an inclusive relationship suggests that, for a latent-hazard found at the lower level, its cause may belong to the higher level.

Example 12. If we choose "No action" at the system level and "Part of definition" at the ML-lifecycle level (e.g., images without defined labels), then we may establish an inclusive relationship between "No" and "Part of".

Example 13. We use "invalid data value" and "incompatible data value" as examples. "Incompatible data value" may lead to low output accuracy or no results, so it has a similar meaning to "invalid data value".

Selecting guide words is arguably a quite subjective activity, as experts may use different guide words with similar semantics to identify the same cause. Hence, the proposed way of establishing relationships across levels can only cope with the ideal case in which identical guide words are used. Alternative methods are still needed for other cases, which forms our future work.

5.2.2. Quantitative Analysis

A BN is a graphical model that represents probabilistic relationships between a set of variables by capturing the causal relationships between them [23]. It is also a powerful tool for knowledge representation and reasoning under uncertainty, visually presenting probabilistic relationships between a set of variables [24]. Indeed, BNs have already been used to study the relations between latent features learned by a deep neural network [25], and using BNs to express relationships between safety elements is not a new idea in traditional safety analysis [26, 27, 28]. We take the relationships between several elements at the ML-lifecycle level and the inner-ML level as an example to explore the possibility of representing them with a BN. This is an initial idea of expressing relationships quantitatively; the higher level contains some abstract concepts that are difficult to represent as variables. Even if we assume that abstract concepts can be represented using variables, it is hard to provide the Conditional Probability Tables (CPTs) that a BN requires as a prerequisite. All parameters used to quantify the BN must be obtained from the system background and expert knowledge.

Figure 3: A BN fragment (with illustrative probabilities)

Figure 3 shows a fragment of the BN model for the running example, considering several security threats between the ML-lifecycle level and the inner-ML level. The nodes of the BN can represent threats (Ti.j), causes (Ci.j), or mitigation (Mi.j), where the variable i ∈ {1, 2, 3} ranges over the levels in HILLS and j is the index of the threat/cause/mitigation at a level. E.g., T2.j is the j-th threat at the ML-lifecycle level. In addition, we need to assign a CPT to each node that has parents, and assign a prior probability to each root node or set it as observed evidence. Expert knowledge is needed both for constructing the basic structure and for assigning the CPTs. The probabilities used in Figure 3 are for illustrative purposes; more enlightening examples can be found in [25].

Example 14. For threat nodes with no incoming arrows, such as the T2.j and T3.j nodes in Figure 3, we may set the probability of their occurrence to 100 percent.

Once the BN is constructed, we can perform probabilistic inference on it to check that the construction is correct w.r.t. expert knowledge. The following are two typical examples, obtained by applying d-separation [29], which determines the dependencies between variables in a BN.

Example 15. A parent node may have multiple children at different levels. In Figure 3, the threat T2.1 has two causes, one at the ML-lifecycle level (C2.j) and one at the inner-ML level (C3.j). While the two causes may be mitigated separately as they belong to different levels, the effectiveness of their respective mitigation might affect the probabilistic inference on each other through their CPTs, provided that T2.1 is not observed: since both causes share the unobserved parent T2.1, they are not d-separated, so evidence about one changes the belief in the other.

Example 16. A child node may have multiple parents. In Figure 3, the mitigation M2.j has two parent causes at the ML-lifecycle level, representing that one mitigation may support two causes. By observing the effectiveness of the mitigation (i.e., through the CPT of M2.j), we can infer how one cause may influence the other and vice versa: because M2.j is a common child of both causes, observing it makes them dependent even if they are independent a priori.

We note that the construction of the BN structure and CPTs, as well as the above probabilistic inference, should be discussed and accepted by domain experts and all stakeholders. We believe a BN is potentially a powerful tool for modelling probabilistic causal relationships between elements of ML-related levels, while how to apply BNs in practice in the context of HILLS remains an open challenge.

6. Related Work

HAZOP. HAZOP is widely used in industrial domains, such as nuclear power [30] and the chemical industry [31]. In recent years, there have been efforts on integrating HAZOP with other methods [32, 33] to analyse common causes and system scenarios [34]. For a comprehensive review of these techniques, see recent survey papers, e.g., [35]. The application of HAZOP to computer-based systems first appeared in [36]. After that, the experience gained from applying HAZOP and related techniques to computer-based systems was summarised in [37]. There is a recent trend of applying HAZOP-like analysis to LESs, e.g., in the autonomous car context [38]. [...] process [39] or consider the direct application of HAZOP to the hierarchical structure of traditional systems with no ML components [40]. A hierarchical structure is needed to work with ML components, which are black boxes in general, while inside the black box there is a layered structure in which each layer is a simple mathematical function. In HILLS, we innovatively consider the interaction between humans and ML components and the internal structure of the ML components. Moreover, inspired by [41], we investigate how to link and propagate identified safety elements at different levels.

STPA. STAMP (Systems-Theoretic Accident Model and Processes) is also a very popular safety analysis method. STAMP uses three fundamental concepts from system theory: emergence and hierarchy, communication and control, and process models [42]. STPA (System-Theoretic Process Analysis) builds on these concepts, being based on the STAMP model. STPA pays more attention to the overall control loop and process analysis of the system, and focuses on unsafe control actions and causal factors in a control structure. It is widely used in railway safety assurance [43], cyber safety and security [44], robotics [45] and driver-vehicle interactions [46]. STPA is also used to explore a hierarchical structural safety analysis framework in [47]. Compared to STPA, HAZOP is relatively easier to conduct and clearer to communicate, supported by the structural decomposition of the system functions [16]. We start with retrofitting HAZOP for LESs, while STPA offers a new perspective for considering the feasibility of hierarchical safety analysis on LESs, which is our planned future work.

7. Conclusion

We propose a hierarchical HAZOP-like method, HILLS, for the safety analysis of LESs. Unlike traditional HAZOP, HILLS analyses LESs in a hierarchical way, disentangling the complexity by first working with three separate levels and then establishing their relations via both qualitative and quantitative methods, e.g., BNs. HILLS is applied to a practical example of AUVs, leading to the discovery of new guide words as well as new causes and mitigation related to ML.

In conclusion, HILLS complements HAZOP when working with LESs, and is able to identify safety hazards and security threats related to ML components through its structural advantages.

Hierarchical structure The concept of hierarchy is not new, but existing papers either focus on the hierarchical priority of the analysis order in the HAZOP analysis Acknowledgments This work is supported by U.K. DSTL through the project

of Safety Argument for Learning-enabled Autonomous Underwater Vehicles and U.K. EPSRC through End- [12] H. Pasman, W. Rogers, How can we improve hazop, to-End Conceptual Guarding of Neural Architectures our old work horse, and do more with its results? [EP/T026995/1]. This project has received funding an overview of recent developments, Chemical from the European Union’s Horizon 2020 research and in- Engineering Transactions 48 (2016) 829–834. novation programme under grant agreement No 956123. [13] H. Ozog, Hazard identification and quantification, XZ’s contribution to the work is partially supported Chem. Eng. Prog. 83 (1987) 55–64. through Fellowships at the Assuring Autonomy Inter- [14] V. Cozzani, S. Bonvicini, G. Spadoni, S. Zanelli, Haznational Programme. YQ’s contribution to the work is mat transport: A methodological framework for the supported through Chinese Scholarship Council (CSC). risk analysis of marshalling yards, Journal of Hazardous Materials 147 (2007) 412–423. [15] P. Aspinall, Hazops and human factors, in: InstiReferences tution of Chemical Engineers Symposium Series, volume 151, 2006, p. 820. [1] H. Lawley, Operability studies and hazard analysis, [16] L. Sun, Y.-F. Li, E. Zio, Comparison of the ha

Chem. Eng. Prog. 70 (1974) 45–56. zop, fmea, fram, and stpa methods for the hazard [2] F. Crawley, B. Tyler, Chapter 3 - the hazop study analysis of automatic emergency brake systems, method, in: F. Crawley, B. Tyler (Eds.), HAZOP: ASCE-ASME Journal of Risk and Uncertainty in EnGuide to Best Practice (3rd Edition), Elsevier, 2015. gineering Systems, Part B: Mechanical Engineering [3] J. Dunjó, V. Fthenakis, J. A. Vílchez, J. Arnaldos, 8 (2022).

Hazard and operability (hazop) analysis. a literature [17] D. Slater, The Hazop methodology, 2015. review, Journal of Hazardous Materials 173 (2010) [18] X. Zhao, W. Huang, A. Banks, V. Cox, D. Flynn, 19–32. S. Schewe, X. Huang, Assessing the reliability of [4] D. Lane, D. Bisset, R. Buckingham, G. Pegman, deep learning classifiers through robustness evalT. Prescott, New foresight review on robotics and uation and operational profiles, in: AISafety’21 autonomous systems, Technical Report No. 2016.1, Workshop at IJCAI’21, volume 2916, ceur-ws.org, LRF, 2016. 2021. [5] X. Zhao, A. Banks, J. Sharp, V. Robu, D. Flynn, [19] W. Huang, X. Zhao, X. Huang, Embedding and M. Fisher, X. Huang, A Safety Framework for Crit- extraction of knowledge in tree ensemble classifiers, ical Systems Utilising Deep Neural Networks, in: Machine Learning 111 (2022) 1925–1958. Computer Safety, Reliability, and Security (Safe- [20] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Comp’20), volume 12234 of LNCS, Springer, Cham, Müller, W. Samek, On pixel-wise explanations for 2020, pp. 244–259. non-linear classifier decisions by layer-wise rele[6] E. Asaadi, E. Denney, G. Pai, Quantifying assur- vance propagation, PloS one 10 (2015) e0130140. ance in learning-enabled systems, in: SafeComp’20, [21] X. Zhao, W. Huang, X. Huang, V. Robu, D. Flynn, volume 12234 of LNCS, Springer, Cham, 2020, pp. BayLIME: Bayesian local interpretable model270–286. agnostic explanations, in: Proc. of the 37th Conf. [7] R. Bloomfield, H. Khlaaf, P. R. Conmy, G. Fletcher, on Uncertainty in Artificial Intelligence, UAI’21, Disruptive innovations and disruptive assurance: PMLR, 2021, pp. 887–896.

Assuring machine learning and autonomy, Com- [22] J. Jurkiewicz, J. Nawrocki, M. Ochodek, T. Głowacki, puter 52 (2019) 82–89. Hazop-based identification of events in use cases: [8] E. Alves, D. Bhatt, B. Hall, K. Driscoll, A. Muruge- An empirical study, Empir Software Eng 20 (2015) san, J. Rushby, Considerations in assuring safety 82–109. of increasingly autonomous systems, Technical Re- [23] E. Lee, Y. Park, J. G. Shin, Large engineering project port NASA/CR-2018-220080, NASA, 2018. risk management using a bayesian belief network, [9] S. Burton, I. Habli, T. Lawton, J. McDermid, P. Mor- Expert Systems with Applications 36 (2009) 5880– gan, Z. Porter, Mind the gaps: Assuring the safety 5887. of autonomous systems from an engineering, eth- [24] J. Cheng, R. Greiner, J. Kelly, D. Bell, W. Liu, Learnical, and legal perspective, Artificial Intelligence ing bayesian networks from data: An information279 (2020) 103201. theory based approach, Artificial intelligence 137 [10] P. Andow, H. G. Britain, E. Safety, Guidance on (2002) 43–90.

HAZOP procedures for computer-controlled plants, [25] N. Berthier, A. Alshareef, J. Sharp, S. Schewe, Great Britain, Health and Safety Executive, 1991. X. Huang, Abstraction and symbolic execution of [11] D. J. Burns, R. M. Pitblado, A Modified Hazop deep neural networks with bayesian approximation Methodology For Safety Critical, Springer London, of hidden features (2021).

London, 1993. [26] S. Thomas, K. Groth, Toward a hybrid causal framework for autonomous vehicle safety analy- Control Laboratory SCL-009/2003 (2003). sis, Proceedings of the Institution of Mechanical [41] M. Wallace, Modular architectural representation Engineers, Part O: Journal of Risk and Reliability and analysis of fault propagation and transforma(2021) 1748006X2110433. tion, Electronic Notes in Theoretical Computer [27] E. Denney, G. Pai, I. Habli, Towards measurement Science 141 (2005) 53–71.

of confidence in safety cases, in: Int. Symp. on [42] N. Leveson, Engineering a Safer World: Systems Empirical Software Engin. and Measurement, 2011, Thinking Applied to Safety, Engineering systems, pp. 380–383. MIT Press, 2011. [28] X. Zhao, D. Zhang, M. Lu, F. Zeng, A new approach [43] P. Yang, R. Karashima, K. Okano, S. Ogata, Autoto assessment of confidence in assurance cases, in: mated inspection method for an stamp/stpa - fallen Computer Safety, Reliability, and Security (Safe- barrier trap at railroad crossing -, Procedia ComComp’12), volume 7613 of LNCS, Springer, 2012, pp. puter Science 159 (2019) 1165–1174. 79–91. [44] T. Kaneko, Y. Takahashi, T. Okubo, R. Sasaki, Threat [29] D. Koller, N. Friedman, Probabilistic Graphical Mod- analysis using stride with stamp/stpa, in: Proc. of els: Principles and Techniques, Adaptive computa- the Int. Workshop on Evidence-based Security and tion and machine learning, MIT Press, 2009. Privacy in the Wild, 2018. [30] S. Rimkevičius, M. Vaišnoras, E. Babilas, E. Ušpuras, [45] A. Adriaensen, L. Pintelon, F. Costantino, G. D.

Hazop application for the nuclear power plants de- Gravio, R. Patriarca, An stpa safety analysis case commissioning projects, Annals of Nuclear Energy study of a collaborative robot application, IFAC(2016). PapersOnLine 54 (2021) 534–539. 17th IFAC Sym[31] W. Tian, T. Du, S. Mu, Hazop analysis-based dy- posium on Information Control Problems in Manunamic simulation and its application in chemical facturing INCOM 2021. processes, Asia-Pacific Journal of Chemical Engi- [46] S. Chen, S. Khastgir, I. Babaev, P. Jennings, Identineering 10 (2015) 923–935. fying accident causes of driver-vehicle interactions [32] P. K. Marhavilas, M. Filippidis, G. K. Koulinas, D. E. using system theoretic process analysis (stpa), in: Koulouriotis, An expanded hazop-study with fuzzy- 2020 IEEE Int. Conf. on Systems, Man, and Cyberahp (xpa-hazop technique): Application in a sour netics (SMC), 2020, pp. 3247–3253. crude-oil processing plant, Safety science 124 (2020) [47] M. Chaal, O. A. Valdez Banda, J. A. Glomsrud, S. Bas104590. net, S. Hirdaris, P. Kujala, A framework to model [33] M. Danko, J. Janošovsky`, J. Labovsky`, L. Jelemensky`, the stpa hierarchical control structure of an auIntegration of process control protection layer into tonomous ship, Safety Science 132 (2020) 104939. a simulation-based hazop tool, Journal of Loss Prevention in the Process Industries 57 (2019) 291–303. [34] E. Roche, W. Dupont, A. Summers, Beyond hazop:

Analyzing common cause and system scenarios,

Process Safety Progress 38 (2019) e11997. [35] F. Crawley, B. Tyler, HAZOP: Guide to Best Practice,

Elsevier Science, 2015. [36] M. Chudleigh, J. Catmur, Safety assessment of computer systems using HAZOP and audit techniques, in: SafeComp’92, Elsevier, 1992, pp. 285–292. [37] T. A. Kletz, Hazop–past and future, Reliability

Engineering & System Safety 55 (1997) 263–266. [38] B. Kramer, C. Neurohr, M. Büker, E. Böde, M. Fränzle, W. Damm, Identification and quantification of hazardous scenarios for automated driving, in: International Symposium on Model-Based Safety and Assessment, Springer, 2020, pp. 163–178. [39] M. R. Othman, R. Idris, M. H. Hassim, W. H. W.

Ibrahim, Prioritizing HAZOP analysis using analytic hierarchy process (AHP), Clean Technologies and Environmental Policy 18 (2016) 1345–1360. [40] E. Németh, R. Lakner, K. Hangos, I. Cameron, Hierarchical cpn model-based diagnosis using HAZOP knowledge, Technical report of the Systems and