<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hierarchical HAZOP-Like Safety Analysis for Learning-Enabled Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yi Qi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippa Ryan Conmy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wei Huang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingyu Zhao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowei Huang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Adelard Part of NCC Group</institution>
          ,
          <addr-line>London, N1 7UX</addr-line>
          ,
          <country country="UK">U.K.</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Liverpool</institution>
          ,
          <addr-line>Liverpool, L69 3BX</addr-line>
          ,
          <country country="UK">U.K.</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Hazard and Operability Analysis (HAZOP) is a powerful safety analysis technique with a long history in the industrial process control domain. With the increasing use of Machine Learning (ML) components in cyber physical systems, so-called Learning-Enabled Systems (LESs), there is a recent trend of applying HAZOP-like analysis to LESs. While this shows great potential to retain the capability of doing sufficient and systematic safety analysis, the novel characteristics of ML raise new technical challenges that require a retrofit of the conventional HAZOP technique. In this regard, we present a new Hierarchical HAZOP-Like method for LESs (HILLS). To deal with the complexity of LESs, HILLS first does “divide and conquer” by stratifying the whole system into three levels, and then performs HAZOP on each level to identify (latent-)hazards, causes, security threats and mitigation (with new nodes and guide words). Finally, HILLS attempts to link and propagate the causal relationships among those identified elements within and across the three levels via both qualitative and quantitative methods. We examine and illustrate the utility of HILLS by a case study on Autonomous Underwater Vehicles, with discussions on assumptions and extensions to real-world applications. HILLS, as a first HAZOP-like attempt on LESs that explicitly considers ML internal behaviours and their interactions with other components, not only uncovers the inherent difficulties of doing safety analysis for LESs, but also demonstrates good potential to tackle them.</p>
      </abstract>
      <kwd-group>
        <kwd>Safety analysis</kwd>
        <kwd>HAZOP</kwd>
        <kwd>learning-enabled system</kwd>
        <kwd>trustworthy AI</kwd>
        <kwd>AI safety</kwd>
        <kwd>hazard identification</kwd>
        <kwd>autonomous underwater vehicle</kwd>
        <kwd>machine learning security</kwd>
        <kwd>deviation analysis</kwd>
        <kwd>robotics and autonomous system</kwd>
        <kwd>cyber physical system</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The IJCAI-ECAI-22 Workshop on Artificial Intelligence Safety (AISafety 2022), July 24-25, 2022, Vienna, Austria. * Corresponding author. Email: yiqi@liverpool.ac.uk (Y. Qi); pmrc@adelard.com (P. R. Conmy); huang23@liverpool.ac.uk (W. Huang); xingyu.zhao@liverpool.ac.uk (X. Zhao); xiaowei.huang@liverpool.ac.uk (X. Huang). URL: https://github.com/YiQi0318 (Y. Qi); https://www.adelard.com/people/philippa-ryan.html (P. R. Conmy); https://intranet.csc.liv.ac.uk/~wh1923/ (W. Huang); https://www.xzhao.me/ (X. Zhao); https://cgi.csc.liv.ac.uk/~xiaowei/ (X. Huang). ORCID: 0000-0002-3474-349X (X. Zhao); 0000-0001-6267-0366 (X. Huang). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.</p>
      <p>After being initially developed to support the chemical process industries (by Lawley [1]), Hazard and Operability Analysis (HAZOP) has been successfully and widely applied over the past 50 years. It is generally acknowledged to be an effective yet simple method to systematically identify safety hazards. HAZOP is a prescriptive analysis procedure designed to study system operability by analysing the effects of any deviation from the design intent [2]. A HAZOP performs a semi-formal, systematic, and critical examination of the process and engineering intentions of the process design. The potential for hazards or operability problems is thus assessed, and malfunctions of individual components and the associated consequences for the whole system can be identified [3].</p>
      <p>In recent years, increasingly sophisticated mathematical modelling processes from Machine Learning (ML) have been used to analyse complex data and then embedded into cyber physical systems, so-called Learning-Enabled Systems (LESs). How to ensure the safety of LESs has become an enormous challenge [4, 5, 6]. As LESs are disruptively novel, they require new and advanced analysis for the complex requirements on their safe and reliable function [7]. Such analysis needs to be tailored to fully evaluate the new characteristics of ML [8, 9], making conventional methods, including HAZOP and HAZOP-like variants (e.g., CHAZOP [10] and PES-HAZOP [11], introduced for computer-based and programmable electronic systems, respectively), obsolete. Moreover, LESs exhibit unprecedented complexity, while past experience suggests that HAZOP should be continuously retrofitted to accommodate more complex systems [12], considering quantitative analysis frameworks [13, 14] and human factors [15]. To the best of our knowledge, there is no HAZOP-like safety analysis dedicated to LESs that takes into account ML characteristics while preserving the simplicity and effectiveness of HAZOP (compared to other conventional safety analysis methods [16]), which motivates this research.</p>
      <p>In this paper, we introduce a new Hierarchical HAZOP-Like method for LESs (HILLS). HILLS first stratifies the complex LES into three levels (System Level, ML-Lifecycle Level and Inner-ML Level), then applies HAZOP separately on each level to identify safety elements of interest, namely causes, mitigation, hazards (or latent-hazards for latent levels that cannot directly lead to mishaps) and security threats. When applying HAZOP on the ML related levels, we revise HAZOP to cope with ML characteristics, e.g., by introducing new ways of defining nodes and new sets of guide words. We also identify causes of hazards from the ML development process (modelled by the ML-Lifecycle level) to reflect its data-driven nature (e.g., how data is collected, processed, etc.). Furthermore, we attempt to address the challenge of how to link and propagate those identified safety elements within and across the three levels, and propose both qualitative and quantitative (an initial Bayesian Belief Network (BN) solution) methods to model the causal relationships. To examine the effectiveness and demonstrate the use case of HILLS, we finally conduct a case study on Autonomous Underwater Vehicles (AUVs), with discussions on the assumptions adopted and extensions to real-world applications.</p>
      <p>The key contributions of this work include:</p>
      <p>a) A first HAZOP-like safety analysis for LESs that explicitly considers ML characteristics (including security threats and the data-driven nature of the development process) and reduces the complexity by hierarchical design.</p>
      <p>b) New considerations for dividing nodes in the system representation and a set of new guide words that adapt the traditional HAZOP for levels regarding ML models.</p>
      <p>c) A first attempt at linking/propagating identified causes, mitigation, (latent-)hazards and security threats across ML levels.</p>
      <p>d) Key challenges, identified as research questions, that are generic to safety analysis for LESs in future research.</p>
      <p>More details on each step of HAZOP are given below.</p>
      <p>Form HAZOP team. To perform HAZOP, a team of specialists is formed according to the project scope and aims. These experts have extensive experience and expert knowledge, and deeply understand the overall procedures of the system, such as operations, maintenance and engineering design.</p>
      <p>Identify system elements. The HAZOP team will formally represent the system under study by identifying its elements. Each element is called a Node, representing an operational function. Then, nodes and interactions between nodes (e.g., data/control flows) collectively form the system representation under analysis.</p>
      <sec id="sec-1-1">
        <title>Consider deviations of operational parameters</title>
        <p>HAZOP assumes that a problem can only arise when there are some Deviations from the design intent. HAZOP searches for deviations in the system representation. The deviation on a node is expressed as the combination of Guide Words and process Attributes.</p>
        <p>Each guide word is a short word that prompts the team to imagine a deviation from the design/process intent. The most commonly used guide words are: no, more, less, as well as, part of, other than, and so on. Guide words provide a systematic and consistent means of brainstorming potential deviations from normal operations. Each guide word has a specific meaning, e.g., no means the complete negation of the design intention, and early means that something occurred earlier than the intended time. Attributes are closely related to nodes, and are usually the subject of the action being performed. The definition of attributes relies on expert knowledge.</p>
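        <p>As a minimal sketch of how deviations arise as combinations of guide words and attributes, the following enumerates candidate deviations for each node. The node names, attributes and the helper enumerate_deviations are hypothetical and chosen only for illustration; in practice the expert team selects which combinations are meaningful.</p>
        <preformat>
```python
from itertools import product

# Hypothetical inputs: a few classic guide words and example attributes per node.
GUIDE_WORDS = ["no", "more", "less", "early"]
NODE_ATTRIBUTES = {
    "pump": ["flow", "pressure"],
    "valve": ["action"],
}


def enumerate_deviations(guide_words, node_attributes):
    """Return (node, guide_word, attribute, label) for every candidate deviation."""
    deviations = []
    for node, attributes in node_attributes.items():
        # A deviation is one guide word combined with one attribute of the node.
        for guide_word, attribute in product(guide_words, attributes):
            label = f"{guide_word} {attribute}"
            deviations.append((node, guide_word, attribute, label))
    return deviations


deviations = enumerate_deviations(GUIDE_WORDS, NODE_ATTRIBUTES)
print(len(deviations))  # 4 guide words x 3 attributes in total = 12 candidates
print(deviations[0])    # ('pump', 'no', 'flow', 'no flow')
```
        </preformat>
        <p>The exhaustive product is only a starting point for brainstorming: the team would normally discard combinations that have no sensible interpretation for a given node.</p>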
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries: HAZOP</title>
      <sec id="sec-2-1">
        <title>HAZOP is an inductive hazard assessment method that</title>
        <p>is conducted by an expert team. It systematically investigates each element in the system with the goal of finding potential situations that could cause the element to pose hazards or limit the system’s normal operations.</p>
        <p>There are four basic steps to perform HAZOP:</p>
        <list list-type="bullet">
          <list-item><p>Define the project scope and aims, and form an expert team.</p></list-item>
          <list-item><p>Identify system elements and model the system as a system representation.</p></list-item>
          <list-item><p>Consider possible deviations of operational parameters.</p></list-item>
          <list-item><p>Identify hazards, causes and mitigation solutions.</p></list-item>
        </list>
        <sec id="sec-2-1-1">
          <title>Identify hazards, causes and mitigation</title>
          <p>Where the result of a deviation would be a danger to workers or to the production process, a potential problem is found. A Hazard (H) is a source of potential damage, harm or adverse health effects on something or someone, while mishaps are the actual damage or harm done to something or someone. A Cause (C) is a reason why the deviation could occur. Several causes may be identified for one deviation. A Mitigation (M) helps to reduce the occurrence frequency of the deviations or to mitigate their consequences. Hazards, causes, and mitigation are usually assigned their respective IDs.</p>
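          <p>One possible way to record the outcome of this step is a worksheet row per deviation, with hazards, causes and mitigation each stored under their own IDs. The sketch below is an assumption about how such a record could be organised; the class DeviationRecord, its fields and the example entries are hypothetical and not taken from the HAZOP standard or from this paper.</p>
          <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class DeviationRecord:
    """One worksheet row: a deviation with its hazards, causes and mitigation."""
    node: str
    guide_word: str
    attribute: str
    hazards: dict = field(default_factory=dict)      # ID -> description, e.g. "H1"
    causes: dict = field(default_factory=dict)       # ID -> description, e.g. "C1"
    mitigations: dict = field(default_factory=dict)  # ID -> description, e.g. "M1"

    def add(self, kind, description):
        """Store a description under the next free ID of the given kind (H, C or M)."""
        table = {"H": self.hazards, "C": self.causes, "M": self.mitigations}[kind]
        new_id = f"{kind}{len(table) + 1}"
        table[new_id] = description
        return new_id


# Hypothetical entries for a process-plant style deviation "no flow".
record = DeviationRecord(node="pump", guide_word="no", attribute="flow")
record.add("H", "tank overheats")
record.add("C", "pump power failure")
record.add("C", "blocked inlet")
record.add("M", "low-flow alarm")
print(record.causes)  # {'C1': 'pump power failure', 'C2': 'blocked inlet'}
```
          </preformat>
          <p>Keeping several causes under one deviation mirrors the observation above that one deviation may have more than one cause.</p>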
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Statement</title>
      <sec id="sec-3-1">
        <title>Once the four steps are completed, team members may generate additional safety requirements if necessary to mitigate or prevent the identified issues, leading to improvement of the system.</title>
      </sec>
      <sec id="sec-3-2">
        <title>Given HAZOP was not originally designed for LESs,</title>
        <p>inevitably new problems arise when attempting to apply HAZOP to LESs. These problems are formalised as a set of research questions (RQs) proposed in this section. We first present the rationale behind those RQs (i.e., a justification of how we arrived at them) and then articulate what the expected solution to each RQ would be.</p>
        <p>RQ1: How to reduce the complexity of LESs so that HAZOP can be effectively applied? HAZOP is a semi-formalised analytical method, used to identify the hazard scenarios of a defined process, and it has been successfully used on relatively simple systems. When facing a complex system, HAZOP often cannot play its role well. LESs exhibit unprecedented complexity, rendering the direct application of HAZOP to LESs infeasible. Therefore, we need to reduce the complexity in the system representation. A simple yet effective solution is “divide and conquer”, e.g., stratifying a complex system into multiple levels. In this regard, a promising solution to RQ1 is to propose a hierarchical system representation, so that HAZOP can be effectively applied.</p>
        <sec id="sec-3-2-1">
          <title>RQ2: How to define nodes in each level, especially for novel levels regarding ML?</title>
          <p>We assume that HAZOP can effectively handle a single-level system representation, as we expect to introduce a hierarchical structure in the RQ1 solution. The second step of HAZOP is to divide nodes at each level (presuming we already have a group of experts as the HAZOP team). Past experience shows that the division of nodes can be based on the functionalities of components in the system [17], so we may continue using this traditional method for the non-ML related levels. However, when there are ML components in the system under analysis, it is difficult to apply the traditional division method directly. Therefore, RQ2 is raised to explore the novel definition of “functionalities” at ML-related levels.</p>
          <p>RQ3: Will there be any new guide words related to ML? The guide word is one of the key components of a deviation. The team of experts is responsible for identifying guide words that fit the scope of their analysis, while commonly used guide words are No, Less/More, Slower/Faster, Early/Late, etc. However, the existing set of guide words is unproven for use in ML applications, so this RQ aims at determining the effectiveness and new meanings of known guide words for ML related levels, and checking whether there might be missing guide words. Although we expect that most of the known guide words can still be applicable, they might miss some deviations given the new characteristics of ML. Thus, prospective new guide words may be introduced.</p>
          <p>RQ4: How to establish the relationship between identified safety elements across levels? For simplicity, HAZOP is expected to be applied separately to each level of a hierarchical system representation. Therefore, to obtain the safety analysis of the whole complex system, it is necessary to study the relationship between identified safety elements, namely causes, mitigation, hazards (and latent-hazards), across different levels. Then, based on the nature of the relationship (e.g., causal or not, quantitative or qualitative, probabilistic or deterministic), a proper formalism should be used to establish and express such relationships among the hazard analysis results collected from each level.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>4. Running Example</title>
        <p>We present a running example from the SOLITUDE project<sup>1</sup>, which conducts safety analysis on an AUV that autonomously finds a dock and performs the docking task. The workflow of the scenario is given in Figure 1.</p>
        <p>The robot starts when it receives the user’s command. Once started, it uses sensors (e.g., cameras) to receive data. Data is transmitted and preprocessed before being fed into the YOLO model for object detection and localisation. The localisation result is further utilised for path planning. In addition, the above normal workflow may suffer from external attacks at some stages, including data transmission, data preparation, and path planning. We remark that the scenario in the project is more complex, including the use of deep reinforcement learning for motion planning, but due to the space limit this paper only focuses on the perception component.</p>
        <p><sup>1</sup> https://github.com/Solitude-SAMR/UWV_RAM</p>
      </sec>
      <sec id="sec-3-4">
        <title>5. Proposed Method</title>
        <p>In this section, we present the HILLS method, and compare it with HAZOP. HILLS inherits from HAZOP the basic structure, composition and definitions of elements, with extensions that are suitable for LESs. The tables and figures presented in this section are partial and for illustrative purposes only; cf. the complete HILLS analysis results based on the SOLITUDE project at the GitHub repository<sup>1</sup>.</p>
        <p>5.1. Hierarchical HAZOP</p>
        <p>As shown in Figure 2, HILLS has a three-level structure, comprising the system level, the ML-lifecycle level and the inner-ML level. We analyse each level individually in this subsection, and discuss their relations in Section 5.2. Note that the HILLS structure discussed here is generic (for illustration purposes), and may be subject to adaptation when working with specific systems.</p>
        <p>Figure 2: The 3 level hierarchical structure of HILLS</p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>Nodes in each level in SOLITUDE example</p></caption>
          <table>
            <thead>
              <tr><th>Level</th><th>Node</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>System level</td><td>Node 1</td><td>User</td></tr>
              <tr><td>System level</td><td>Node 2</td><td>Hardware components</td></tr>
              <tr><td>System level</td><td>Node 3</td><td>Data transmission</td></tr>
              <tr><td>ML-lifecycle level</td><td>Node 4</td><td>Data collection</td></tr>
              <tr><td>ML-lifecycle level</td><td>Node 5</td><td>Labeling</td></tr>
              <tr><td>ML-lifecycle level</td><td>Node 6</td><td>Data preprocessing</td></tr>
              <tr><td>ML-lifecycle level</td><td>Node 7</td><td>Hyperparameter setting</td></tr>
              <tr><td>ML-lifecycle level</td><td>Node 8</td><td>Model deployment</td></tr>
              <tr><td>Inner-ML level</td><td>Node 9</td><td>Feature Extracting</td></tr>
              <tr><td>Inner-ML level</td><td>Node 10</td><td>Object Detection</td></tr>
              <tr><td>ML-lifecycle level</td><td>Node 11</td><td>Localisation</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-3-4-1">
          <title>5.1.1. System level</title>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>HILLS at the system level largely follows HAZOP. Hardware, software, and ML components of an LES represent</title>
        <p>different functions, and they will be categorised as different Nodes. Consider the running example in Figure 1: the “blue blocks” represent the functional areas of the running example, which means that our nodes can be set according to these blocks. An example of setting nodes is provided in Table 1. We note that the setting of nodes is specific to the system under investigation. E.g., the node “Labeling” was not included in Figure 1.</p>
        <p>Some guide words that originated from, e.g., the chemical industry can still be used in LESs. Attributes related to the LES are used together with the guide words to express deviations.</p>
        <p>Example 1 At system level, we discovered several hazards from the running example, some of which are summarised in Table 2. E.g., one of the hazards is “erratic trajectory”, suggesting that the robot moves into an unsafe area. This hazard is associated with a deviation “no action” where “no” is the guide word and “action” is the attribute (when the AUV takes no actions in the water, the disturbance of the current makes it difficult for the robot to maintain a stable trajectory). One of the causes of the hazard is “no data from sensor”, which can be mitigated by, e.g., the use of an acoustic guidance system as a duplicated perception component based on another sensor.</p>
        <p>Example 2 Some hazards, such as “erratic trajectory”, may appear in different nodes, which suggests that they may occur more often, and thus may have a higher priority to be mitigated after considering the severity of consequences as well.</p>
        <p>Example 3 One hazard can be mitigated in different ways. For example, we identified several mitigation solutions for the “erratic trajectory”, most of which focus on early prevention, such as “maximum safe distance maintained if uncertain” and “camera health monitor”.</p>
        <p>HILLS aims to exhaustively cover all potential hazards. In the running example, the possible causes of crashes or of failing to turn when facing obstacles may include “no data from sensors (instantaneous or permanent)” and “misclassification”, corresponding to errors in hardware and software components, respectively.</p>
        <p>However, the hazards, causes or mitigation may not be fully identifiable at this level. For example, there are other mitigation solutions for the cause “misclassification” that need to consider how the ML component is trained and constructed. The system level alone cannot naturally include relevant nodes for this purpose. This motivates us to consider other levels (as discussed below).</p>
        <sec id="sec-3-5-1">
          <title>5.1.2. ML-Lifecycle level</title>
        </sec>
      </sec>
      <sec id="sec-3-6">
        <title>The key motivation for the ML-lifecycle level is to handle</title>
        <p>the complexity arising from the integration of ML components into an LES, considering mainly the human factors and security threats involved in the development process of ML models. Thus, deviations from this level cannot be identified if analysis is only conducted at the system level. On the other hand, the hazards at system level may be attributed to the hazards at ML-lifecycle level, e.g., the low prediction accuracy of an ML component may be caused by polluted data in the data collection or insufficient epochs of training. For the running example, through the analysis at the ML-lifecycle level, we know that the low accuracy of the results may be caused by inaccurate labeling. We remark that deviations identified at non-system levels are called Latent-hazards (LH), as they pose indirect hazards from latent levels that involve no interaction with hardware components and thus cannot directly lead to mishaps.</p>
        <p>Table 3 presents a set of guide words that are required at this level. These guide words are redefined from the existing guide words in HAZOP. Table 3 includes both their original meanings (in HAZOP) and new meanings (in HILLS). “Part of” represents a qualitative modification in the original meaning, and in HILLS it may mean the incompleteness of the structures, definitions, or settings. For “Less” and “More”, considering that we are concerned about data flow and data value, their new meanings refer to the amount of data rather than, e.g., the water volume.</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Guide word</th><th>Original meaning</th><th>New meaning</th></tr>
            </thead>
            <tbody>
              <tr><td>Part of</td><td>Qualitative modification</td><td>Incomplete definition or setting</td></tr>
              <tr><td>Less</td><td>Too little additive volume added</td><td>A smaller amount of data</td></tr>
              <tr><td>More</td><td>Too much additive volume added</td><td>A large amount of data</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Safety analysis at the ML-lifecycle level can exhibit new latent-hazards, as shown in Table 4. While ML models are subject to security issues, we believe malicious attacking behaviours should also be considered as security Threats (T). Human factors are considered because ML development is a human-centred process, which makes possible some human-related errors such as labelling errors, forgetting part of the operations, and the omission of data preparation. The aforementioned mistakes are direct human errors. There are also adversarial attacks that can lead to a significant drop in performance, which are classified as security threats. Some examples are shown in Table 4.</p>
        <p>Example 4 On the node “data collection”, there is a threat “data poisoning”, which occurs because the input data is contaminated. A suggested mitigation is to deploy a detector based on data provenance.</p>
        <p>Example 5 For ML components, we identified mitigation, e.g., “classifier reliability for critical objects &gt;X” [18], to reduce misclassifications with safety impacts.</p>
        <p>Example 6 For the latent-hazard “low prediction accuracy”, its causes include “users make mistakes on labelling”, “data itself is missing”, and “data itself is incomplete”, each of which has its suggested mitigation (cf. Table 4).</p>
        <p>Example 7 There is a deviation “attack”, whose threats are various attacks, e.g., evasion attack, backdoor attack, and data poisoning attack. Their respective cause is usually that a certain entity in the training or inference of an ML model (e.g., input instance, model structure, training, dataset) is perturbed, modified, or contaminated. Their respective mitigation can be very specific (cf. Table 4), e.g., the backdoor detector in [19] for tree ensemble classifiers.</p>
        <sec id="sec-3-6-1">
          <title>5.1.3. Inner-ML level</title>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <title>ML components such as YOLO are composed of one or</title>
        <p>more ML models, each of which is formed of a set of functional layers. Even after a thorough analysis of all possible deviations (with mitigation solutions) in the ML development process modelled by our ML-lifecycle level, the ML components may not perform as expected, e.g., the convolutional layers fail to extract features accurately, and the fully connected layers fail to make reliable classifications. Thus, safety analysis on the internal structure of an ML component is required.</p>
        <p>At the inner-ML level, HILLS extracts the basic layers of an ML component to form a model for analysis. To cater for the different complexity of ML components, two extraction methods are proposed. The first one deals with simple models with up to 5 layers. It follows the layer structure and considers each layer to represent a separate functionality. Consequently, each layer is defined as a node in the system representation. The second one deals with more complex, larger models by abstracting a model into several functional blocks, where every block may contain a number of layers. Our analysis in the running example follows the second method.</p>
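        <p>The two extraction methods can be sketched as follows. This is an illustration under stated assumptions rather than the HILLS implementation: the layer names, their block labels and the function extract_nodes are hypothetical, and only the 5-layer threshold comes from the text.</p>
        <preformat>
```python
# Hypothetical layer list; layer names and block labels are illustrative only.
LAYERS = [
    ("conv1", "feature extracting"),
    ("conv2", "feature extracting"),
    ("pool1", "feature extracting"),
    ("fc1", "object detection"),
    ("fc2", "object detection"),
    ("detect_head", "object detection"),
]
MAX_LAYERS_FOR_PER_LAYER_NODES = 5  # threshold stated in the text


def extract_nodes(layers):
    """Map a model to analysis nodes: one per layer, or one per functional block."""
    if len(layers) > MAX_LAYERS_FOR_PER_LAYER_NODES:
        # Second method: group layers into functional blocks, one node per block.
        nodes = {}
        for name, block in layers:
            nodes.setdefault(block, []).append(name)
        return nodes
    # First method: every layer is its own node.
    return {name: [name] for name, _block in layers}


print(extract_nodes(LAYERS))      # six layers, so nodes are the two blocks
print(extract_nodes(LAYERS[:2]))  # two layers, so each layer is a node
```
        </preformat>
        <p>In this sketch the block assignment is given by hand; deciding which layers form a functional block remains an expert judgement.</p>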
        <table-wrap>
          <label>Table 5</label>
          <caption><p>New guide words of ML-Lifecycle and inner-ML levels</p></caption>
          <table>
            <thead>
              <tr><th>Guide words</th><th>Meaning</th></tr>
            </thead>
            <tbody>
              <tr><td>Wrong</td><td>Wrong setting or data value</td></tr>
              <tr><td>Invalid</td><td>Invalid data value or data flow, possibly conflicting with other components</td></tr>
              <tr><td>Incomplete</td><td>Incomplete data value</td></tr>
              <tr><td>Perturbed</td><td>Data was perturbed by external attackers</td></tr>
              <tr><td>Incapable</td><td>Part of data can not be labeled</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We identified several new guide words, as shown in Table 5, which are highly relevant to the setup of the ML component and data flow. It is worth noting that “Perturbed” is a special guide word that is needed when considering the existence of an external attacker.</p>
        <p>Example 8 Deviations containing “perturbed” are usually proprietary attacks, e.g., we record “perturbed dataset” as “attack” and the threat as “data poisoning” (cf. Table 4).</p>
        <p>As shown in Table 6, HILLS performs analysis inside an ML model, which in general is closely related to the internal structure of the model.</p>
        <p>Example 9 When the ML component has a wrong output, we can learn from the inner-ML level analysis that this may be related to the setting of the hyperparameters. Explainable AI (XAI) methods may help users to, e.g., locate which layer of neurons contributes the most to the wrong ML behaviours [20] and detect backdoors [21].</p>
        <p>Example 10 At the inner-ML level, we focus on the ML model structure itself. E.g., unsuitable parameter settings in activation functions or pooling layers also create specific latent-hazards, leading to wrong outputs or to losing part of the information of figures (cf. Table 6).</p>
        <sec id="sec-3-7-1">
          <title>5.1.4. Further Considerations on Use Cases of</title>
        </sec>
        <sec id="sec-3-7-2">
          <title>HILLS</title>
        </sec>
      </sec>
      <sec id="sec-3-8">
        <title>HAZOP is to provide a systematic, critical examination of</title>
        <p>the process (and engineering intent) of a new or existing facility, and should normally be done before the system is officially put into service [22]. Nevertheless, we believe that HILLS can still be applied after the occurrence of an accident, in particular because recent technologies have enabled the recording of system executions through, e.g., direct observation, recorded video, or snapshot images. HILLS may use the recordings to identify related causes and hazards.</p>
        <p>Moreover, we note the following points when using HILLS. First, when dealing with an LES, we focus on the workflow or the pipeline diagram of the entire system, to identify nodes according to the method we explained earlier. The analysis at the system level can help us identify the hazards sourced from the ML components, to enable the analysis at the lower levels.</p>
        <p>Second, guide words will be combined with the attributes of each node to form deviations. This will proceed sequentially following the level structure of HILLS, i.e., the deviations at the system level will be identified first, followed by the ML-lifecycle level and then the inner-ML level.</p>
        <p>Third, before looking for (latent-)hazards, causes, and mitigation at each level, we rely on the reasonable assumption that mitigation solutions at higher levels are easier than those at lower levels. That said, HILLS may not need to be conducted at the inner-ML level, and can stop when all hazards are found and mitigated at other levels.</p>
      </sec>
      <sec id="sec-3-9">
        <title>5.2. Relations Between Levels</title>
        <p>Up to now, we have identified the nodes, attributes, guide words, (latent-)hazards, threats, causes, and mitigation solutions for individual levels in the HILLS framework. We also notice that the relations between these elements can be very complicated. This calls for a formal analysis of the relations. While formalising the relations between levels is a significant challenge, and there might not be one best way, we propose to study them both qualitatively and quantitatively.</p>
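        <p>As a rough illustration of what a quantitative treatment could look like, the sketch below evaluates a tiny Bayesian network by enumeration: a threat and a mitigation jointly influence a latent-hazard. The network structure, the variable names and every probability are assumptions made up for this example; they are not values from this paper, and in practice they would have to come from system background and expert knowledge.</p>
        <preformat>
```python
from itertools import product

# All structure and numbers below are illustrative assumptions.
P_THREAT = {True: 0.1, False: 0.9}      # P(data poisoning is attempted)
P_MITIGATION = {True: 0.8, False: 0.2}  # P(provenance-based detector is deployed)
# Conditional probability table: P(low prediction accuracy | threat, mitigation)
P_HAZARD_GIVEN = {
    (True, True): 0.2,
    (True, False): 0.7,
    (False, True): 0.05,
    (False, False): 0.05,
}


def joint(threat, mitigation, hazard):
    """Joint probability of one full assignment of the three variables."""
    p_hazard = P_HAZARD_GIVEN[(threat, mitigation)]
    p_outcome = p_hazard if hazard else 1.0 - p_hazard
    return P_THREAT[threat] * P_MITIGATION[mitigation] * p_outcome


def prob_hazard():
    """Marginal probability of the latent-hazard, summing out threat and mitigation."""
    return sum(joint(t, m, True) for t, m in product([True, False], repeat=2))


def prob_threat_given_hazard():
    """Diagnostic query: how likely the threat is once the latent-hazard is observed."""
    numerator = sum(joint(True, m, True) for m in [True, False])
    return numerator / prob_hazard()


print(round(prob_hazard(), 3))               # 0.075
print(round(prob_threat_given_hazard(), 3))  # 0.4
```
        </preformat>
        <p>Enumeration is only practical for very small networks; it is used here because it makes the role of each conditional probability explicit.</p>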
      </sec>
      <sec id="sec-3-10">
        <title>5.2.1. Qualitative Analysis</title>
        <p>Qualitative analysis studies the connections between levels, with the guide words as entry points. The guide words and the deviations may have the following connections.</p>
        <p>First, the same guide words at a level have strong associations, even if they are combined with different attributes. Second, if a guide word is the same across different levels, the one at the higher level may be the main contributor to the latent-hazard at the lower level.</p>
        <p>Example 11. We use “no” as an example. We can get a deviation “no action” at the system level, and a deviation “no localisation” at the ML-lifecycle level. Given that they share the same guide word, we should consider whether “no localisation” has a causal relation with “no action”.</p>
        <p>Moreover, we assume that guide words at the higher and lower levels may have an inclusive relationship, such as “no” and “part of”, or similar meanings, such as “invalid” and “incompatible”. The existence of a guide word with an inclusive relationship suggests that, for the latent-hazard found at the lower level, its cause may belong to the higher level.</p>
        <p>Example 12. If we choose “No action” at the system level and “Part of definition” at the ML-lifecycle level (e.g., images without defined labels), then we may establish an inclusive relationship between “No” and “Part of”.</p>
        <p>Example 13. We use “invalid data value” and “incompatible data value” as examples. “Incompatible data value” may lead to low output accuracy or no results, so it has a similar meaning to “invalid data value”.</p>
        <p>Selecting guide words is arguably a subjective activity: experts may use different guide words with similar semantics to identify the same cause. Consequently, the proposed way of establishing relationships across levels can only cope with the ideal case in which identical guide words are used. Alternative methods are still needed for other cases, which we leave as future work.</p>
      </sec>
      <sec id="sec-3-11">
        <title>5.2.2. Quantitative Analysis</title>
        <p>A BN is a graphical model that presents probabilistic relationships between a set of variables by determining causal relationships between them [23]. It is also a powerful tool for knowledge representation and reasoning under uncertainty, visually presenting probabilistic relationships between a set of variables [24]. BNs have already been used to study the relation between latent features learned by a deep neural network [25], and using BNs to express relationships between elements is not a new idea in traditional safety analysis [26, 27, 28]. We take the relationship between several elements at the ML-lifecycle level and the inner-ML level as an example to explore the possibility of using a BN to represent it. This is one way of expressing relationships quantitatively; however, since the higher level contains some abstract concepts, these are difficult to represent as variables. Even if we assume that abstract concepts are represented using variables, it is hard to provide the Conditional Probability Tables (CPTs) that are a prerequisite for a BN. All parameters used to quantify the BN must be obtained from system background and expert knowledge.</p>
        <p>Figure 3: A BN fragment (with illustrative probabilities)</p>
        <p>Figure 3 shows a fragment of the BN model for the running example, considering several security threats between the ML-lifecycle level and the inner-ML level. The nodes of a BN can represent threats (<italic>T</italic><sub><italic>i</italic>.<italic>j</italic></sub>), causes (<italic>C</italic><sub><italic>i</italic>.<italic>j</italic></sub>), or mitigation (<italic>M</italic><sub><italic>i</italic>.<italic>j</italic></sub>), where the variable <italic>i</italic> ∈ {1, 2, 3} ranges over the levels in HILLS and <italic>j</italic> is the index of the threat/cause/mitigation at a level. E.g., <italic>T</italic><sub>2.<italic>j</italic></sub> is the <italic>j</italic>-th threat at the ML-lifecycle level.</p>
        <p>Besides, we need to assign a CPT to each non-leaf node of the BN, and assign a prior probability to each leaf node or set it as observed evidence. Note that expert knowledge is needed for both the construction of the basic structure and the assignment of CPTs. The probabilities used in Figure 3 are for illustrative purposes, while more enlightening examples can be found in [25].</p>
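        <p>The parameterisation step described above can be sketched in plain Python: a prior for a node with no incoming arrows and a CPT for each node that has a parent. This is a minimal illustration, not the model of Figure 3. The three binary nodes (<monospace>threat</monospace>, <monospace>cause_lifecycle</monospace>, <monospace>cause_inner</monospace>), the helper names, and every probability are assumptions, chosen only to show how priors and CPTs combine into a joint distribution that can be queried by enumeration.</p>
        <preformat>
```python
from itertools import product

# Hypothetical fragment: one threat (no incoming arrows) and two causes that
# depend on it. All names and numbers are illustrative assumptions.
PRIOR_THREAT = 0.3  # P(threat = True), a prior that experts would supply

# CPTs, indexed by the parent's value: P(cause = True | threat)
CPT_CAUSE_LIFECYCLE = {True: 0.8, False: 0.1}
CPT_CAUSE_INNER_ML = {True: 0.6, False: 0.05}


def bernoulli(p_true, value):
    """Probability that a binary variable equals value, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(threat, cause_lifecycle, cause_inner):
    """Joint probability of one full assignment, via the BN factorisation."""
    return (
        bernoulli(PRIOR_THREAT, threat)
        * bernoulli(CPT_CAUSE_LIFECYCLE[threat], cause_lifecycle)
        * bernoulli(CPT_CAUSE_INNER_ML[threat], cause_inner)
    )


def probability(query, evidence=None):
    """P(query | evidence) by brute-force enumeration of all assignments."""
    evidence = evidence or {}
    names = ("threat", "cause_lifecycle", "cause_inner")
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(names)):
        world = dict(zip(names, values))
        # Skip assignments that contradict the observed evidence.
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator
```
        </preformat>
        <p>With these illustrative numbers, <monospace>probability({"cause_lifecycle": True})</monospace> evaluates to 0.31, and observing that cause raises the belief in the threat from its prior of 0.3 to about 0.77. In practice the structure and all numbers would come from domain experts, and a dedicated BN library would replace the brute-force enumeration, whose cost grows exponentially with the number of nodes.</p>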
        <p>Example 14. For threat nodes with no incoming arrows, such as <italic>T</italic><sub>2.<italic>j</italic></sub> and <italic>T</italic><sub>3.<italic>k</italic></sub>, we may set the probability of their occurrence to 100 percent.</p>
        <p>Once the BN is constructed, we can perform probabilistic inference on it to check that the construction is correct w.r.t. expert knowledge. The following are two typical examples, obtained by applying the d-separation algorithm [29] (for determining dependencies between variables in a BN).</p>
        <p>Example 15. There may be multiple children nodes at different levels for a parent node. In Figure 3, the threat <italic>T</italic><sub>2.1</sub> has two causes, <italic>C</italic><sub>2.<italic>j</italic></sub> and <italic>C</italic><sub>3.<italic>k</italic></sub>, at the ML-lifecycle level and the inner-ML level, respectively. While the two causes may be mitigated separately as they belong to different levels, the effectiveness of their respective mitigation might affect the probabilistic inference based on each other’s CPT (under the condition that the probability for <italic>T</italic><sub>2.1</sub> is not observable).</p>
        <p>Example 16. There may be multiple parent nodes for a child node. In Figure 3, the mitigation <italic>M</italic><sub>2.<italic>j</italic></sub> has two causes, <italic>C</italic><sub>2.<italic>k</italic></sub> and <italic>C</italic><sub>2.<italic>l</italic></sub>, representing that one mitigation may support two causes. By observing the effectiveness of the mitigation (i.e., the CPT of <italic>M</italic><sub>2.<italic>j</italic></sub>), we can infer how one cause <italic>C</italic><sub>2.<italic>k</italic></sub> may influence the other cause <italic>C</italic><sub>2.<italic>l</italic></sub>, and vice versa.</p>
        <p>We note that the construction of the BN structure and CPTs, as well as the above probabilistic inference, should be discussed and accepted by domain experts and all stakeholders. We believe BN is potentially a powerful tool for modelling probabilistic causal relationships between elements of the ML-related levels, while how to apply BN in practice in the context of HILLS remains an open challenge.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Related Work</title>
      <sec id="sec-5-1">
        <title>HAZOP</title>
        <p>HAZOP is widely used in industrial domains, such as nuclear power [30] and the chemical industry [31]. In recent years, there have been efforts on integrating HAZOP with other methods [32, 33] to analyse common causes and system scenarios [34]. For a comprehensive review of those techniques, we refer the reader to recent surveys, e.g., [35]. The application of HAZOP to computer-based systems first appears in [36]. After that, the experience gained from applying HAZOP and related techniques to computer-based systems was summarised in [37]. There is a recent trend of applying HAZOP-like analysis to LESs, e.g., in the autonomous car context [38].</p>
      </sec>
      <sec id="sec-5-2">
        <title>Hierarchical structure</title>
        <p>The concept of hierarchy is not new, but existing papers either focus on the hierarchical priority of the analysis order in the HAZOP analysis process [39] or consider the direct application of HAZOP to the hierarchical structure of traditional systems with no ML components [40]. A hierarchical structure is needed for its suitability to work with ML components (a black box in general and, inside the black box, a layered structure with each layer being a simple mathematical function). The novelty of HILLS is that it considers the interaction between humans and ML components as well as the internal structure of the ML components. Moreover, inspired by [41], we investigate how to link and propagate identified safety elements at different levels.</p>
      </sec>
      <sec id="sec-5-3">
        <title>STPA</title>
        <p>STAMP (Systems-Theoretic Accident Model and Processes) is also a very popular safety analysis method. STAMP uses three fundamental concepts from system theory: emergence and hierarchy, communication and control, and process models [42]. STPA (System-Theoretic Process Analysis) uses such techniques, being based on the STAMP model. STPA pays more attention to the overall control loop and process analysis of the system, and focuses on unsafe control actions and causal factors in a control structure. It is widely used in railway safety assurance [43], cyber safety and security [44], robotics [45] and driver-vehicle interactions [46]. STPA is also used to explore a hierarchical structural safety analysis framework in [47]. Compared to STPA, HAZOP is relatively easier to conduct and clearer to communicate, supported by structural decomposition of the system functions [16]. We start with retrofitting HAZOP for LESs, while STPA offers a new perspective for considering the feasibility of hierarchical safety analysis on LESs, which is our planned future work.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7. Conclusion</title>
      <p>We propose a hierarchical HAZOP-like method, HILLS, for the safety analysis of LESs. Unlike traditional HAZOP, HILLS analyses LESs in a hierarchical way, disentangling the complexity by working with three separate levels first and then establishing their relations via both qualitative and quantitative methods, e.g., BNs. HILLS is applied to a practical example of AUVs, with the discovery of new guide words as well as new causes and mitigation related to ML.</p>
      <p>In conclusion, HILLS complements HAZOP when working with LESs, and is able to identify safety hazards and security threats related to ML components through its structural advantages.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by U.K. DSTL through the project of Safety Argument for Learning-enabled Autonomous Underwater Vehicles and U.K. EPSRC through End-to-End Conceptual Guarding of Neural Architectures [EP/T026995/1]. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 956123. XZ’s contribution to the work is partially supported through Fellowships at the Assuring Autonomy International Programme. YQ’s contribution to the work is supported through Chinese Scholarship Council (CSC).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref><label>[1]</label><mixed-citation>H. Lawley, Operability studies and hazard analysis, Chem. Eng. Prog. 70 (1974) 45–56.</mixed-citation></ref>
      <ref><label>[2]</label><mixed-citation>F. Crawley, B. Tyler, Chapter 3 - The HAZOP study method, in: F. Crawley, B. Tyler (Eds.), HAZOP: Guide to Best Practice (3rd Edition), Elsevier, 2015.</mixed-citation></ref>
      <ref><label>[3]</label><mixed-citation>J. Dunjó, V. Fthenakis, J. A. Vílchez, J. Arnaldos, Hazard and operability (HAZOP) analysis. A literature review, Journal of Hazardous Materials 173 (2010) 19–32.</mixed-citation></ref>
      <ref><label>[4]</label><mixed-citation>D. Lane, D. Bisset, R. Buckingham, G. Pegman, T. Prescott, New foresight review on robotics and autonomous systems, Technical Report No. 2016.1, LRF, 2016.</mixed-citation></ref>
      <ref><label>[5]</label><mixed-citation>X. Zhao, A. Banks, J. Sharp, V. Robu, D. Flynn, M. Fisher, X. Huang, A Safety Framework for Critical Systems Utilising Deep Neural Networks, in: Computer Safety, Reliability, and Security (SafeComp’20), volume 12234 of LNCS, Springer, Cham, 2020, pp. 244–259.</mixed-citation></ref>
      <ref><label>[6]</label><mixed-citation>E. Asaadi, E. Denney, G. Pai, Quantifying assurance in learning-enabled systems, in: SafeComp’20, volume 12234 of LNCS, Springer, Cham, 2020, pp. 270–286.</mixed-citation></ref>
      <ref><label>[7]</label><mixed-citation>R. Bloomfield, H. Khlaaf, P. R. Conmy, G. Fletcher, Disruptive innovations and disruptive assurance: Assuring machine learning and autonomy, Computer 52 (2019) 82–89.</mixed-citation></ref>
      <ref><label>[8]</label><mixed-citation>E. Alves, D. Bhatt, B. Hall, K. Driscoll, A. Murugesan, J. Rushby, Considerations in assuring safety of increasingly autonomous systems, Technical Report NASA/CR-2018-220080, NASA, 2018.</mixed-citation></ref>
      <ref><label>[9]</label><mixed-citation>S. Burton, I. Habli, T. Lawton, J. McDermid, P. Morgan, Z. Porter, Mind the gaps: Assuring the safety of autonomous systems from an engineering, ethical, and legal perspective, Artificial Intelligence 279 (2020) 103201.</mixed-citation></ref>
      <ref><label>[10]</label><mixed-citation>P. Andow, H. G. Britain, E. Safety, Guidance on HAZOP procedures for computer-controlled plants, Great Britain, Health and Safety Executive, 1991.</mixed-citation></ref>
      <ref><label>[11]</label><mixed-citation>D. J. Burns, R. M. Pitblado, A Modified Hazop Methodology For Safety Critical, Springer London, London, 1993.</mixed-citation></ref>
      <ref><label>[12]</label><mixed-citation>H. Pasman, W. Rogers, How can we improve HAZOP, our old work horse, and do more with its results? An overview of recent developments, Chemical Engineering Transactions 48 (2016) 829–834.</mixed-citation></ref>
      <ref><label>[13]</label><mixed-citation>H. Ozog, Hazard identification and quantification, Chem. Eng. Prog. 83 (1987) 55–64.</mixed-citation></ref>
      <ref><label>[14]</label><mixed-citation>V. Cozzani, S. Bonvicini, G. Spadoni, S. Zanelli, Hazmat transport: A methodological framework for the risk analysis of marshalling yards, Journal of Hazardous Materials 147 (2007) 412–423.</mixed-citation></ref>
      <ref><label>[15]</label><mixed-citation>P. Aspinall, HAZOPs and human factors, in: Institution of Chemical Engineers Symposium Series, volume 151, 2006, p. 820.</mixed-citation></ref>
      <ref><label>[16]</label><mixed-citation>L. Sun, Y.-F. Li, E. Zio, Comparison of the HAZOP, FMEA, FRAM, and STPA methods for the hazard analysis of automatic emergency brake systems, ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part B: Mechanical Engineering 8 (2022).</mixed-citation></ref>
      <ref><label>[17]</label><mixed-citation>D. Slater, The HAZOP methodology, 2015.</mixed-citation></ref>
      <ref><label>[18]</label><mixed-citation>X. Zhao, W. Huang, A. Banks, V. Cox, D. Flynn, S. Schewe, X. Huang, Assessing the reliability of deep learning classifiers through robustness evaluation and operational profiles, in: AISafety’21 Workshop at IJCAI’21, volume 2916, ceur-ws.org, 2021.</mixed-citation></ref>
      <ref><label>[19]</label><mixed-citation>W. Huang, X. Zhao, X. Huang, Embedding and extraction of knowledge in tree ensemble classifiers, Machine Learning 111 (2022) 1925–1958.</mixed-citation></ref>
      <ref><label>[20]</label><mixed-citation>S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, PLoS ONE 10 (2015) e0130140.</mixed-citation></ref>
      <ref><label>[21]</label><mixed-citation>X. Zhao, W. Huang, X. Huang, V. Robu, D. Flynn, BayLIME: Bayesian local interpretable model-agnostic explanations, in: Proc. of the 37th Conf. on Uncertainty in Artificial Intelligence, UAI’21, PMLR, 2021, pp. 887–896.</mixed-citation></ref>
      <ref><label>[22]</label><mixed-citation>J. Jurkiewicz, J. Nawrocki, M. Ochodek, T. Głowacki, HAZOP-based identification of events in use cases: An empirical study, Empir Software Eng 20 (2015) 82–109.</mixed-citation></ref>
      <ref><label>[23]</label><mixed-citation>E. Lee, Y. Park, J. G. Shin, Large engineering project risk management using a Bayesian belief network, Expert Systems with Applications 36 (2009) 5880–5887.</mixed-citation></ref>
      <ref><label>[24]</label><mixed-citation>J. Cheng, R. Greiner, J. Kelly, D. Bell, W. Liu, Learning Bayesian networks from data: An information-theory based approach, Artificial Intelligence 137 (2002) 43–90.</mixed-citation></ref>
      <ref><label>[25]</label><mixed-citation>N. Berthier, A. Alshareef, J. Sharp, S. Schewe, X. Huang, Abstraction and symbolic execution of deep neural networks with Bayesian approximation of hidden features (2021).</mixed-citation></ref>
      <ref><label>[26]</label><mixed-citation>S. Thomas, K. Groth, Toward a hybrid causal framework for autonomous vehicle safety analysis, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability (2021) 1748006X2110433.</mixed-citation></ref>
      <ref><label>[27]</label><mixed-citation>E. Denney, G. Pai, I. Habli, Towards measurement of confidence in safety cases, in: Int. Symp. on Empirical Software Engin. and Measurement, 2011, pp. 380–383.</mixed-citation></ref>
      <ref><label>[28]</label><mixed-citation>X. Zhao, D. Zhang, M. Lu, F. Zeng, A new approach to assessment of confidence in assurance cases, in: Computer Safety, Reliability, and Security (SafeComp’12), volume 7613 of LNCS, Springer, 2012, pp. 79–91.</mixed-citation></ref>
      <ref><label>[29]</label><mixed-citation>D. Koller, N. Friedman, Probabilistic Graphical Models: Principles and Techniques, Adaptive computation and machine learning, MIT Press, 2009.</mixed-citation></ref>
      <ref><label>[30]</label><mixed-citation>S. Rimkevičius, M. Vaišnoras, E. Babilas, E. Ušpuras, HAZOP application for the nuclear power plants decommissioning projects, Annals of Nuclear Energy (2016).</mixed-citation></ref>
      <ref><label>[31]</label><mixed-citation>W. Tian, T. Du, S. Mu, HAZOP analysis-based dynamic simulation and its application in chemical processes, Asia-Pacific Journal of Chemical Engineering 10 (2015) 923–935.</mixed-citation></ref>
      <ref><label>[32]</label><mixed-citation>P. K. Marhavilas, M. Filippidis, G. K. Koulinas, D. E. Koulouriotis, An expanded HAZOP-study with fuzzy-AHP (XPA-HAZOP technique): Application in a sour crude-oil processing plant, Safety Science 124 (2020) 104590.</mixed-citation></ref>
      <ref><label>[33]</label><mixed-citation>M. Danko, J. Janošovský, J. Labovský, L. Jelemenský, Integration of process control protection layer into a simulation-based HAZOP tool, Journal of Loss Prevention in the Process Industries 57 (2019) 291–303.</mixed-citation></ref>
      <ref><label>[34]</label><mixed-citation>E. Roche, W. Dupont, A. Summers, Beyond HAZOP: Analyzing common cause and system scenarios, Process Safety Progress 38 (2019) e11997.</mixed-citation></ref>
      <ref><label>[35]</label><mixed-citation>F. Crawley, B. Tyler, HAZOP: Guide to Best Practice, Elsevier Science, 2015.</mixed-citation></ref>
      <ref><label>[36]</label><mixed-citation>M. Chudleigh, J. Catmur, Safety assessment of computer systems using HAZOP and audit techniques, in: SafeComp’92, Elsevier, 1992, pp. 285–292.</mixed-citation></ref>
      <ref><label>[37]</label><mixed-citation>T. A. Kletz, HAZOP–past and future, Reliability Engineering &amp; System Safety 55 (1997) 263–266.</mixed-citation></ref>
      <ref><label>[38]</label><mixed-citation>B. Kramer, C. Neurohr, M. Büker, E. Böde, M. Fränzle, W. Damm, Identification and quantification of hazardous scenarios for automated driving, in: International Symposium on Model-Based Safety and Assessment, Springer, 2020, pp. 163–178.</mixed-citation></ref>
      <ref><label>[39]</label><mixed-citation>M. R. Othman, R. Idris, M. H. Hassim, W. H. W. Ibrahim, Prioritizing HAZOP analysis using analytic hierarchy process (AHP), Clean Technologies and Environmental Policy 18 (2016) 1345–1360.</mixed-citation></ref>
      <ref><label>[40]</label><mixed-citation>E. Németh, R. Lakner, K. Hangos, I. Cameron, Hierarchical CPN model-based diagnosis using HAZOP knowledge, Technical report of the Systems and Control Laboratory SCL-009/2003 (2003).</mixed-citation></ref>
      <ref><label>[41]</label><mixed-citation>M. Wallace, Modular architectural representation and analysis of fault propagation and transformation, Electronic Notes in Theoretical Computer Science 141 (2005) 53–71.</mixed-citation></ref>
      <ref><label>[42]</label><mixed-citation>N. Leveson, Engineering a Safer World: Systems Thinking Applied to Safety, Engineering systems, MIT Press, 2011.</mixed-citation></ref>
      <ref><label>[43]</label><mixed-citation>P. Yang, R. Karashima, K. Okano, S. Ogata, Automated inspection method for an STAMP/STPA - fallen barrier trap at railroad crossing -, Procedia Computer Science 159 (2019) 1165–1174.</mixed-citation></ref>
      <ref><label>[44]</label><mixed-citation>T. Kaneko, Y. Takahashi, T. Okubo, R. Sasaki, Threat analysis using STRIDE with STAMP/STPA, in: Proc. of the Int. Workshop on Evidence-based Security and Privacy in the Wild, 2018.</mixed-citation></ref>
      <ref><label>[45]</label><mixed-citation>A. Adriaensen, L. Pintelon, F. Costantino, G. D. Gravio, R. Patriarca, An STPA safety analysis case study of a collaborative robot application, IFAC-PapersOnLine 54 (2021) 534–539. 17th IFAC Symposium on Information Control Problems in Manufacturing INCOM 2021.</mixed-citation></ref>
      <ref><label>[46]</label><mixed-citation>S. Chen, S. Khastgir, I. Babaev, P. Jennings, Identifying accident causes of driver-vehicle interactions using system theoretic process analysis (STPA), in: 2020 IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC), 2020, pp. 3247–3253.</mixed-citation></ref>
      <ref><label>[47]</label><mixed-citation>M. Chaal, O. A. Valdez Banda, J. A. Glomsrud, S. Basnet, S. Hirdaris, P. Kujala, A framework to model the STPA hierarchical control structure of an autonomous ship, Safety Science 132 (2020) 104939.</mixed-citation></ref>
    </ref-list>
  </back>
</article>