Safety of Artificial Intelligence: A Collaborative Model∗

John McDermid1, Yan Jia1
1Department of Computer Science, University of York, York, UK
{john.mcdermid, yj914}@york.ac.uk

∗Copyright © 2020 the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Achieving and assuring the safety of systems that use artificial intelligence (AI), especially machine learning (ML), poses some specific challenges that require unique solutions. However, that does not mean that good safety and software engineering practices are no longer relevant. This paper shows how the issues associated with AI and ML can be tackled by integrating with established safety and software engineering practices. It sets out a three-layer model, going from top to bottom: system safety/functional safety; "AI/ML safety"; and safety-critical software engineering. This model gives both a basis for achieving and assuring safety and a structure for collaboration between safety engineers and AI/ML specialists. The model is illustrated with a healthcare use case which uses deep reinforcement learning for treating sepsis patients. It is argued that this model is general and that it should underpin future standards and guidelines for the safety of this class of systems, which employ ML, particularly because the model can facilitate collaboration between the different communities.

1 Introduction

There is a growing recognition of the challenges posed by the use of artificial intelligence (AI), and more specifically machine learning (ML), in safety-critical systems. These challenges have been recognised by the AI community and there have been several influential publications, e.g. on "concrete problems in AI safety" [Amodei et al., 2016], which have drawn the community's attention to the potential for harm arising from undesirable, and unanticipated, learnt behaviour of ML-based systems.

In parallel, the safety community has been considering the impact of AI and ML on system safety with an emphasis on autonomous systems (AS), e.g. including the "gaps" in engineering processes that arise from the use of AI and ML [Burton et al., 2020]. In practice, development of safety-critical systems is strongly influenced by standards, but these are slow to produce and often substantially lag technological developments. This situation is beginning to change, particularly in relation to autonomous vehicles (AVs): for example, the British Standards Institution (BSI) has an active programme developing Publicly Available Specifications (PAS) for AVs, and in the USA a standard, UL 4600 [ANSI & UL, 2020], has recently been published.

Thus, both communities are very active, but our experience, e.g. through the Assuring Autonomy International Programme (AAIP), suggests that the initiatives in the safety community are having limited impact on the AI community, and that the safety community is struggling to come to grips with the subtleties and complexities of ML. Part of the reason for this is the absence of common terminology and a framework that enables the two communities to collaborate. The intent here is to provide a model and some terminology which can facilitate greater collaboration.

Against this background, the paper is organised as follows. Section 2 reviews some of the rather "disconnected" activities in the AI and safety communities, and identifies foundations for our collaborative model. In section 3 the model is presented, showing how long-established safety-critical software development principles can support "AI safety" activities, which in turn support system/functional safety. This model is illustrated in section 4 using a case study from healthcare. Section 5 presents a discussion, particularly considering how the model might be developed and utilised to support collaboration between the AI and safety communities, and this leads into the conclusions in Section 6.
2 Safety of AI: Two Communities

This section aims to identify both differences in the views of the AI and safety communities and a basis on which to build the collaborative model. Both communities run regular "AI Safety" events. However, the attendance at such events is skewed, so they don't really act as a meeting of minds. Rather than viewing this as a sociological problem, it is more helpful to consider the differences in technical viewpoints between the communities to build the collaborative model.

The AI community viewpoint is considered first, then the safety perspective is introduced in two parts. First, we consider broad safety processes, then we consider "good practice" in safety-critical software engineering.

2.1 "AI Safety"

There is a growing interest in "AI safety" and many AI practitioners and researchers have sought to identify and address potential problems. We start with a seminal paper which sets out "concrete problems in AI safety" [Amodei et al., 2016] arising from the use of reinforcement learning (RL). A brief introduction to RL is given in section 4.1; here all we need to know is that with RL the system learns by getting "rewards" from interaction with the environment and seeks to maximise its reward to achieve the desired behaviour, represented as a "policy". The paper uses a cleaning robot as an example and identifies undesirable (in safety terms, hazardous) behaviours for the robot. The description of these undesirable behaviours given here is intended to be general in terms of AI technologies and their applications:

1. Avoiding negative side effects – ensuring that the behaviour meets safety constraints; from a safety perspective this equates to avoiding hazards;

2. Avoiding reward hacking – ensuring that the behaviour doesn't get a high reward by "gaming" and producing solutions that are valid in some literal sense but don't meet the designer's intent;

3. Scalable oversight – this concerns the ability of the human to interact with the system both to monitor it and to respond to requests for confirmation of decisions prior to actions being taken;

4. Safe exploration – when the system needs to try new things to learn a better solution, how can negative outcomes (i.e. new hazards) be avoided;

5. Robustness to distributional shift – how the system adapts to changes in the operational environment, e.g. for a medical diagnosis system which is developed using an Asian population but deployed in Europe.

These problems can arise from a number of underlying causes, e.g. inappropriate reward functions and mis-specified feature spaces. Work at DeepMind [Leike et al., 2018] identified similar problems to [Amodei et al., 2016] but also highlighted the "reward-result gap", where the agent fails to converge on an optimal policy. Their proposed research direction focuses on "reward modelling" and points out a number of approaches, including leveraging existing data, imposing side constraints and adversarial training, to achieve "agent alignment". The "desiderata" (desirable properties) for their work are that the results are: scalable, economic and pragmatic. It also rests on two assumptions (slightly rephrased):

• It is possible to learn user intentions to sufficiently high accuracy;

• For many learning tasks, evaluation of outcomes is easier than producing the correct behaviour.

Other work considers the issues of mis-specified feature spaces [Bobu et al., 2018]. When training robots based on physical human-robot interaction, the robot may not have a rich enough hypothesis space to capture everything the user cares about, so the robot may "misinterpret" user input and learn inappropriate actions. The proposed solution includes using relevance of inputs (to the robot's set of hypotheses) and only learning from relevant inputs, thus reducing the impact of mis-specified feature or objective spaces.

Further, there is considerable interest in the problem of "adversarial" resilience for RL, with many proposed solution approaches, e.g. [Behzadan and Hsu, 2019]. This is undoubtedly relevant in some application domains, and needs to be addressed in a complete approach to using AI in safety-critical contexts, but it is outside the scope of this paper.

More philosophically, the challenges of "AI Safety" arise because systems are developed using "narrow" or "specific" AI, so they do not have an understanding (semantic model) of real-world objects that would be achieved by Artificial General Intelligence (AGI). Whilst AGI is a distant prospect, some work on AGI safety is producing interesting results, e.g. causal influence diagrams that bear a resemblance to the models used in safety engineering and thus might be a route to bridging between these communities [Everitt et al., 2019].
2.2 System Safety/Functional Safety

System safety is a relatively long-established discipline, generally believed to have originated in US military projects in the 1940s. There are now well-established processes which include hazard identification and risk assessment to guide the design to produce acceptably safe systems. Although not universally accepted, in many industries, e.g. healthcare [NHS Digital, 2018], it is common for the results of the development and safety processes to result in the production of a safety case: a structured argument, supported by evidence, intended to justify that a system is acceptably safe for a specific application in a specific operating environment [Kelly and Weaver, 2004].

System safety has evolved and the term "functional safety" is often used for safety processes addressing computers and software. Many of the standards for functional safety provide requirements for product design and for the system, software and hardware development processes. They often use safety integrity levels (SILs) to rank the (potential) risk posed by the system and vary the requirements with SIL, for example requiring higher levels of redundancy and diversity in the system architecture at higher SILs (higher risk).

System and functional safety have continued to evolve, but generally lag behind technology developments [McDermid, 2017] – this lag is perhaps most apparent with the introduction of AI and ML into systems. Although there are some emerging standards such as UL 4600 [ANSI & UL, 2020], generally they set requirements "around" the AI/ML elements but are not clear about what form of evidence would be needed to show that these requirements are met.

Instead, for advice on the safe use of AI and ML, one has to turn to the research community rather than standards. Initiatives such as the AAIP have pioneered work on safety of autonomous systems employing AI and ML. Two particularly important contributions are an ML lifecycle [Ashmore et al., 2019] and an assurance process known as AMLAS [Picardi et al., 2020] that builds on the ML lifecycle. The ML lifecycle has three stages, and identifies desiderata for each stage:

1. Data management – including collecting, pre-processing and analysing data to be used for model learning;

2. Model learning – including the model selection and choice of hyper-parameters used to control the learning process;

3. Model verification – including the use of mathematical analysis and testing.

As an example of the desiderata, robustness in model learning considers the model's ability to perform well in circumstances where the inputs encountered at run time are different to those present in the training data, e.g. due to environmental uncertainty, such as flooded roads, which is analogous to "distributional shift" in [Amodei et al., 2016]. This shows the possibility of reconciling the viewpoints of the two communities to provide common terminology to underpin a collaborative model.
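As an illustration of how evidence for the robustness desideratum might be gathered in practice, the sketch below compares the distribution of a single run-time input feature against the training data. The feature, sample sizes, significance threshold and the use of a two-sample Kolmogorov–Smirnov test are illustrative assumptions of ours, not part of the ML lifecycle itself.

```python
# A minimal sketch of a distributional-shift check, assuming scipy is available.
# The feature, sample sizes and the 0.05 threshold are illustrative assumptions,
# not values taken from the paper.
import numpy as np
from scipy.stats import ks_2samp

def shifted(train_feature, runtime_feature, alpha=0.05):
    """Flag a possible distributional shift for a single input feature.

    Uses a two-sample Kolmogorov-Smirnov test: a small p-value suggests the
    run-time inputs are drawn from a different distribution to the training data.
    """
    statistic, p_value = ks_2samp(train_feature, runtime_feature)
    return p_value < alpha

# Hypothetical usage: a vital-sign feature seen in training vs. in deployment.
rng = np.random.default_rng(0)
train_hr = rng.normal(loc=80, scale=10, size=5000)     # training population
deployed_hr = rng.normal(loc=95, scale=12, size=500)   # shifted population
print(shifted(train_hr, deployed_hr))                  # expected: True
```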
2.3 Safety-Critical Software Engineering

There are many good practices in developing safety-critical software, some of which are reflected in standards such as IEC 61508 Part 3 [IEC, 2010]. These include the use of formal methods for specifying requirements, coverage criteria for testing, and traceability from specifications to programs to assist in verification. The required practices also vary with SIL, for example with more stringent requirements for test coverage at higher SILs. It is not possible to cover all of these practices here, but we use two of them to illustrate our points.

First, all programming languages have constructs which can lead programmers to make mistakes. This has led to the definition of so-called "safe subsets" of programming languages, e.g. of the C language [Hatton, 2004], which restrict the use of the language so as to avoid the most error-prone constructs. These subsets are well-defined and it is practicable to use tools to help police their use.

Second, static analysis assesses a program without executing it. This can identify "undesirable features" in programs, including indexing outside array bounds and dividing by zero, that could lead to undefined behaviour. Even if such "undesirable features" are not "unsafe", their presence may undermine the results of other analyses or test results, so static analysis is of broad utility in showing the integrity of programs. Language subsets and static analysis are not disjoint concepts and some of the most powerful tools combine the two, e.g. [McCormick and Chapin, 2015].

These practices address conventional programming languages, but ML models are programs too and the collaborative model explores their relevance for ML.
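To make the idea of restricting error-prone constructs concrete, the short Python sketch below shows two constructs that a coding guideline for ML code might prohibit and that common linters can flag. The function names and the choice of constructs are our illustrative assumptions, not a prescribed subset.

```python
# Illustrative (hypothetical) examples of error-prone Python constructs that a
# "safe subset" style coding standard might prohibit.

def append_reward(reward, history=[]):   # mutable default argument: the same
    history.append(reward)               # list is shared across *all* calls,
    return history                       # silently accumulating state

def mean_dose(doses):
    try:
        return sum(doses) / len(doses)   # divides by zero for an empty list
    except Exception:                    # over-broad except hides the real
        return 0.0                       # defect instead of reporting it

# Static analysis tools such as Pylint can flag both patterns without running
# the code (e.g. dangerous-default-value and broad-except warnings).
```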
3 The Collaborative Model: An "AI sandwich"

The collaborative model aims to provide a structure that links the different world views of the AI and safety communities. The AI elements are in the middle, with safety elements above and below – hence the "sandwich" analogy.

Figure 1: The Collaborative Model

3.1 The Model

The collaborative model, see Fig. 1, has three layers which we number top-down. The intent of the layers is:

1. System safety/functional safety – application of a classical safety process to understand hazards, produce derived safety requirements and to gather evidence from layers 2 and 3 into the safety/assurance case;

2. AI/ML safety – developing the ML systems to meet their performance objectives and satisfaction of the ML lifecycle desiderata as well as the derived safety requirements, and providing evidence to support the safety/assurance case;

3. Safety-critical software engineering – application of good software engineering practices to the development of the AI/ML software.

ML systems are generally developed in an agile manner, often with daily builds as new training data becomes available; this is reflected in the iterative loop between the top two layers. However, for simplicity, the description of the layers ignores the iteration.

3.2 Layer 1: System Safety/Functional Safety

The system safety/functional safety layer has three major elements:

1. Hazard analysis – use of hazard analysis techniques or domain knowledge to identify hazards and to estimate the associated risks;

2. Derived safety requirements (DSRs) – based on the identified hazards, establishing requirements for the ML and other elements of the system so that their contribution to hazards is controlled or their role in mitigating hazards is clearly defined;

3. Safety/assurance case – arguments for the safety of the system, supported where appropriate by evidence from the ML layer that the software meets the relevant desiderata and derived safety requirements.

Often requirements for AI/ML systems are articulated at a high level, e.g. to perform better than a human, and it can be difficult to map down to concrete requirements on the AI/ML components (the semantic gap [Burton et al., 2020]). This is an open issue for reconciling conventional systems engineering with AI/ML, but it is not always so difficult when considering DSRs, see the case study in section 4.
3.3 Layer 2: "AI/ML Safety"

The "AI/ML safety" layer is intended to encompass the ML software development, including verification, and has three major elements:

1. Model alignment – meeting the design intent which is informed by the hazard analysis, as the intent, inter alia, is to avoid hazards;

2. Data collection and ML model development – these are the first two stages of the ML lifecycle [Ashmore et al., 2019] and are informed by the derived safety requirements;

3. Satisfaction of the ML lifecycle desiderata – showing that the relevant desiderata have been met, including verifying satisfaction of the derived safety requirements and production of relevant evidence to support the safety/assurance case.

The term "model alignment" is used as a generalisation of "agent alignment" and is intended also to include the avoidance of problems such as distributional shift.

3.4 Layer 3: Safety-Critical Software Engineering

This layer incorporates good practices from safety-critical software engineering to ensure the integrity of the code; for brevity only two elements are illustrated:

1. Coding standards – the use of guidelines/rules to avoid the more error-prone constructs in programming languages, supporting ML model development;

2. Static analysis/verification – the use of static analysis tools to help identify undesirable features in programs so they can be eliminated, supporting satisfaction of ML desiderata.

There are many different static analysis tools; an example is presented in section 4.
4 The Case Study

The case study used to illustrate the collaborative model is from healthcare, particularly the use of RL to derive "optimal" policies for treatment of sepsis, which is a life-threatening condition and a major cause of fatalities in hospitals. The case study is from our previously published work [Jia et al., 2020], but extended here to address the three layers in the collaborative model. Key aspects of the case study are introduced against each layer in the model, with the greatest emphasis placed on the middle layer as this is where the approaches of the AI and safety communities come together.

Before presenting the case study, we first give a brief introduction to RL; this is not intended to be an exhaustive discussion of what is a very complex topic but enough to enable the case study to be understood. More details on the RL approach can be found in [Jia et al., 2020].

4.1 Reinforcement Learning

RL is a very powerful ML technique which is widely used in complex decision-making tasks to find an "optimal" policy. It consists of an agent interacting with its environment by performing actions and receiving feedback from the environment [Sutton and Barto, 2018]. Often the environment is represented using a Markov Decision Process (MDP). A policy defines the agent's behaviour and maps the perceived states of the environment to actions for the agent to take. There are many different RL algorithms, and the case study uses a widely used modern RL algorithm known as double deep Q-Networks (double DQN) [Mnih et al., 2015].
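To make the learning update concrete, the sketch below shows the double DQN target computation for a batch of transitions: the online network selects the greedy action at the next state, and the target network evaluates it. The discount factor and the NumPy-only formulation (standing in for the TensorFlow networks used in the case study) are illustrative assumptions.

```python
# A minimal sketch of the double DQN regression target for a batch of
# transitions, written with NumPy for clarity. The discount factor and the use
# of plain arrays in place of the TensorFlow networks are illustrative assumptions.
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, done, gamma=0.99):
    """Compute targets y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    rewards:       shape (batch,)            immediate rewards r
    next_q_online: shape (batch, n_actions)  Q(s', .) from the online network
    next_q_target: shape (batch, n_actions)  Q(s', .) from the target network
    done:          shape (batch,)            1.0 if s' is terminal, else 0.0
    """
    # Online network selects the greedy action; target network evaluates it.
    best_actions = np.argmax(next_q_online, axis=1)
    evaluated_q = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - done) * evaluated_q

# Hypothetical usage with a batch of 2 transitions and 25 discrete actions.
batch, n_actions = 2, 25
rng = np.random.default_rng(1)
y = double_dqn_targets(rewards=np.array([1.0, -0.5]),
                       next_q_online=rng.normal(size=(batch, n_actions)),
                       next_q_target=rng.normal(size=(batch, n_actions)),
                       done=np.array([0.0, 1.0]))
print(y.shape)  # (2,)
```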
4.2 Sepsis Case

Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. It is estimated that one in five deaths worldwide is caused by sepsis [Gallagher, 2020], but the optimal treatment strategy for sepsis remains unclear [Marik, 2015]. Evidence suggests that current practices in the administration of intravenous fluids and vasopressors are suboptimal [Waechter et al., 2014]. Consequently, researchers have used RL to learn the optimal treatment strategy, e.g. [Raghu et al., 2017].

First, we re-implemented the work from [Raghu et al., 2017] using double DQN. The state space for the MDP included patients' demographics, Elixhauser premorbid status, vital signs, laboratory values, and the fluids and vasopressors received. The action space for the MDP is discretised into 25 possible actions, with five possible choices each for intravenous fluids and vasopressors, as shown in Table 1. Table 1 also shows the detailed dose ranges and dose medians for the five vasopressor choices; this is important as the case study focuses on the safety of vasopressor administration. Note that vasopressor dosage is shown in mcg/kg/min of norepinephrine equivalent. The maximum dosage change occurs when the recommendation changes from action 0 to action 4, or vice versa, in the following step to treat the same patient. This change is from 0 to 0.786 mcg/kg/min, as 0.786 mcg/kg/min is the median of the fourth quartile, and is considered to be a dangerous dose change in one step in clinical practice. A sudden major change in vasopressor dosage can result in acute hypotension, hypertension or cardiac arrhythmias [Fadale et al., 2014] [Hospira UK Ltd, 2018] [Allen, 2014] (hypotension can arise from rapidly decreasing doses, with hypertension or arrhythmias arising from rapidly increasing doses).

Table 1: Dosage Actions (from [Jia et al., 2020])

Next, we evaluated the original learnt policy and discovered that it contains far more of these sudden major changes when recommending the vasopressor dosage than are found in the clinicians' treatments based on the real data – MIMIC-III [Johnson et al., 2016] (here we refer to these treatments as the clinician policy for ease of comparison). These results are shown in Fig. 3a. Thus the initial learnt policy raises some safety concerns, and we used the collaborative model to guide us in developing a safer learnt policy.
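The sketch below illustrates the structure of the 5 x 5 discretised action space and the check for a "sudden major change" in vasopressor dose between consecutive recommendations. Only the extreme medians (0 and 0.786 mcg/kg/min) and the 0.75 mcg/kg/min threshold come from the text; the intermediate bin medians and the action-encoding convention are hypothetical placeholders, since Table 1 is not reproduced here.

```python
# A minimal sketch of the 5 x 5 discretised action space and the check used to
# flag a "sudden major change" in vasopressor dose. Only the extreme medians
# (0 and 0.786 mcg/kg/min) are taken from the paper; the intermediate medians
# and the index encoding are hypothetical placeholders (Table 1 is not shown here).

VASO_MEDIANS = [0.0, 0.04, 0.13, 0.27, 0.786]   # mcg/kg/min, bins 0..4 (illustrative)
DANGEROUS_DELTA = 0.75                          # mcg/kg/min, threshold from the paper

def decode_action(action_index):
    """Map one of the 25 discrete actions to (IV-fluid bin, vasopressor bin)."""
    assert 0 <= action_index < 25
    return action_index // 5, action_index % 5

def sudden_major_change(prev_action, next_action):
    """True if consecutive recommendations change the vasopressor dose too fast."""
    _, prev_vaso = decode_action(prev_action)
    _, next_vaso = decode_action(next_action)
    delta = abs(VASO_MEDIANS[next_vaso] - VASO_MEDIANS[prev_vaso])
    return delta > DANGEROUS_DELTA

# Going from vasopressor bin 0 to bin 4 is the worst case (0 -> 0.786 mcg/kg/min).
print(sudden_major_change(prev_action=0, next_action=4))   # True
print(sudden_major_change(prev_action=4, next_action=3))   # depends on the real medians
```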
Layer 1: Hazard and Derived Safety Requirements

This layer is largely the province of the safety and domain specialists, i.e. clinicians in this case.

In an ideal world, the hazard analysis would be based on a clinical pathway (a model of the treatment process, including key decisions). For brevity, we illustrate the approach in terms of a single hazard identified using domain knowledge. The hazard is defined as: "sudden change in vasopressor dosage".

It can be seen from Fig. 3a that the original learnt policy is "less safe" than the clinician policy; specifically, 35% of the patients have a sudden major change in the learnt policy as opposed to 2.6% in the clinician policy. This can be viewed as an example of the hazard and it is visible when evaluating the learnt policy, which is consistent with the assumption in [Leike et al., 2018] that "evaluation of outcomes is easier than producing the correct behaviour". It can also be seen as showing a failure to achieve "model alignment", and this triggers an iteration around the first two layers of the collaborative model, producing explicit DSRs.

In a normal safety analysis a systematic approach would be taken to identify causes of hazards, and there are many approaches to identifying and representing hazard causes, e.g. SHARD for software-intensive systems [Pumfrey, 1999]. However, what we are interested in here is understanding the potential causes of the hazard across the layers in the collaborative model, and we adapt the well-known "bow-tie" diagram for this purpose. This enables us to show causes and consequences of the hazard derived from an understanding of the potential limitations at each layer, see Fig. 2.

Figure 2: Bow-tie showing Hazard, Causes and Consequences

In Fig. 2 the consequences are the clinical outcomes described previously. The proximate cause of the hazard is "System recommends sudden change" (in excess of 0.75 mcg/kg/min), which has three potential causes relating to the three layers in the model in Fig. 1. The "Insufficient safety constraints derived" reflects inadequacy in hazard analysis; if a hazard is missed or misunderstood then the system developed might not be safe. The "Insufficiently thorough development" means failure to meet the safety constraints and the desiderata for ML, including avoiding relevant "AI Safety" problems such as those identified in [Amodei et al., 2016] and [Leike et al., 2018]. This can also be seen as a failure to achieve "model alignment". The "Implementation defects" refers to code-level problems, such as "divide by zero", that can have unpredictable effects.

There are four DSRs arising from the bow-tie diagram:

DSR0: reduce changes in vasopressor dosage of more than 0.75 mcg/kg/min between treatment steps for an individual patient closer to the clinician policy (maps to "System recommends sudden change");

DSR1: accurately identify hazards, hazard causes and safety constraints (maps to "Insufficient safety constraints derived");

DSR2: (a) meet the desiderata for the ML lifecycle [Ashmore et al., 2019] and (b) show satisfaction of safety constraints arising from DSR1 (maps to "Insufficiently thorough development");

DSR3: avoid implementation deficiencies that could give rise to unintended behaviour (maps to "Implementation defects").

DSR1, DSR2 and DSR3 are intended to support DSR0, and should guide the development of the ML in the middle layer in Fig. 1. Our focus here is on DSR1 and DSR2, in order to satisfy DSR0, as this shows most clearly the links between the concerns of the safety and AI communities.

Evidence that the DSRs have been met should form part of the safety case. Safety arguments are often produced graphically, e.g. using the Goal Structuring Notation (GSN) [Kelly and Weaver, 2004]. For brevity, we do not set out the safety argument here, but discuss the evidence needed to meet DSR0 in the following two subsections. The evidence for DSR1 comes from layer 1 in Fig. 1. In a full development it would rest on the rigour of the hazard analysis process and the suitability of domain knowledge; for the purpose of this paper we assume that an appropriate hazard has been identified, based on clinical knowledge, hence DSR1 is satisfied.
Layer 2: "AI/ML Safety"

Figure 3: Performance of Learnt Policies (from [Jia et al., 2020]): (a) Original learnt policy; (b) Our modified policy with safety constraint

We start with DSR1 and consider what can contribute to this hazard, from an "AI safety" perspective, which involves understanding some of the details of the RL process. First, the MDP only depends on the current state; that is, given the current state, the future state does not depend on the cumulative history of past states. If the current state in the MDP doesn't capture the dose delta, or relative dose change compared with the previous dose, there is no guarantee that the agent will learn an optimal policy avoiding the sudden major dose change. This is an example of mis-specified feature spaces [Bobu et al., 2018] and is corrected by extending the state space to enable the agent to take into account the difference between the current step and the previous step in terms of vasopressor dose while learning the policy.

Second, the process of learning an optimal policy is to minimise the cost function. If the cost function incorporates the safety constraint, then it can guide the agent to learn a safer policy. Thus, the cost function was modified to include an additional safety constraint "penalising" large changes in dose (specifically those over 0.75 mcg/kg/min between dose recommendations), which is one of the approaches suggested in [Leike et al., 2018]. Therefore, two changes were made in order to guide the agent to learn a safer policy: altering the state space to include the dose delta, and adding a regularization term in the cost function to "penalise" sudden changes; see [Jia et al., 2020] for more details, including the explicit cost functions used.

After the implementation of these two alterations we learnt a new, modified policy, and Fig. 3b shows that the modified policy has fewer sudden major changes compared to the original learnt policy shown in Fig. 3a (the rate of such sudden major changes of vasopressor dose has been reduced by 77.5%). In particular, we found that only 7.87% of patients have this sudden major change in the modified policy, which is much closer to the 2.6% in the clinician policy. Therefore, this is considered to be much safer and to support DSR1 and hence DSR0, improving the "model alignment".

The evidence to support DSR2 is multi-faceted. DSR2 (a) can be supported through a development log and other development artefacts, to show how the three stages of the ML lifecycle (indicated in section 2.2) have been implemented appropriately. In general, judgement is needed on the sufficiency of evidence. DSR2 (b) is supported by the performance data, see Fig. 3b, and the comparison with Fig. 3a, which show the results of encoding the safety constraint in the learnt policy.
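The sketch below illustrates the general shape of the two changes: a state augmented with the dose delta, and a learning signal penalised when the recommended vasopressor dose changes by more than 0.75 mcg/kg/min between steps. The penalty weight and the simple step-shaped penalty are illustrative assumptions; the explicit cost functions used in the case study are given in [Jia et al., 2020].

```python
# A minimal sketch of augmenting the learning signal with a safety penalty on
# sudden vasopressor dose changes. The penalty weight and step-shaped penalty
# are illustrative assumptions, not the exact cost function used in the case
# study (see [Jia et al., 2020] for that).

DANGEROUS_DELTA = 0.75   # mcg/kg/min, from the derived safety requirement DSR0

def shaped_reward(clinical_reward, prev_vaso_dose, next_vaso_dose,
                  penalty_weight=1.0):
    """Clinical reward minus a penalty for sudden major dose changes."""
    dose_delta = abs(next_vaso_dose - prev_vaso_dose)
    penalty = penalty_weight if dose_delta > DANGEROUS_DELTA else 0.0
    return clinical_reward - penalty

def augmented_state(base_state, prev_vaso_dose, current_vaso_dose):
    """Extend the MDP state with the dose delta so the agent can 'see' it."""
    return list(base_state) + [current_vaso_dose - prev_vaso_dose]

# Hypothetical usage for one transition.
print(shaped_reward(clinical_reward=1.0, prev_vaso_dose=0.0, next_vaso_dose=0.786))
# -> 0.0 (the +1 clinical reward is cancelled by the safety penalty)
```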
Layer 3: Safety-Critical Software Engineering

This layer is the province of (software) safety specialists. It "provides" coding standards to support the ML development and static analysis techniques to support demonstration of the integrity of the developed software. The evidence to support DSR3 comes from the use of the static analysis techniques in this case.

The software we developed in this case study is written in Python and uses the TensorFlow library, thus we have chosen Pylint (see: https://www.pylint.org), which supports both coding standards and error detection, to perform the static analysis. Fig. 4 shows a fragment of the report (log file) from running Pylint on our code (the code module named "deeprl"). The labels are as follows:

C: coding convention violation;
R: for "refactoring" to improve the score against some quality metric;
E: for programming errors, likely a "bug";
W: for warnings, e.g. minor programming errors or stylistic issues.

Figure 4: Extract from Pylint Log File

Fig. 4 shows the log file for a stage in the development of the "deeprl" module. The progress in improving the code module can be seen via the overall rating in Fig. 5. Pylint is quite "pedantic" (the log files are usually very long), so it is very hard to get a score of 10 – but it is important to remove the type E problems (and F, which are fatal and prevent the analysis from proceeding).

Figure 5: Example of Code Rating

The evidence to meet DSR3 is based on the log file, showing that the error count (E) is zero and, desirably, an assessment of the other entries to "sentence" them and to decide which need to be addressed, and which can be tolerated. For example, some of the refactoring comments (R) can indicate aspects of the program that are hard to test, and which may therefore weaken the value of the evidence in support of DSR2 if not corrected. Others, e.g. C0321, are merely stylistic and would not undermine the evidence from layer 2 against DSR2.
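As an indication of the kind of findings behind such a log, the hypothetical fragment below contains constructs that Pylint reports under the different message categories. The file and function names are ours, and the message codes in the comments follow Pylint's usual conventions; they are not taken from the case study's actual log.

```python
# deeprl_example.py -- a hypothetical fragment illustrating the categories of
# Pylint findings discussed above; the names are ours and the message codes in
# the comments follow Pylint's usual conventions.
# Run with:  pylint deeprl_example.py

import os                      # W0611 (unused-import): 'os' is never used   [W]

def update(q, lr, gamma, state, action, reward, next_state):
    # A long parameter list is typically reported as a refactoring issue     [R]
    x = 1; y = 2               # C0321 (multiple-statements): stylistic only [C]
    return q + lr * (reward + gamma * best_q - q)   # 'best_q' is undefined:
                                                    # an E-level finding     [E]
```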
5 Discussion

Our model provides a "big picture" which allows the viewpoints from the AI and safety communities to be reconciled. The illustration of our model in section 4 has shown that these viewpoints can be drawn together constructively to improve the safety of a system employing ML.

It is worth considering the relationship of "AI/ML safety" issues, e.g. [Amodei et al., 2016] [Leike et al., 2018] [Bobu et al., 2018], and the ML lifecycle [Ashmore et al., 2019] to support layer 2 in our collaborative model. As already mentioned, some of these problems and the desiderata in the ML lifecycle overlap, e.g. "distributional shift" and robustness. Further, issues such as the "reward-result gap" and mis-specified feature spaces can be seen to relate to system safety concerns through the case study. If a systematic relationship can be established between the problems identified in the "AI Safety" community and the desiderata in the ML lifecycle, then this would further cement the links between the viewpoints of the AI and safety communities.

The illustration at layer 3, Safety-Critical Software Engineering, was limited due to space constraints. A decision was made to focus on code-level issues for the paper, but other techniques are relevant. For example, several standards promote the use of formal methods. Whilst full formal specification of ML systems is likely to be difficult due to the semantic gap [Burton et al., 2020], it may be possible to use partial formal specifications for critical (safety) properties [Salay and Czarnecki, 2019], which also opens up the opportunities for more formal verification. Further, all safety-critical systems should be tested, and there is a need for systematic approaches to testing for systems employing AI or ML, e.g. considering how to guide testing based on estimates of residual risk [Wotawa, 2019]. In general, there is a need to consider good practice in Safety-Critical Software Engineering, to identify what techniques can be drawn across into AI/ML development. As ML software tends to be developed in a dynamic and iterative fashion, this suggests that it would be desirable to draw on work on agile approaches to safety-critical software development, such as [Hanssen et al., 2018], as well.

However, as noted above, most functional safety standards use SILs (or variants thereof) and alter the requirements for techniques to apply to each stage of the software development process based on SIL. The discussion of the ML lifecycle [Ashmore et al., 2019] identifies multiple ways of addressing some of the desiderata, but it is not obvious that these can be "ranked" in terms of contribution to risk reduction and hence arranged in SIL-order. Whilst it might be possible to preserve the SIL concept at layer 3 in the collaborative model, it is much less obvious how to do this at layer 2. Either this means the two communities have a long way to go before they can fully support the notion of SILs for ML-based systems, or there needs to be an acknowledgement that the SIL concept is not readily applicable in the context of ML-based systems.

Additionally, standards for safety-critical systems often set stringent safety targets, e.g. unsafe failure rates of one in 10 million operating hours. In contrast, ML developers are often content with false positive/false negative rates of the order of a few percent – in some applications, e.g. autonomous driving, this might equate to many "failures" per hour! These figures seem incompatible. However, the safety targets set in standards are at system level, not algorithm level, so they can be reconciled, at least in some cases, with suitable system architectures including redundancy and diversity. Nonetheless, more work is needed to show how to bridge this "gap" and to find ways of demonstrating that AI/ML-based systems meet the stringent unsafe failure-rate targets that are widely used for safety-critical systems.

The case study uses RL and it was relatively easy to change the feature space and to introduce a safety constraint in the cost function to satisfy the derived safety requirements in a way that is traceable. With other ML techniques, e.g. unsupervised learning, it may be less easy to embed the safety constraint in the learning process and it may be necessary instead to design the system with a separate "monitor" that polices the system behaviour against the safety constraint, as suggested in [McDermid et al., 2019], or to use other forms of diversity and redundancy as discussed above. Defining run-time monitors is not always straightforward, so the generality of our collaborative model across the wide and growing range of ML techniques remains an open issue.
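To illustrate what such a run-time monitor might look like for the dose-change constraint in the case study, the sketch below wraps a learnt policy's recommendation and clamps changes that exceed 0.75 mcg/kg/min. The clamping fallback and the function and parameter names are our illustrative assumptions; this is not the monitor architecture proposed in [McDermid et al., 2019].

```python
# A minimal sketch of a run-time safety monitor that polices a learnt policy
# against the dose-change constraint from the case study. The clamping fallback
# and the names are illustrative assumptions, not the approach of
# [McDermid et al., 2019].

DANGEROUS_DELTA = 0.75   # mcg/kg/min, maximum permitted change in one step

def monitored_dose(policy_dose, previous_dose):
    """Return a vasopressor dose that respects the safety constraint.

    If the policy's recommendation would change the dose by more than the
    permitted amount, the change is clamped to the permitted limit; in a real
    system the violation would also be logged as evidence for the safety case.
    """
    delta = policy_dose - previous_dose
    if abs(delta) <= DANGEROUS_DELTA:
        return policy_dose
    return previous_dose + DANGEROUS_DELTA * (1 if delta > 0 else -1)

# Hypothetical usage: the policy jumps from 0.0 to 0.786 mcg/kg/min in one step.
print(monitored_dose(policy_dose=0.786, previous_dose=0.0))   # -> 0.75
```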
6 Conclusions

The collaborative model has shown how to link the viewpoints of the AI and safety communities, with the DSRs and evidence flows providing the critical links in Fig. 1. Whilst the model is abstract, the case study has enabled us to show that the ideas can be made "concrete", although it has not been possible to provide full technical detail at all the layers. It is hoped that this model will help to facilitate broader engagement between the AI and safety communities by giving a structure in which they can recognise their own viewpoint and see how it relates to that of the other community.

Due to the speed of development of AI and ML and their safety-related applications, not least in autonomous vehicles, the standards are lagging behind the technology – and, arguably, the gap is growing. The collaborative model, or its future refinements building on more detailed models, e.g. the ML lifecycle [Ashmore et al., 2019] and assurance processes [Picardi et al., 2020], should provide a framework for producing future standards for safety-critical systems using ML. It remains to be seen whether or not SILs form part of such standards – their introduction and use has been pragmatic (as much to manage cost as safety) and, at minimum, the rationale for their introduction should be reconsidered as standards for ML-based systems are developed.

However, as noted above, there are open issues; perhaps one of the most important is whether or not the "AI Safety" problems can be mapped to the ML lifecycle model and thus addressed in a unified way with the ML desiderata in the middle layer of our collaborative model. These open issues can best be addressed through greater collaboration between the two communities and they will be a focus in our future work.

Acknowledgements

This work was funded by the Assuring Autonomy International Programme at the University of York and by Bradford Teaching Hospitals NHS Foundation Trust. The views expressed in this paper are those of the authors and not necessarily those of the NHS, or the Department of Health and Social Care.

References

[Allen, 2014] John M Allen. Understanding vasoactive medications: focus on pharmacology and effective titration. Journal of Infusion Nursing, 37(2):82–86, 2014.

[Amodei et al., 2016] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.

[ANSI & UL, 2020] ANSI & UL. Standard for Evaluation of Autonomous Products. https://www.shopulstandards.com/ProductDetail.aspx?productid=UL4600, 2020.

[Ashmore et al., 2019] Rob Ashmore, Radu Calinescu, and Colin Paterson. Assuring the machine learning lifecycle: Desiderata, methods, and challenges. arXiv preprint arXiv:1905.04223, 2019.

[Behzadan and Hsu, 2019] Vahid Behzadan and William Hsu. RL-based method for benchmarking the adversarial resilience and robustness of deep reinforcement learning policies. In International Conference on Computer Safety, Reliability, and Security, pages 314–325. Springer, 2019.

[Bobu et al., 2018] Andreea Bobu, Andrea Bajcsy, Jaime F Fisac, and Anca D Dragan. Learning under misspecified objective spaces. arXiv preprint arXiv:1810.05157, 2018.

[Burton et al., 2020] Simon Burton, Ibrahim Habli, Tom Lawton, John McDermid, Phillip Morgan, and Zoe Porter. Mind the gaps: Assuring the safety of autonomous systems from an engineering, ethical, and legal perspective. Artificial Intelligence, 279:103201, 2020.

[Everitt et al., 2019] Tom Everitt, Ramana Kumar, Victoria Krakovna, and Shane Legg. Modeling AGI safety frameworks with causal influence diagrams. arXiv preprint arXiv:1906.08663, 2019.

[Fadale et al., 2014] Kristin Lavigne Fadale, Denise Tucker, Jennifer Dungan, and Valerie Sabol. Improving nurses' vasopressor titration skills and self-efficacy via simulation-based learning. Clinical Simulation in Nursing, 10(6):e291–e299, 2014.

[Gallagher, 2020] James Gallagher. 'Alarming' one in five deaths due to sepsis. https://www.bbc.co.uk/news/health-51138859, 2020. Accessed: 2020-03-01.

[Hanssen et al., 2018] Geir Kjetil Hanssen, Tor Stålhane, and Thor Myklebust. SafeScrum® – Agile Development of Safety-Critical Software. Springer, 2018.

[Hatton, 2004] Les Hatton. Safer language subsets: an overview and a case history, MISRA C. Information and Software Technology, 46(7):465–472, 2004.

[Hospira UK Ltd, 2018] Hospira UK Ltd. Noradrenaline (Norepinephrine) 1 mg/ml Concentrate for Solution for Infusion. https://www.medicines.org.uk/emc/product/4115/smpc, 2018. Accessed: 2020-03-01.

[IEC, 2010] IEC. IEC 61508-3:2010, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 3: Software requirements. https://webstore.iec.ch/publication/5517, 2010.

[Jia et al., 2020] Yan Jia, John Burden, Tom Lawton, and Ibrahim Habli. Safe reinforcement learning for sepsis treatment. In 2020 IEEE International Conference on Healthcare Informatics (ICHI), pages 1–7. IEEE, 2020.

[Johnson et al., 2016] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035, 2016.

[Kelly and Weaver, 2004] Tim Kelly and Rob Weaver. The Goal Structuring Notation – a safety argument notation. In Proceedings of the Dependable Systems and Networks 2004 Workshop on Assurance Cases, page 6. Citeseer, 2004.

[Leike et al., 2018] Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871, 2018.

[Marik, 2015] PE Marik. The demise of early goal-directed therapy for severe sepsis and septic shock. Acta Anaesthesiologica Scandinavica, 59(5):561–567, 2015.

[McCormick and Chapin, 2015] John W McCormick and Peter C Chapin. Building High Integrity Applications with SPARK. Cambridge University Press, 2015.

[McDermid et al., 2019] John McDermid, Yan Jia, and Ibrahim Habli. Towards a framework for safety assurance of autonomous systems. In Artificial Intelligence Safety 2019, pages 1–7. CEUR Workshop Proceedings, 2019.

[McDermid, 2017] John McDermid. Playing catch-up: The fate of safety engineering. In Developments in System Safety Engineering, Proceedings of the Twenty-fifth Safety-Critical Systems Symposium, Bristol, UK, 2017.

[Mnih et al., 2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

[NHS Digital, 2018] NHS Digital. DCB0160: Clinical Risk Management: its Application in the Deployment and Use of Health IT Systems. 2018.

[Picardi et al., 2020] Chiara Picardi, Colin Paterson, Richard Hawkins, Radu Calinescu, and Ibrahim Habli. Assurance argument patterns and processes for machine learning in safety-related systems. In Proceedings of the Workshop on Artificial Intelligence Safety (SafeAI 2020), 2020.

[Pumfrey, 1999] David John Pumfrey. The principled design of computer system safety analyses. PhD thesis, University of York, 1999.

[Raghu et al., 2017] Aniruddh Raghu, Matthieu Komorowski, Imran Ahmed, Leo Celi, Peter Szolovits, and Marzyeh Ghassemi. Deep reinforcement learning for sepsis treatment. arXiv preprint arXiv:1711.09602, 2017.

[Salay and Czarnecki, 2019] Rick Salay and Krzysztof Czarnecki. Improving ML safety with partial specifications. In International Conference on Computer Safety, Reliability, and Security, pages 288–300. Springer, 2019.

[Sutton and Barto, 2018] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

[Waechter et al., 2014] Jason Waechter, Anand Kumar, Stephen E Lapinsky, John Marshall, Peter Dodek, Yaseen Arabi, Joseph E Parrillo, R Phillip Dellinger, and Allan Garland. Interaction between fluids and vasoactive agents on mortality in septic shock: a multicenter, observational study. Critical Care Medicine, 42(10):2158–2168, 2014.

[Wotawa, 2019] Franz Wotawa. On the importance of system testing for assuring safety of AI systems. In AISafety@IJCAI, 2019.