Safety of Artificial Intelligence: A Collaborative Model∗

John McDermid1, Yan Jia1
1Department of Computer Science, University of York, York, UK
{john.mcdermid, yj914}@york.ac.uk

∗Copyright © 2020 the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Achieving and assuring the safety of systems that use artificial intelligence (AI), especially machine learning (ML), poses some specific challenges that require unique solutions. However, that does not mean that good safety and software engineering practices are no longer relevant. This paper shows how the issues associated with AI and ML can be tackled by integrating with established safety and software engineering practices. It sets out a three-layer model, going from top to bottom: system safety/functional safety; "AI/ML safety"; and safety-critical software engineering. This model gives both a basis for achieving and assuring safety and a structure for collaboration between safety engineers and AI/ML specialists. The model is illustrated with a healthcare use case which uses deep reinforcement learning for treating sepsis patients. It is argued that this model is general and that it should underpin future standards and guidelines for the safety of this class of systems, which employ ML, particularly because the model can facilitate collaboration between the different communities.

1 Introduction

There is a growing recognition of the challenges posed by the use of artificial intelligence (AI), and more specifically machine learning (ML), in safety-critical systems. These challenges have been recognised by the AI community and there have been several influential publications, e.g. on "concrete problems in AI safety" [Amodei et al., 2016], which have drawn the community's attention to the potential for harm arising from undesirable, and unanticipated, learnt behaviour of ML-based systems.

In parallel, the safety community has been considering the impact of AI and ML on system safety with an emphasis on autonomous systems (AS), e.g. including the "gaps" in engineering processes that arise from the use of AI and ML [Burton et al., 2020]. In practice, development of safety-critical systems is strongly influenced by standards, but these are slow to produce and often substantially lag technological developments. This situation is beginning to change, particularly in relation to autonomous vehicles (AVs): for example, the British Standards Institution (BSI) has an active programme developing Publicly Available Specifications (PAS) for AVs, and in the USA a standard, UL 4600 [ANSI & UL, 2020], has recently been published.

Thus, both communities are very active, but our experience, e.g. through the Assuring Autonomy International Programme (AAIP), suggests that the initiatives in the safety community are having limited impact on the AI community, and that the safety community is struggling to come to grips with the subtleties and complexities of ML. Part of the reason for this is the absence of common terminology and a framework that enables the two communities to collaborate. The intent here is to provide a model and some terminology which can facilitate greater collaboration.

Against this background, the paper is organised as follows. Section 2 reviews some of the rather "disconnected" activities in the AI and safety communities, and identifies foundations for our collaborative model. In section 3 the model is presented, showing how long-established safety-critical software development principles can support "AI safety" activities, which in turn support system/functional safety. This model is illustrated in section 4 using a case study from healthcare. Section 5 presents a discussion, particularly considering how the model might be developed and utilised to support collaboration between the AI and safety communities, and this leads into the conclusions in Section 6.
2 Safety of AI: Two Communities

This section aims to identify both differences in the views of the AI and safety communities and a basis on which to build the collaborative model. Both communities run regular "AI Safety" events. However, the attendance at such events is skewed, so they don't really act as a meeting of minds. Rather than viewing this as a sociological problem, it is more helpful to consider the differences in technical viewpoints between the communities to build the collaborative model.

The AI community viewpoint is considered first, then the safety perspective is introduced in two parts. First, we consider broad safety processes, then we consider "good practice" in safety-critical software engineering.

2.1 "AI Safety"

There is a growing interest in "AI safety" and many AI practitioners and researchers have sought to identify and address potential problems. We start with a seminal paper which sets out "concrete problems in AI safety" [Amodei et al., 2016] arising from the use of reinforcement learning (RL). A brief introduction to RL is given in section 4.1; here all we need to know is that with RL the system learns by getting "rewards" from interaction with the environment and seeks to maximise its reward to achieve the desired behaviour, represented as a "policy". The paper uses a cleaning robot as an example and identifies undesirable (in safety terms, hazardous) behaviours for the robot. The description of these undesirable behaviours given here is intended to be general in terms of AI technologies and their applications:

1. Avoiding negative side effects – ensuring that the behaviour meets safety constraints; from a safety perspective this equates to avoiding hazards;

2. Avoiding reward hacking – ensuring that the behaviour doesn't get a high reward by "gaming" and producing solutions that are valid in some literal sense but don't meet the designer's intent;

3. Scalable oversight – this concerns the ability of the human to interact with the system both to monitor it and to respond to requests for confirmation of decisions prior to actions being taken;

4. Safe exploration – when the system needs to try new things to learn a better solution, how can negative outcomes (i.e. new hazards) be avoided;

5. Robustness to distributional shift – how the system adapts to changes in the operational environment, e.g. for a medical diagnosis system which is developed using an Asian population but deployed in Europe.

These problems can arise from a number of underlying causes, e.g. inappropriate reward functions and mis-specified feature spaces. Work at DeepMind [Leike et al., 2018] identified similar problems to [Amodei et al., 2016] but also highlighted the "reward-result gap", where the agent fails to converge on an optimal policy. Their proposed research direction focuses on "reward modelling" and points out a number of approaches, including leveraging existing data, imposing side constraints and adversarial training, to achieve "agent alignment". The "desiderata" (desirable properties) for their work are that the results are: scalable, economic and pragmatic. It also rests on two assumptions (slightly rephrased):

• It is possible to learn user intentions to sufficiently high accuracy;

• For many learning tasks, evaluation of outcomes is easier than producing the correct behaviour.

Other work considers the issues of mis-specified feature spaces [Bobu et al., 2018]. When training robots based on physical human-robot interaction, the robot may not have a rich enough hypothesis space to capture everything the user cares about, so the robot may "misinterpret" user input and learn inappropriate actions. The proposed solution includes using relevance of inputs (to the robot's set of hypotheses) and only learning from relevant inputs, thus reducing the impact of mis-specified feature or objective spaces.

Further, there is considerable interest in the problem of "adversarial" resilience for RL, with many proposed solution approaches, e.g. [Behzadan and Hsu, 2019]. This is undoubtedly relevant in some application domains, and needs to be addressed in a complete approach to using AI in safety-critical contexts, but it is outside the scope of this paper.

More philosophically, the challenges of "AI Safety" arise because systems are developed using "narrow" or "specific" AI, so they do not have an understanding (semantic model) of real-world objects that would be achieved by Artificial General Intelligence (AGI). Whilst AGI is a distant prospect, some work on AGI safety is producing interesting results, e.g. causal influence diagrams that bear a resemblance to the models used in safety engineering and thus might be a route to bridging between these communities [Everitt et al., 2019].
2.2 System Safety/Functional Safety

System safety is a relatively long-established discipline, generally believed to have originated in US military projects in the 1940s. There are now well-established processes which include hazard identification and risk assessment to guide the design to produce acceptably safe systems. Although not universally accepted, in many industries, e.g. healthcare [NHS Digital, 2018], it is common for the results of the development and safety processes to result in the production of a safety case: a structured argument, supported by evidence, intended to justify that a system is acceptably safe for a specific application in a specific operating environment [Kelly and Weaver, 2004].

System safety has evolved and the term "functional safety" is often used for safety processes addressing computers and software. Many of the standards for functional safety provide requirements for product design and for the system, software and hardware development processes. They often use safety integrity levels (SILs) to rank the (potential) risk posed by the system and vary the requirements with SIL, for example requiring higher levels of redundancy and diversity in the system architecture at higher SILs (higher risk).

System and functional safety have continued to evolve, but generally lag behind technology developments [McDermid, 2017] – this lag is perhaps most apparent with the introduction of AI and ML into systems. Although there are some emerging standards such as UL 4600 [ANSI & UL, 2020], generally they set requirements "around" the AI/ML elements but are not clear about what form of evidence would be needed to show that these requirements are met.

Instead, for advice on the safe use of AI and ML, one has to turn to the research community rather than standards. Initiatives such as the AAIP have pioneered work on safety of autonomous systems employing AI and ML. Two particularly important contributions are an ML lifecycle [Ashmore et al., 2019] and an assurance process known as AMLAS [Picardi et al., 2020] that builds on the ML lifecycle. The ML lifecycle has three stages, and identifies desiderata for each stage:

1. Data management – including collecting, pre-processing and analysing data to be used for model learning;

2. Model learning – including the model selection and choice of hyper-parameters used to control the learning process;

3. Model verification – including the use of mathematical analysis and testing.

As an example of the desiderata, robustness in model learning considers the model's ability to perform well in circumstances where the inputs encountered at run time are different to those present in the training data, e.g. due to environmental uncertainty, such as flooded roads, which is analogous to "distributional shift" in [Amodei et al., 2016]. This shows the possibility of reconciling the viewpoints of the two communities to provide common terminology to underpin a collaborative model.
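As an illustration of how evidence for the robustness desideratum might be gathered in practice, the sketch below compares the distribution of a single run-time input feature against the training data. The feature, sample sizes, significance threshold and the use of a two-sample Kolmogorov–Smirnov test are illustrative assumptions of ours, not part of the ML lifecycle itself.

```python
# A minimal sketch of a distributional-shift check, assuming scipy is available.
# The feature, sample sizes and the 0.05 threshold are illustrative assumptions,
# not values taken from the paper.
import numpy as np
from scipy.stats import ks_2samp

def shifted(train_feature, runtime_feature, alpha=0.05):
    """Flag a possible distributional shift for a single input feature.

    Uses a two-sample Kolmogorov-Smirnov test: a small p-value suggests the
    run-time inputs are drawn from a different distribution to the training data.
    """
    statistic, p_value = ks_2samp(train_feature, runtime_feature)
    return p_value < alpha

# Hypothetical usage: a vital-sign feature seen in training vs. in deployment.
rng = np.random.default_rng(0)
train_hr = rng.normal(loc=80, scale=10, size=5000)     # training population
deployed_hr = rng.normal(loc=95, scale=12, size=500)   # shifted population
print(shifted(train_hr, deployed_hr))                  # expected: True
```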
2.3 Safety-Critical Software Engineering

There are many good practices in developing safety-critical software, some of which are reflected in standards such as IEC 61508 Part 3 [IEC, 2010]. These include the use of formal methods for specifying requirements, coverage criteria for testing, and traceability from specifications to programs to assist in verification. The required practices also vary with SIL, for example with more stringent requirements for test coverage at higher SILs. It is not possible to cover all of these practices here, but we use two of them to illustrate our points.

First, all programming languages have constructs which can lead programmers to make mistakes. This has led to the definition of so-called "safe subsets" of programming languages, e.g. of the C language [Hatton, 2004], which restrict the use of the language so as to avoid the most error-prone constructs. These subsets are well-defined and it is practicable to use tools to help police their use.

Second, static analysis assesses a program without executing it. This can identify "undesirable features" in programs, including indexing outside array bounds and dividing by zero, that could lead to undefined behaviour. Even if such "undesirable features" are not "unsafe", their presence may undermine the results of other analyses or test results, so static analysis is of broad utility in showing the integrity of programs. Language subsets and static analysis are not disjoint concepts and some of the most powerful tools combine the two, e.g. [McCormick and Chapin, 2015].

These practices address conventional programming languages, but ML models are programs too and the collaborative model explores their relevance for ML.
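To make the idea of restricting error-prone constructs concrete, the short Python sketch below shows two constructs that a coding guideline for ML code might prohibit and that common linters can flag. The function names and the choice of constructs are our illustrative assumptions, not a prescribed subset.

```python
# Illustrative (hypothetical) examples of error-prone Python constructs that a
# "safe subset" style coding standard might prohibit.

def append_reward(reward, history=[]):   # mutable default argument: the same
    history.append(reward)               # list is shared across *all* calls,
    return history                       # silently accumulating state

def mean_dose(doses):
    try:
        return sum(doses) / len(doses)   # divides by zero for an empty list
    except Exception:                    # over-broad except hides the real
        return 0.0                       # defect instead of reporting it

# Static analysis tools such as Pylint can flag both patterns without running
# the code (e.g. dangerous-default-value and broad-except warnings).
```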
3 The Collaborative Model: An "AI sandwich"

The collaborative model aims to provide a structure that links the different world views of the AI and safety communities. The AI elements are in the middle, with safety elements above and below – hence the "sandwich" analogy.

Figure 1: The Collaborative Model

3.1 The Model

The collaborative model, see Fig. 1, has three layers which we number top-down. The intent of the layers is:

1. System safety/functional safety – application of a classical safety process to understand hazards, produce derived safety requirements and to gather evidence from layers 2 and 3 into the safety/assurance case;

2. AI/ML safety – developing the ML systems to meet their performance objectives and satisfaction of the ML lifecycle desiderata as well as the derived safety requirements, and providing evidence to support the safety/assurance case;

3. Safety-critical software engineering – application of good software engineering practices to the development of the AI/ML software.

ML systems are generally developed in an agile manner, often with daily builds as new training data becomes available; this is reflected in the iterative loop between the top two layers. However, for simplicity, the description of the layers ignores the iteration.

3.2 Layer 1: System Safety/Functional Safety

The system safety/functional safety layer has three major elements:

1. Hazard analysis – use of hazard analysis techniques or domain knowledge to identify hazards and to estimate the associated risks;

2. Derived safety requirements (DSRs) – based on the identified hazards, establishing requirements for the ML and other elements of the system so that their contribution to hazards is controlled or their role in mitigating hazards is clearly defined;

3. Safety/assurance case – arguments for the safety of the system, supported where appropriate by evidence from the ML layer that the software meets the relevant desiderata and derived safety requirements.

Often requirements for AI/ML systems are articulated at a high level, e.g. to perform better than a human, and it can be difficult to map down to concrete requirements on the AI/ML components (the semantic gap [Burton et al., 2020]). This is an open issue for reconciling conventional systems engineering with AI/ML, but it is not always so difficult when considering DSRs, see the case study in section 4.
3.3 Layer 2: "AI/ML Safety"

The "AI/ML safety" layer is intended to encompass the ML software development, including verification, and has three major elements:

1. Model alignment – meeting the design intent which is informed by the hazard analysis, as the intent, inter alia, is to avoid hazards;

2. Data collection and ML model development – these are the first two stages of the ML lifecycle [Ashmore et al., 2019] and are informed by the derived safety requirements;

3. Satisfaction of the ML lifecycle desiderata – showing that the relevant desiderata have been met, including verifying satisfaction of the derived safety requirements and production of relevant evidence to support the safety/assurance case.

The term "model alignment" is used as a generalisation of "agent alignment" and is intended also to include the avoidance of problems such as distributional shift.

3.4 Layer 3: Safety-Critical Software Engineering

This layer incorporates good practices from safety-critical software engineering to ensure the integrity of the code; for brevity only two elements are illustrated:

1. Coding standards – the use of guidelines/rules to avoid the more error-prone constructs in programming languages, supporting ML model development;

2. Static analysis/verification – the use of static analysis tools to help identify undesirable features in programs so they can be eliminated, supporting satisfaction of ML desiderata.

There are many different static analysis tools; an example is presented in section 4.
4 The Case Study

The case study used to illustrate the collaborative model is from healthcare, particularly the use of RL to derive "optimal" policies for treatment of sepsis, which is a life-threatening condition and a major cause of fatalities in hospitals. The case study is from our previously published work [Jia et al., 2020], but extended here to address the three layers in the collaborative model. Key aspects of the case study are introduced against each layer in the model, with the greatest emphasis placed on the middle layer as this is where the approaches of the AI and safety communities come together.

Before presenting the case study, we first give a brief introduction to RL; this is not intended to be an exhaustive discussion of what is a very complex topic but enough to enable the case study to be understood. More details on the RL approach can be found in [Jia et al., 2020].

4.1 Reinforcement Learning

RL is a very powerful ML technique which is widely used in complex decision-making tasks to find an "optimal" policy. It consists of an agent interacting with its environment by performing actions and receiving feedback from the environment [Sutton and Barto, 2018]. Often the environment is represented using a Markov Decision Process (MDP). A policy defines the agent's behaviour and maps the perceived states of the environment to actions for the agent to take. There are many different RL algorithms, and the case study uses a widely used modern RL algorithm known as double deep Q-Networks (double DQN) [Mnih et al., 2015].
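To make the learning update concrete, the sketch below shows the double DQN target computation for a batch of transitions: the online network selects the greedy action at the next state, and the target network evaluates it. The discount factor and the NumPy-only formulation (standing in for the TensorFlow networks used in the case study) are illustrative assumptions.

```python
# A minimal sketch of the double DQN regression target for a batch of
# transitions, written with NumPy for clarity. The discount factor and the use
# of plain arrays in place of the TensorFlow networks are illustrative assumptions.
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, done, gamma=0.99):
    """Compute targets y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    rewards:       shape (batch,)            immediate rewards r
    next_q_online: shape (batch, n_actions)  Q(s', .) from the online network
    next_q_target: shape (batch, n_actions)  Q(s', .) from the target network
    done:          shape (batch,)            1.0 if s' is terminal, else 0.0
    """
    # Online network selects the greedy action; target network evaluates it.
    best_actions = np.argmax(next_q_online, axis=1)
    evaluated_q = next_q_target[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - done) * evaluated_q

# Hypothetical usage with a batch of 2 transitions and 25 discrete actions.
batch, n_actions = 2, 25
rng = np.random.default_rng(1)
y = double_dqn_targets(rewards=np.array([1.0, -0.5]),
                       next_q_online=rng.normal(size=(batch, n_actions)),
                       next_q_target=rng.normal(size=(batch, n_actions)),
                       done=np.array([0.0, 1.0]))
print(y.shape)  # (2,)
```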
4.2 Sepsis Case

Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection. It is estimated that one in five deaths worldwide is caused by sepsis [Gallagher, 2020], but the optimal treatment strategy for sepsis remains unclear [Marik, 2015]. Evidence suggests that current practices in the administration of intravenous fluids and vasopressors are suboptimal [Waechter et al., 2014]. Consequently, researchers have used RL to learn the optimal treatment strategy, e.g. [Raghu et al., 2017].

First, we re-implemented the work from [Raghu et al., 2017] using double DQN. The state space for the MDP included patients' demographics, Elixhauser premorbid status, vital signs, laboratory values, and the fluids and vasopressors received. The action space for the MDP is discretised into 25 possible actions, with five possible choices each for intravenous fluids and vasopressors, as shown in Table 1. Table 1 also shows the detailed dose ranges and dose medians for the five vasopressor choices; this is important as the case study focuses on the safety of vasopressor administration. Note that vasopressor dosage is shown in mcg/kg/min of norepinephrine equivalent. The maximum dosage change occurs when the recommendation changes from action 0 to action 4, or vice versa, in the following step to treat the same patient. This change is from 0 to 0.786 mcg/kg/min, as 0.786 mcg/kg/min is the median of the fourth quartile, and is considered to be a dangerous dose change in one step in clinical practice. A sudden major change in vasopressor dosage can result in acute hypotension, hypertension or cardiac arrhythmias [Fadale et al., 2014] [Hospira UK Ltd, 2018] [Allen, 2014] (hypotension can arise from rapidly decreasing doses, with hypertension or arrhythmias arising from rapidly increasing doses).

Table 1: Dosage Actions (from [Jia et al., 2020])

Next, we evaluated the original learnt policy and discovered that it contains far more of these sudden major changes when recommending the vasopressor dosage than are found in the clinicians' treatments based on the real data – MIMIC-III [Johnson et al., 2016] (here we refer to these treatments as the clinician policy for ease of comparison). These results are shown in Fig. 3a. Thus the initial learnt policy raises some safety concerns, and we used the collaborative model to guide us in developing a safer learnt policy.
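The sketch below illustrates the structure of the 5 x 5 discretised action space and the check for a "sudden major change" in vasopressor dose between consecutive recommendations. Only the extreme medians (0 and 0.786 mcg/kg/min) and the 0.75 mcg/kg/min threshold come from the text; the intermediate bin medians and the action-encoding convention are hypothetical placeholders, since Table 1 is not reproduced here.

```python
# A minimal sketch of the 5 x 5 discretised action space and the check used to
# flag a "sudden major change" in vasopressor dose. Only the extreme medians
# (0 and 0.786 mcg/kg/min) are taken from the paper; the intermediate medians
# and the index encoding are hypothetical placeholders (Table 1 is not shown here).

VASO_MEDIANS = [0.0, 0.04, 0.13, 0.27, 0.786]   # mcg/kg/min, bins 0..4 (illustrative)
DANGEROUS_DELTA = 0.75                          # mcg/kg/min, threshold from the paper

def decode_action(action_index):
    """Map one of the 25 discrete actions to (IV-fluid bin, vasopressor bin)."""
    assert 0 <= action_index < 25
    return action_index // 5, action_index % 5

def sudden_major_change(prev_action, next_action):
    """True if consecutive recommendations change the vasopressor dose too fast."""
    _, prev_vaso = decode_action(prev_action)
    _, next_vaso = decode_action(next_action)
    delta = abs(VASO_MEDIANS[next_vaso] - VASO_MEDIANS[prev_vaso])
    return delta > DANGEROUS_DELTA

# Going from vasopressor bin 0 to bin 4 is the worst case (0 -> 0.786 mcg/kg/min).
print(sudden_major_change(prev_action=0, next_action=4))   # True
print(sudden_major_change(prev_action=4, next_action=3))   # depends on the real medians
```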
Layer 1: Hazard and Derived Safety Requirements

This layer is largely the province of the safety and domain specialists, i.e. clinicians in this case.

In an ideal world, the hazard analysis would be based on a clinical pathway (a model of the treatment process, including key decisions). For brevity, we illustrate the approach in terms of a single hazard identified using domain knowledge. The hazard is defined as: "sudden change in vasopressor dosage".

It can be seen from Fig. 3a that the original learnt policy is "less safe" than the clinician policy; specifically, 35% of the patients have a sudden major change in the learnt policy as opposed to 2.6% in the clinician policy. This can be viewed as an example of the hazard and it is visible when evaluating the learnt policy, which is consistent with the assumption in [Leike et al., 2018] that "evaluation of outcomes is easier than producing the correct behaviour". It can also be seen as showing a failure to achieve "model alignment", and this triggers an iteration around the first two layers of the collaborative model, producing explicit DSRs.

In a normal safety analysis a systematic approach would be taken to identify causes of hazards, and there are many approaches to identifying and representing hazard causes, e.g. SHARD for software-intensive systems [Pumfrey, 1999]. However, what we are interested in here is understanding the potential causes of the hazard across the layers in the collaborative model, and we adapt the well-known "bow-tie" diagram for this purpose. This enables us to show causes and consequences of the hazard derived from an understanding of the potential limitations at each layer, see Fig. 2.

Figure 2: Bow-tie showing Hazard, Causes and Consequences

In Fig. 2 the consequences are the clinical outcomes described previously. The proximate cause of the hazard is "System recommends sudden change" (in excess of 0.75 mcg/kg/min), which has three potential causes relating to the three layers in the model in Fig. 1. The "Insufficient safety constraints derived" reflects inadequacy in hazard analysis; if a hazard is missed or misunderstood then the system developed might not be safe. The "Insufficiently thorough development" means failure to meet the safety constraints and the desiderata for ML, including avoiding relevant "AI Safety" problems such as those identified in [Amodei et al., 2016] and [Leike et al., 2018]. This can also be seen as a failure to achieve "model alignment". The "Implementation defects" refers to code-level problems, such as "divide by zero", that can have unpredictable effects.

There are four DSRs arising from the bow-tie diagram:

DSR0: reduce changes in vasopressor dosage of more than 0.75 mcg/kg/min between treatment steps for an individual patient closer to the clinician policy (maps to "System recommends sudden change");

DSR1: accurately identify hazards, hazard causes and safety constraints (maps to "Insufficient safety constraints derived");

DSR2: (a) meet the desiderata for the ML lifecycle [Ashmore et al., 2019] and (b) show satisfaction of safety constraints arising from DSR1 (maps to "Insufficiently thorough development");

DSR3: avoid implementation deficiencies that could give rise to unintended behaviour (maps to "Implementation defects").

DSR1, DSR2 and DSR3 are intended to support DSR0, and should guide the development of the ML in the middle layer in Fig. 1. Our focus here is on DSR1 and DSR2, in order to satisfy DSR0, as this shows most clearly the links between the concerns of the safety and AI communities.

Evidence that the DSRs have been met should form part of the safety case. Safety arguments are often produced graphically, e.g. using the Goal Structuring Notation (GSN) [Kelly and Weaver, 2004]. For brevity, we do not set out the safety argument here, but discuss the evidence needed to meet DSR0 in the following two subsections. The evidence for DSR1 comes from layer 1 in Fig. 1. In a full development it would rest on the rigour of the hazard analysis process and the suitability of domain knowledge; for the purpose of this paper we assume that an appropriate hazard has been identified, based on clinical knowledge, hence DSR1 is satisfied.
Layer 2: "AI/ML Safety"

Figure 3: Performance of Learnt Policies (from [Jia et al., 2020]): (a) Original learnt policy; (b) Our modified policy with safety constraint

We start with DSR1 and consider what can contribute to this hazard, from an "AI safety" perspective, which involves understanding some of the details of the RL process. First, the MDP only depends on the current state; that is, given the current state, the future state does not depend on the cumulative history of past states. If the current state in the MDP doesn't capture the dose delta, or relative dose change compared with the previous dose, there is no guarantee that the agent will learn an optimal policy avoiding the sudden major dose change. This is an example of mis-specified feature spaces [Bobu et al., 2018] and is corrected by extending the state space to enable the agent to take into account the difference between the current step and the previous step in terms of vasopressor dose while learning the policy.

Second, the process of learning an optimal policy is to minimise the cost function. If the cost function incorporates the safety constraint, then it can guide the agent to learn a safer policy. Thus, the cost function was modified to include an additional safety constraint "penalising" large changes in dose (specifically those over 0.75 mcg/kg/min between dose recommendations), which is one of the approaches suggested in [Leike et al., 2018]. Therefore, two changes were made in order to guide the agent to learn a safer policy: altering the state space to include the dose delta, and adding a regularization term in the cost function to "penalise" sudden changes; see [Jia et al., 2020] for more details, including the explicit cost functions used.

After the implementation of these two alterations we learnt a new, modified policy, and Fig. 3b shows that the modified policy has fewer sudden major changes compared to the original learnt policy shown in Fig. 3a (the rate of such sudden major changes of vasopressor dose has been reduced by 77.5%). In particular, we found that only 7.87% of patients have this sudden major change in the modified policy, which is much closer to the 2.6% in the clinician policy. Therefore, this is considered to be much safer and to support DSR1 and hence DSR0, improving the "model alignment".

The evidence to support DSR2 is multi-faceted. DSR2 (a) can be supported through a development log and other development artefacts, to show how the three stages of the ML lifecycle (indicated in section 2.2) have been implemented appropriately. In general, judgement is needed on the sufficiency of evidence. DSR2 (b) is supported by the performance data, see Fig. 3b, and the comparison with Fig. 3a, which show the results of encoding the safety constraint in the learnt policy.
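The sketch below illustrates the general shape of the two changes: a state augmented with the dose delta, and a learning signal penalised when the recommended vasopressor dose changes by more than 0.75 mcg/kg/min between steps. The penalty weight and the simple step-shaped penalty are illustrative assumptions; the explicit cost functions used in the case study are given in [Jia et al., 2020].

```python
# A minimal sketch of augmenting the learning signal with a safety penalty on
# sudden vasopressor dose changes. The penalty weight and step-shaped penalty
# are illustrative assumptions, not the exact cost function used in the case
# study (see [Jia et al., 2020] for that).

DANGEROUS_DELTA = 0.75   # mcg/kg/min, from the derived safety requirement DSR0

def shaped_reward(clinical_reward, prev_vaso_dose, next_vaso_dose,
                  penalty_weight=1.0):
    """Clinical reward minus a penalty for sudden major dose changes."""
    dose_delta = abs(next_vaso_dose - prev_vaso_dose)
    penalty = penalty_weight if dose_delta > DANGEROUS_DELTA else 0.0
    return clinical_reward - penalty

def augmented_state(base_state, prev_vaso_dose, current_vaso_dose):
    """Extend the MDP state with the dose delta so the agent can 'see' it."""
    return list(base_state) + [current_vaso_dose - prev_vaso_dose]

# Hypothetical usage for one transition.
print(shaped_reward(clinical_reward=1.0, prev_vaso_dose=0.0, next_vaso_dose=0.786))
# -> 0.0 (the +1 clinical reward is cancelled by the safety penalty)
```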
Layer 3: Safety-Critical Software Engineering

This layer is the province of (software) safety specialists. It "provides" coding standards to support the ML development and static analysis techniques to support demonstration of the integrity of the developed software. The evidence to support DSR3 comes from the use of the static analysis techniques in this case.

The software we developed in this case study is written in Python and uses the TensorFlow library, thus we have chosen Pylint (see: https://www.pylint.org), which supports both coding standards and error detection, to perform the static analysis. Fig. 4 shows a fragment of the report (log file) from running Pylint on our code (the code module named "deeprl"). The labels are as follows:

C: coding convention violation;
R: for "refactoring" to improve the score against some quality metric;
E: for programming errors, likely a "bug";
W: for warnings, e.g. minor programming errors or stylistic issues.

Figure 4: Extract from Pylint Log File

Fig. 4 shows the log file for a stage in the development of the "deeprl" module. The progress in improving the code module can be seen via the overall rating in Fig. 5. Pylint is quite "pedantic" (the log files are usually very long), so it is very hard to get a score of 10 – but it is important to remove the type E problems (and F, which are fatal and prevent the analysis from proceeding).

Figure 5: Example of Code Rating

The evidence to meet DSR3 is based on the log file, showing that the error count (E) is zero and, desirably, an assessment of the other entries to "sentence" them and to decide which need to be addressed, and which can be tolerated. For example, some of the refactoring comments (R) can indicate aspects of the program that are hard to test, and which may therefore weaken the value of the evidence in support of DSR2 if not corrected. Others, e.g. C0321, are merely stylistic and would not undermine the evidence from layer 2 against DSR2.
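As an indication of the kind of findings behind such a log, the hypothetical fragment below contains constructs that Pylint reports under the different message categories. The file and function names are ours, and the message codes in the comments follow Pylint's usual conventions; they are not taken from the case study's actual log.

```python
# deeprl_example.py -- a hypothetical fragment illustrating the categories of
# Pylint findings discussed above; the names are ours and the message codes in
# the comments follow Pylint's usual conventions.
# Run with:  pylint deeprl_example.py

import os                      # W0611 (unused-import): 'os' is never used   [W]

def update(q, lr, gamma, state, action, reward, next_state):
    # A long parameter list is typically reported as a refactoring issue     [R]
    x = 1; y = 2               # C0321 (multiple-statements): stylistic only [C]
    return q + lr * (reward + gamma * best_q - q)   # 'best_q' is undefined:
                                                    # an E-level finding     [E]
```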
5 Discussion

Our model provides a "big picture" which allows the viewpoints from the AI and safety communities to be reconciled. The illustration of our model in section 4 has shown that these viewpoints can be drawn together constructively to improve the safety of a system employing ML.

It is worth considering the relationship of "AI/ML safety" issues, e.g. [Amodei et al., 2016] [Leike et al., 2018] [Bobu et al., 2018], and the ML lifecycle [Ashmore et al., 2019] to support layer 2 in our collaborative model. As already mentioned, some of these problems and the desiderata in the ML lifecycle overlap, e.g. "distributional shift" and robustness. Further, issues such as the "reward-result gap" and mis-specified feature spaces can be seen to relate to system safety concerns through the case study. If a systematic relationship can be established between the problems identified in the "AI Safety" community and the desiderata in the ML lifecycle, then this would further cement the links between the viewpoints of the AI and safety communities.

The illustration at layer 3, Safety-Critical Software Engineering, was limited due to space constraints. A decision was made to focus on code-level issues for the paper, but other techniques are relevant. For example, several standards promote the use of formal methods. Whilst full formal specification of ML systems is likely to be difficult due to the semantic gap [Burton et al., 2020], it may be possible to use partial formal specifications for critical (safety) properties [Salay and Czarnecki, 2019], which also opens up the opportunities for more formal verification. Further, all safety-critical systems should be tested, and there is a need for systematic approaches to testing for systems employing AI or ML, e.g. considering how to guide testing based on estimates of residual risk [Wotawa, 2019]. In general, there is a need to consider good practice in Safety-Critical Software Engineering, to identify what techniques can be drawn across into AI/ML development. As ML software tends to be developed in a dynamic and iterative fashion, this suggests that it would be desirable to draw on work on agile approaches to safety-critical software development, such as [Hanssen et al., 2018], as well.

However, as noted above, most functional safety standards use SILs (or variants thereof) and alter the requirements for techniques to apply to each stage of the software development process based on SIL. The discussion of the ML lifecycle [Ashmore et al., 2019] identifies multiple ways of addressing some of the desiderata, but it is not obvious that these can be "ranked" in terms of contribution to risk reduction and hence arranged in SIL-order. Whilst it might be possible to preserve the SIL concept at layer 3 in the collaborative model, it is much less obvious how to do this at layer 2. Either this means the two communities have a long way to go before they can fully support the notion of SILs for ML-based systems, or there needs to be an acknowledgement that the SIL concept is not readily applicable in the context of ML-based systems.

Additionally, standards for safety-critical systems often set stringent safety targets, e.g. unsafe failure rates of one in 10 million operating hours. In contrast, ML developers are often content with false positive/false negative rates of the order of a few percent – in some applications, e.g. autonomous driving, this might equate to many "failures" per hour! These figures seem incompatible. However, the safety targets set in standards are at system level, not algorithm level, so they can be reconciled, at least in some cases, with suitable system architectures including redundancy and diversity. Nonetheless, more work is needed to show how to bridge this "gap" and to find ways of demonstrating that AI/ML-based systems meet the stringent unsafe failure-rate targets that are widely used for safety-critical systems.

The case study uses RL and it was relatively easy to change the feature space and to introduce a safety constraint in the cost function to satisfy the derived safety requirements in a way that is traceable. With other ML techniques, e.g. unsupervised learning, it may be less easy to embed the safety constraint in the learning process and it may be necessary instead to design the system with a separate "monitor" that polices the system behaviour against the safety constraint, as suggested in [McDermid et al., 2019], or to use other forms of diversity and redundancy as discussed above. Defining run-time monitors is not always straightforward, so the generality of our collaborative model across the wide and growing range of ML techniques remains an open issue.
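To illustrate what such a run-time monitor might look like for the dose-change constraint in the case study, the sketch below wraps a learnt policy's recommendation and clamps changes that exceed 0.75 mcg/kg/min. The clamping fallback and the function and parameter names are our illustrative assumptions; this is not the monitor architecture proposed in [McDermid et al., 2019].

```python
# A minimal sketch of a run-time safety monitor that polices a learnt policy
# against the dose-change constraint from the case study. The clamping fallback
# and the names are illustrative assumptions, not the approach of
# [McDermid et al., 2019].

DANGEROUS_DELTA = 0.75   # mcg/kg/min, maximum permitted change in one step

def monitored_dose(policy_dose, previous_dose):
    """Return a vasopressor dose that respects the safety constraint.

    If the policy's recommendation would change the dose by more than the
    permitted amount, the change is clamped to the permitted limit; in a real
    system the violation would also be logged as evidence for the safety case.
    """
    delta = policy_dose - previous_dose
    if abs(delta) <= DANGEROUS_DELTA:
        return policy_dose
    return previous_dose + DANGEROUS_DELTA * (1 if delta > 0 else -1)

# Hypothetical usage: the policy jumps from 0.0 to 0.786 mcg/kg/min in one step.
print(monitored_dose(policy_dose=0.786, previous_dose=0.0))   # -> 0.75
```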
6 Conclusions

The collaborative model has shown how to link the viewpoints of the AI and safety communities, with the DSRs and evidence flows providing the critical links in Fig. 1. Whilst the model is abstract, the case study has enabled us to show that the ideas can be made "concrete", although it has not been possible to provide full technical detail at all the layers. It is hoped that this model will help to facilitate broader engagement between the AI and safety communities by giving a structure in which they can recognise their own viewpoint and see how it relates to that of the other community.

Due to the speed of development of AI and ML and their safety-related applications, not least in autonomous vehicles, the standards are lagging behind the technology – and, arguably, the gap is growing. The collaborative model, or its future refinements building on more detailed models, e.g. the ML lifecycle [Ashmore et al., 2019] and assurance processes [Picardi et al., 2020], should provide a framework for producing future standards for safety-critical systems using ML. It remains to be seen whether or not SILs form part of such standards – their introduction and use has been pragmatic (as much to manage cost as safety) and, at minimum, the rationale for their introduction should be reconsidered as standards for ML-based systems are developed.

However, as noted above, there are open issues; perhaps one of the most important is whether or not the "AI Safety" problems can be mapped to the ML lifecycle model and thus addressed in a unified way with the ML desiderata in the middle layer of our collaborative model. These open issues can best be addressed through greater collaboration between the two communities and they will be a focus in our future work.

Acknowledgements

This work was funded by the Assuring Autonomy International Programme at the University of York and by Bradford Teaching Hospitals NHS Foundation Trust. The views expressed in this paper are those of the authors and not necessarily those of the NHS, or the Department of Health and Social Care.

References

[Allen, 2014] John M Allen. Understanding vasoactive medications: focus on pharmacology and effective titration. Journal of Infusion Nursing, 37(2):82–86, 2014.

[Amodei et al., 2016] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.

[ANSI & UL, 2020] ANSI & UL. Standard for Evaluation of Autonomous Products. https://www.shopulstandards.com/ProductDetail.aspx?productid=UL4600, 2020.

[Ashmore et al., 2019] Rob Ashmore, Radu Calinescu, and Colin Paterson. Assuring the machine learning lifecycle: Desiderata, methods, and challenges. arXiv preprint arXiv:1905.04223, 2019.

[Behzadan and Hsu, 2019] Vahid Behzadan and William Hsu. RL-based method for benchmarking the adversarial resilience and robustness of deep reinforcement learning policies. In International Conference on Computer Safety, Reliability, and Security, pages 314–325. Springer, 2019.

[Bobu et al., 2018] Andreea Bobu, Andrea Bajcsy, Jaime F Fisac, and Anca D Dragan. Learning under misspecified objective spaces. arXiv preprint arXiv:1810.05157, 2018.

[Burton et al., 2020] Simon Burton, Ibrahim Habli, Tom Lawton, John McDermid, Phillip Morgan, and Zoe Porter. Mind the gaps: Assuring the safety of autonomous systems from an engineering, ethical, and legal perspective. Artificial Intelligence, 279:103201, 2020.

[Everitt et al., 2019] Tom Everitt, Ramana Kumar, Victoria Krakovna, and Shane Legg. Modeling AGI safety frameworks with causal influence diagrams. arXiv preprint arXiv:1906.08663, 2019.

[Fadale et al., 2014] Kristin Lavigne Fadale, Denise Tucker, Jennifer Dungan, and Valerie Sabol. Improving nurses' vasopressor titration skills and self-efficacy via simulation-based learning. Clinical Simulation in Nursing, 10(6):e291–e299, 2014.

[Gallagher, 2020] James Gallagher. 'Alarming' one in five deaths due to sepsis. https://www.bbc.co.uk/news/health-51138859, 2020. Accessed: 2020-03-01.

[Hanssen et al., 2018] Geir Kjetil Hanssen, Tor Stålhane, and Thor Myklebust. SafeScrum® – Agile Development of Safety-Critical Software. Springer, 2018.

[Hatton, 2004] Les Hatton. Safer language subsets: an overview and a case history, MISRA C. Information and Software Technology, 46(7):465–472, 2004.

[Hospira UK Ltd, 2018] Hospira UK Ltd. Noradrenaline (Norepinephrine) 1 mg/ml Concentrate for Solution for Infusion. https://www.medicines.org.uk/emc/product/4115/smpc, 2018. Accessed: 2020-03-01.

[IEC, 2010] IEC. IEC 61508-3:2010, Functional safety of electrical/electronic/programmable electronic safety-related systems – Part 3: Software requirements. https://webstore.iec.ch/publication/5517, 2010.

[Jia et al., 2020] Yan Jia, John Burden, Tom Lawton, and Ibrahim Habli. Safe reinforcement learning for sepsis treatment. In 2020 IEEE International Conference on Healthcare Informatics (ICHI), pages 1–7. IEEE, 2020.

[Johnson et al., 2016] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035, 2016.

[Kelly and Weaver, 2004] Tim Kelly and Rob Weaver. The Goal Structuring Notation – a safety argument notation. In Proceedings of the Dependable Systems and Networks 2004 Workshop on Assurance Cases, page 6. Citeseer, 2004.

[Leike et al., 2018] Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871, 2018.

[Marik, 2015] PE Marik. The demise of early goal-directed therapy for severe sepsis and septic shock. Acta Anaesthesiologica Scandinavica, 59(5):561–567, 2015.

[McCormick and Chapin, 2015] John W McCormick and Peter C Chapin. Building High Integrity Applications with SPARK. Cambridge University Press, 2015.

[McDermid et al., 2019] John McDermid, Yan Jia, and Ibrahim Habli. Towards a framework for safety assurance of autonomous systems. In Artificial Intelligence Safety 2019, pages 1–7. CEUR Workshop Proceedings, 2019.

[McDermid, 2017] John McDermid. Playing catch-up: The fate of safety engineering. In Developments in System Safety Engineering, Proceedings of the Twenty-fifth Safety-Critical Systems Symposium, Bristol, UK, 2017.

[Mnih et al., 2015] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

[NHS Digital, 2018] NHS Digital. DCB0160: Clinical Risk Management: its Application in the Deployment and Use of Health IT Systems. 2018.

[Picardi et al., 2020] Chiara Picardi, Colin Paterson, Richard Hawkins, Radu Calinescu, and Ibrahim Habli. Assurance argument patterns and processes for machine learning in safety-related systems. In Proceedings of the Workshop on Artificial Intelligence Safety (SafeAI 2020), 2020.

[Pumfrey, 1999] David John Pumfrey. The principled design of computer system safety analyses. PhD thesis, University of York, 1999.

[Raghu et al., 2017] Aniruddh Raghu, Matthieu Komorowski, Imran Ahmed, Leo Celi, Peter Szolovits, and Marzyeh Ghassemi. Deep reinforcement learning for sepsis treatment. arXiv preprint arXiv:1711.09602, 2017.

[Salay and Czarnecki, 2019] Rick Salay and Krzysztof Czarnecki. Improving ML safety with partial specifications. In International Conference on Computer Safety, Reliability, and Security, pages 288–300. Springer, 2019.

[Sutton and Barto, 2018] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

[Waechter et al., 2014] Jason Waechter, Anand Kumar, Stephen E Lapinsky, John Marshall, Peter Dodek, Yaseen Arabi, Joseph E Parrillo, R Phillip Dellinger, and Allan Garland. Interaction between fluids and vasoactive agents on mortality in septic shock: a multicenter, observational study. Critical Care Medicine, 42(10):2158–2168, 2014.

[Wotawa, 2019] Franz Wotawa. On the importance of system testing for assuring safety of AI systems. In AISafety@IJCAI, 2019.