<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Complementary Risk Acceptance Criteria to Structure Assurance Cases for Safety-Critical AI Components</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Kläs</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rasmus Adler</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Jöckel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janek Groß</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Reich</string-name>
        </contrib>
        <aff>Fraunhofer IESE, Kaiserslautern, Germany; {michael.klaes, rasmus.adler, lisa.joeckel, janek.gross, jan.reich}@iese.fraunhofer.de</aff>
      </contrib-group>
      <abstract>
        <p>Artificial Intelligence (AI), particularly current Machine Learning approaches, promises new and innovative solutions, including for realizing safety-critical functions. Assurance cases can support the potential certification of such AI applications by providing an assessable, structured argument explaining why safety is achieved. Existing proposals and patterns for structuring the safety argument help to structure safety measures, but guidance for explaining in a concrete use case why the safety measures are actually sufficient is limited. In this paper, we investigate this and other challenges and propose solutions. In particular, we propose considering two complementary types of risk acceptance criteria as assurance objectives and provide, for each objective, a structure for the supporting argument. We illustrate our proposal on an excerpt of an automated guided vehicle use case and close with questions triggering further discussions on how to best use assurance cases in the context of AI certification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>AI, which in this paper we understand as complex data-driven
models provided by Machine Learning (ML), promises
improved or additional functionalities that are essential for
autonomous systems, e.g., perception for self-driving vehicles.</p>
      <p>In many cases, such functionalities are safety-critical, so it is
highly likely that AI becomes safety-critical as well, meaning
that its failure can contribute to accidents. There are already
various reports on fatal accidents due to AI-related failures in
autonomous vehicles [Pietsch, 2021; Wakabayashi, 2018].</p>
      <p>In consequence, regulation [European Commission, 2021]
and certification for AI in safety-critical components are being
proposed. Regulation and certification are powerful means to
prevent the market introduction of unsafe products. This
contributes not only to safety but also to the economy as a few
unsafe products could affect user acceptance of all similar
products. The predictability of legal decisions can thus
contribute to economic success as long as liability risk and costs
for complying with regulations and standards are not
unreasonably high and hinder meaningful innovations.</p>
      <p>Unfortunately, existing safety standards are difficult to
apply in the context of AI [Salay and Czarnecki, 2018] and
revisions are still ongoing [ISO/IEC, 2021]. Therefore, we
currently do not have any standards that we can easily apply for
certifying AI.</p>
      <p>Arguing safety claims with assurance cases (ACs), an
established approach in safety engineering, may provide an
alternative basis for audits and certification in the context of
AI [BSI, 2021]. ACs could structure the arguments for those
parts of a solution that are individual and highly innovative.</p>
      <p>Moreover, they could establish the basis for upcoming
evidence-based standards for AI certification.</p>
      <p>Initial proposals on how to apply the concept of ACs to AI
can be found in the literature. A prominent strategy is to argue
the safety objectives and safety requirements [Gauerhof et al.,
2020]. As the proposed strategy and patterns abstract from
specific safety objectives and derived safety requirements,
such approaches also largely abstract from AI-specific safety
concerns and required safety measures. Guidance for
achieving and arguing safety is thus inherently limited.</p>
      <p>One approach for overcoming this limitation is to argue
using known AI-related safety concerns and how they are
addressed by AI-specific safety measures [Schwalbe et al.,
2020]. A disadvantage is that it is hard to argue completeness
for the identified and addressed safety concerns.
Furthermore, such approaches cannot yet explain what safety
measures and metrics with the respective thresholds need to
be applied to achieve a defined level of safety. To give just
one example, neither practical experience nor empirical
evidence exists on defining a specific neuron coverage level that
would be considered as sufficient when testing a deep neural
network for a concrete application.</p>
      <p>We think that the concepts and ideas introduced in existing
AC proposals can be aligned in a more comprehensible and
convincing argumentation if the risk acceptance criteria on
which the question of ‘How safe is safe enough?’ is founded
are made explicit in the AC structure itself. We will show that
this allows, on the one hand, becoming explicit with respect
to AI-specific safety measures and, on the other hand,
soundly arguing higher-level safety objectives.</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)</p>
      <p>Contribution. Specifically, we propose using an AC
structure that splits at an early stage into two main claims and
related arguments. The first claim refers to the achievement
of a probabilistic target value with a certain level of
confidence derived from applying a quantitative risk acceptance
criterion. The second claim is that the risk due to “failures”
caused by the AI is as low as reasonably practicable due to
safety measures applied during the AI lifecycle. In the
absence of evidence-based target values for specific safety
measures, we propose to monitor quality assurance activities
on a cost-benefit basis and define respective stop criteria.</p>
      <p>This ensures, on the one hand, that quantitative objectives
are explicitly argued and underpinned with evidence. On the
other hand, the argumentation over the proposed lifecycle
stages contributes to a more comprehensive and justifiable
derivation of reasonable safety measures, but without the need
for predefined targets for specific safety measures. The aim of
this paper is to stimulate the discussion about how to argue
safety for AI-based functions by rethinking traditional AC
patterns and strategies.</p>
      <p>Structure. The remainder of this paper is structured as
follows: First, we give some background on quality assurance in
the context of AI and introduce the concept of ACs as applied
in safety engineering (Sec. 2). Next, we discuss existing
proposals on how ACs could be used in the context of AI (Sec.
3). Then we introduce an example use case and illustrate our
proposal for structuring ACs (Sec. 4). Finally, we discuss a
selection of open questions (Sec. 5) and conclude the paper
with an outlook on possible implications (Sec. 6).</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
    </sec>
    <sec id="sec-3">
      <title>Quality Assurance for AI</title>
      <p>AI-based software components raise new challenges for
quality assurance due to their functionality being derived from
data. Commonly, challenges and safety concerns like lack of
specification or interpretability are described [Adler et al.,
2019; Ashmore et al., 2019; Felderer and Ramler, 2021;
Sämann et al., 2020; Willers et al., 2020]. Several papers
collect existing methods and map them to the mentioned challenges
[Adler et al., 2019; Sämann et al., 2020; Schwalbe and Schels,
2019; Willers et al., 2020]. This raises two questions: whether
the list of safety concerns is complete, and to what extent
the available methods sufficiently address the safety concerns
[Adler et al., 2019]. We are currently not aware of any work
that could provide a sufficient answer to these questions.</p>
      <p>
        Another approach is to structure possible quality assurance
activities and measures according to the phases of the AI
lifecycle in which they are applied. Studer et al. [
        <xref ref-type="bibr" rid="ref16 ref25 ref33 ref5">2021</xref>
        ]
propose, for example, a process model based on CRISP-DM,
which is often used in data analysis projects, introducing a
quality assurance methodology for each project phase.
Ashmore et al. [
        <xref ref-type="bibr" rid="ref14">2019</xref>
        ] provide a survey of quality assurance
methods generating evidences for key assurance requirements
being met in each phase of the AI lifecycle. Here, there is a need
to show that the quality assurance methods applied during a
phase address all assurance requirements related to this
phase, and that the list of assurance requirements is complete.
      </p>
      <p>
        However, it is difficult to obtain a complete list of
quantitative quality assurance requirements. These strongly depend
on the task of the AI-based component and its application
context. Quality modeling approaches can contribute to a
more comprehensive list of quality requirements [Mayr et al.,
2012]. Siebert et al. [
        <xref ref-type="bibr" rid="ref16 ref25 ref33 ref5">2021</xref>
        ] propose a systematic approach for
building such a quality model for a concrete AI-based system
that defines the required aspects for each entity of the
AI-based system and how they can be measured. Still, further
research is needed to better understand (1) to what extent
evidence generated by a certain method contributes to
arguing safety, (2) what suitable performance indicators for the
evidence are, and (3) when a certain method should be
preferred over another for a given context.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Assurance Cases</title>
      <p>
        ACs are heavily used in practice to assure safety, in
particular if it is very challenging to argue safety, as in the case of
autonomous systems. In recent years, standards like UL 4600
[UL, 2021] or reports [Zenzic, 2020] have addressed the
development of such ACs. The application rule VDE-AR-E
2842-61 [VDE, 2020] already proposes using ACs also for
other critical aspects of trustworthiness, such as fairness, as
illustrated by Hauer et al. [
        <xref ref-type="bibr" rid="ref16 ref25 ref33 ref5">2021</xref>
        ].
      </p>
      <p>An AC is defined as a reasoned, auditable artifact created to
support the contention that its top-level claim (or set of
claims) is satisfied, including systematic argumentation and
the underlying evidence and explicit assumptions that support
the claim(s) [ISO/IEC/IEEE, 2019].</p>
      <p>The left part of Fig. 1 illustrates the three main building
blocks of an AC: (1) its top-level claims, typically referring to
achieved objectives or fulfilled constraints, (2) an
argumentation supporting the top-level claims, and (3) evidence on
which the argument is based. The right part illustrates the
argumentation in a tree structure and its assumptions. The tree
is built from reasoning steps that connect lower-level claims
with a higher-level claim that can be concluded from these
lower-level claims. If the conclusion is only valid under some
assumptions, these assumptions shall be made explicit.</p>
      <p>There are different languages for modeling ACs, like the
Goal Structuring Notation (GSN) [SCSC, 2018] or Claim
Argument Evidence Notation [Adelard LLP, 2021]. The
common meta-model of these languages is defined in the
Structured Assurance Case Metamodel (SACM) [OMG,
2020]. This paper does not refer to a specific language but focuses
on the fundamental idea of structuring the argument.</p>
      <p>From a safety perspective, ACs are considered a promising
approach for arguing safety for AI-based systems, and
various authors have already proposed strategies and patterns.</p>
      <p>
        Picardi et al. [
        <xref ref-type="bibr" rid="ref14">2019</xref>
        ] presented an AC pattern for ML
models in clinical diagnosis systems, which they later refined and
supplemented by a process for generating evidences during
the ML lifecycle [Picardi et al., 2020]. The activities and
desiderata during the ML lifecycle are taken from Ashmore
et al. [
        <xref ref-type="bibr" rid="ref14">2019</xref>
        ]. The ML assurance claim is argued based on ML
safety requirements, operating environment, ML model,
development, and test data. In this context, the link between
system safety requirements and ML safety requirements is
addressed [Gauerhof et al., 2020]. In the recently published
AMLAS report, Hawkins et al. [
        <xref ref-type="bibr" rid="ref16 ref25 ref33 ref5">2021</xref>
        ] also provide generic
argument patterns and a process for ML safety assurance
scoping, ML safety requirements, ML data, model learning,
model verification, and model deployment.
      </p>
      <p>
        Wozniak et al. [
        <xref ref-type="bibr" rid="ref15 ref21 ref34 ref38">2020</xref>
        ] propose an argument pattern for
safety assurance that is aligned with the reasoning for
software and hardware in ISO 26262. They argue satisfaction of
an ML safety requirement over correctly decomposing the
safety requirements into sub-requirements and their
satisfaction, appropriate data acquisition, model design, as well as
implementation and training of the ML model.
      </p>
      <p>
        A strategy that does not argue the fulfillment of ML safety
requirements is provided by Gauerhof et al. [
        <xref ref-type="bibr" rid="ref26">2018</xref>
        ]. They
argue that the intended functionality is met by a sufficient
reduction of the root causes of functional insufficiencies, which
encompass underspecification, the semantic gap, and the deductive gap.
      </p>
      <p>
        Based on previous works [Schwalbe and Schels, 2019;
2020], Schwalbe et al. [
        <xref ref-type="bibr" rid="ref15 ref21 ref34 ref38">2020</xref>
        ] propose arguing the sufficient
absence of risk for deep neural networks (DNN) arising from
the insufficiencies they see in their black-box nature, simple
performance issues, incorrect internal logic, and instability.
They propose a collection of measures to address these
insufficiencies, which include V&amp;V as well as best practices
during the creation of DNNs and on the system level.
      </p>
      <p>In summary, our review indicates that existing work is
driven by the safety community, which adapts established
safety patterns and concepts to AI. However, the presented
patterns are still on a rather abstract level, and their
applicability to a concrete use case, comprehensively illustrated from
the top-level claim down to the evidence, has not yet been
described. This might indicate that transferring
traditional patterns to AI-based systems proves to be difficult.</p>
      <p>We observed two major challenges in argumentation for
which existing strategies and patterns still provide
insufficient support. (1) Completeness in the refinement of claims
into sub-claims appears difficult to show, especially when
approaches argue over the refinement of safety requirements to
AI/ML requirements or about addressing ML insufficiencies.
For example, if we have a (most likely) incomplete list of
insufficiencies, we cannot argue completeness by addressing each
listed insufficiency. (2) Considering the current state of AI quality
assurance, the proposed patterns commonly struggle with bridging
the gap between low-level quantitative evidence, e.g.,
achieving a specific neuron coverage during AI testing, and
the claim of sufficient safety for the given application in a
convincing manner.</p>
      <p>We identified as a potential cause of these problems the fact
that the risk acceptance criterion underlying the top-level
claim on which the argumentation is based is either implicit,
or different criteria are mixed and are thus not easy to
distinguish during refinement. We therefore claim that a clear
differentiation will allow more specific argumentation patterns
and better attribution of evidence to sub-claims.</p>
    </sec>
    <sec id="sec-5">
      <title>Building Safety Assurance Cases for AI</title>
      <p>In this section, we will first introduce the example we will
use to illustrate our concepts. Then we will motivate the
consideration of a combination of two risk acceptance criteria to
structure ACs for AI. Finally, we will introduce a lifecycle
model and use it to argue completeness of the provided
refinement.</p>
    </sec>
    <sec id="sec-6">
      <title>Background of the Selected Example</title>
      <p>Automated guided vehicles (AGVs) are driverless vehicles
that transport material. They are used in industrial
applications for realizing the flow of material, and their safety
concepts do not rely on AI [DIN, 2017]. However, their
application is limited due to their limited understanding of the
environment and their safety concept. Autonomous mobile robots
(AMRs) overcome these disadvantages relative to
operator-controlled vehicles by using more sensors and AI. However,
the goal of achieving similar performance and flexibility as
an operator-controlled vehicle is hard to realize without using
AI in safety-critical functions like collision avoidance.
Operators of forklifts adapt their speed and safety distance
according to various aspects of the persons at risk, including speed,
motion path, eye contact, hand gestures signaling right of
way, etc. To implement a conservative version of such a
human-like collision avoidance system, the AMR needs an
AI-based component that understands whether a person at risk
has recognized the AMR and gives way to it. A critical failure
in this context is that the AMR falsely detects the signaling
of right of way. Such safety-critical false detections have to
be avoided sufficiently to assure that the AMR drives at least
as safely as an operator.</p>
    </sec>
    <sec id="sec-7">
      <title>What does sufficient mean?</title>
      <p>The answer to the question of what sufficient means to
prevent a safety-critical failure like ‘false detections of a human
gesture’ depends on the related risks and the risk acceptance
criteria, as safety is defined as acceptable risk [IEC, 2010].</p>
      <p>We should keep two important aspects in mind when
discussing criteria for risk acceptance in settings where AI is part
of a safety-critical function: (a) AI is an emerging technology
that is still heavily in flux, with unforeseeable developments
and improvements in the upcoming years. Thus, coming up
with a fixed set of safety measures does not appear to be
reasonable. The argument that these safety measures minimize
risks as far as reasonably practicable easily becomes invalid.
Besides, it would be hard to argue that these measures are as
effective as existing ones in safety standards for traditional
software. (b) AI is also mainly applied to realize functions
that cannot be provided yet by traditional technological
solutions.</p>
      <p>A risk acceptance criterion that seems reasonable to apply
in the context of AI – considering (a) – states that the residual
risk after the application of safety measures should be As Low
As Reasonably Practicable (ALARP). The meaning of
‘reasonably practicable’ is not static but depends on the state of
the technology and the intended application, including the
underlying business case and related practical restrictions.
Considering ALARP as part of the argumentation assures that
when progress in technology allows for safer solutions, we
will see progress in safety.</p>
      <p>However, doing one’s best to avoid and mitigate risks is
obviously not enough to argue that the best was sufficient.
Accordingly, ALARP is only used in an ALARP region,
which is the region between an upper tolerability limit marking
unacceptable risk and a lower tolerability limit. Having this
in mind is of crucial importance when applying ALARP to
AI since the current state of AI technology might not be
advanced enough to realize a given application in a sufficiently
safe manner. For example, a state-of-the-art traffic sign
recognition algorithm might get one of 200 stop signs wrong
[INI, 2019]. If used as part of an autonomous vehicle, it may,
as a result, regularly ignore someone's right of way. The
algorithm might be as good as reasonably practicable but is still
not sufficiently safe to be applied in this specific application.</p>
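      <p>The stop-sign example can be made concrete with a small calculation. The following is a minimal sketch that assumes, purely for illustration, that recognition errors on different signs are independent; the function name and the exposure levels are hypothetical and not taken from the cited benchmark.</p>
      <preformat>
```python
def prob_at_least_one_miss(per_sign_error_rate: float, signs_passed: int) -> float:
    """Probability of misreading at least one of `signs_passed` signs.

    Assumes (for illustration only) that errors on different signs are independent.
    """
    return 1.0 - (1.0 - per_sign_error_rate) ** signs_passed


ERROR_RATE = 1 / 200  # "one of 200 stop signs wrong", as in the text

# Hypothetical exposure levels (stop signs passed); not taken from the paper.
exposure_levels = (10, 100, 500)
miss_probabilities = {n: prob_at_least_one_miss(ERROR_RATE, n) for n in exposure_levels}

for n, p in miss_probabilities.items():
    print(f"{n} stop signs: P(at least one miss) = {p:.3f}")
```
      </preformat>
      <p>Under these assumptions, the chance of at least one missed stop sign grows quickly with exposure, which illustrates why a per-decision error rate that reflects the state of the art can still be unacceptable for the application.</p>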
      <p>Thus, we need at least a second risk acceptance criterion
that gives us a fixed limit.</p>
      <p>Most existing products have been developed according to
functional safety standards that follow the risk acceptance
criterion Minimum Endogenous Mortality (MEM). The idea
of MEM is that a technical system must not create a
significant risk compared to globally existing risks. For example, a
product should cause a minimal increase in overall death rates
compared to the existing population death rates. This idea
leads to very challenging safety requirements and low target
failure rates. Depending on the specific task, such low failure
rates might be hard to achieve in practice if AI is involved.</p>
      <p>An alternative criterion providing a fixed target is Globalement
au moins aussi bon (GAMAB), which says that new technical
systems shall be at least as safe as comparable existing ones.
However, due to (b), it is hardly applicable in the case of many
AI-based functions because no technical systems exist yet
that provide similar functions.</p>
      <p>An approach related to GAMAB is the idea of having a
‘positive risk balance’ (PRB). PRB is defined in ISO/TR
4804 as the ‘benefit of sufficiently mitigating residual risk of
traffic participation due to automated vehicles’ together with
Note 1 ‘This includes the expectation that automated vehicles
cause less crashes (3.7) on average compared to those made
by drivers’ [ISO/TR, 2020]. The idea of comparing the new
AI-based solution with the existing sociotechnical system can
lead to less challenging target failure rates compared to
MEM. This opens up new opportunities for arguing safety.</p>
      <p>In this paper, we do not discuss how to use this opportunity
to derive a target failure rate for an AI-based safety-critical
function, as this is very specific for the function and its usage
context, but not specific for AI. We also do not discuss how
to get from the target failure rate to a target upper boundary
on the uncertainty for AI outcomes. Instead, we assume in
our example that we would end up with a PRB-derived upper
boundary on safety-related uncertainty (u) that we could
accept for the AI outcomes: ‘The AI must not falsely detect a
signal for the right of way that was not actually given in more
than one of N cases’.</p>
      <p>Fig. 2 illustrates the relationship between ALARP and a
target-based criterion such as MEM, GAMAB, or PRB when
providing arguments that an AI-based solution is safe.</p>
      <p>ALARP can be considered as requesting a certain alpha
given by the ratio between the reduction of safety-related
uncertainty in the AI outcomes and the required effort/cost.
Given the business case for the planned solution and the state
of technology, this alpha might vary and is achieved in Fig. 2
at point B. Simply speaking, we request that as long as safety
measures exist that would increase safety with reasonable
investment, they are carried out. How this rather abstract
constraint can be further refined will be discussed in the context
of the AI lifecycle presented in Sec. 4.3.</p>
      <p>The upper boundary on acceptable safety-related
uncertainty u that is derived from the target-based criterion is
illustrated in Fig. 2 as a horizontal line. We consequently need to
argue that we are confident that the actual safety-related
uncertainty is below u. Please note that this is not achieved at
point A, but first at C, which we will discuss further,
including its implications when talking about testing in Sec. 4.3.</p>
      <p>Finally, we will always end up in one of two kinds of
situations: a situation where the target-based criterion dominates,
i.e., it defines the required investment (cf. Fig. 2), or a
situation where ALARP dominates the required investment. An
interesting question, which is, however, not directly related
to safety, is whether a solution requiring more investment
than reasonably practicable should actually be targeted.</p>
      <p>[Fig. 2: plot of the uncertainty of AI failure (vertical axis) over effort/cost (horizontal axis). PRB yields u, the maximum acceptable uncertainty (A), which is satisfied on confidence level cl at (C); ALARP yields α, a reasonable cost-benefit ratio (B). The effort/cost axis marks the investment required to satisfy ALARP + PRB.]</p>
    </sec>
    <sec id="sec-8">
      <title>Arguing considering the AI lifecycle</title>
      <p>As illustrated above, it seems reasonable to argue two
separate risk acceptance criteria. It is also advisable to argue each
criterion independently. Important for the argumentation,
especially for the argumentation of ALARP, are strategies that
assure that the refinement of the claims into sub-claims is
complete. An accepted way, which we also consider the most
promising, is to use a lifecycle model to argue completeness
and localize safety measures.</p>
      <p>
        The lifecycle model for AI components presented in Fig. 3
builds on existing work, in particular on the work of Ashmore
et al. [
        <xref ref-type="bibr" rid="ref14">2019</xref>
        ] and Gauerhof et al. [
        <xref ref-type="bibr" rid="ref15 ref21 ref34 ref38">2020</xref>
        ]. We adapted their
proposals. The objective was to achieve an even clearer
separation and better assignability of datasets, objectives and
corresponding safety measures to the individual phases. In
addition, we tried to keep the phases sufficiently generic to
be applicable for the various development processes in data
science projects that we are aware of.
      </p>
      <p>We distinguish between specification, construction,
analysis, testing, and operation. The proposed lifecycle model
explicitly does not include a 'data' phase. Subsuming
data-related activities in a single phase neither matches reality nor
gives weight to the topic of data, which is at the core of any
AI lifecycle. Especially since different data with different
qualities are consumed in different phases, we modeled
individual data lifecycles as parallel streams that provide the
foundation for the evidences created in the AI lifecycle.</p>
      <p>[Fig. 3: AC structure and lifecycle model. Top-level AI-related claim: ‘Sufficiently safe’. Top-level AI-related strategy: argue about a combination of risk acceptance criteria. Claims making the risk acceptance criteria explicit: ‘Safety risk is as low as reasonably practicable’ and ‘Quantitative safety target is satisfied’ (target is derived from PRB or MEM). Strategy to argue completeness of refinement, for each claim: argue about relevant lifecycle phases. Below the claims: AI lifecycle phases with quality assurance measures (founded on appropriate datasets) and a data lifecycle with measures to assure appropriate data; the data lifecycle is labeled ‘Existing Data’.]</p>
      <p>The proposed separation also means that certain
phases contribute exclusively either to arguing ALARP or to arguing the
target-based risk acceptance criterion, as we will show.</p>
      <p>Specification considers, among other things, the definition
of the AI task, the target application scope [Kläs and Vollmer,
2018], which is comparable to the operational design domain in
automotive, and safety-relevant as well as other quality
requirements, including system constraints like computational
resources. Although the AI specification has some
particularities, its activities are largely AI-independent. Nevertheless, it is
a key phase for both types of risk acceptance criteria. A
sufficiently complete and correct specification is a prerequisite
to assuring that the safety risk will be as low as reasonably
practicable by providing guardrails for the subsequent phases,
but it also constitutes the AI-specific safety target and the
scope in which this target has to be achieved. For example,
the AI must not falsely detect a signal for the right of way that
was not actually given in more than one of N cases in its
previously defined target application scope.</p>
      <p>Construction is an AI-specific phase. Its objective is to
build a model from a training dataset that is able to fulfill the
AI task in the target application scope considering the
requirements and constraints defined in the specification.</p>
      <p>During construction, many design decisions have to be
made, e.g., on the kind of model and its hyperparameters
including topology, learning algorithm, stop criteria, etc.</p>
      <p>Many of these decisions are based on trial and error, taking
into account experience, so construction is a highly iterative
process in a close feedback loop with the analysis phase.</p>
      <p>We will not be able to show during construction that we
achieved a certain quantitative target, since we focus on
fitting but not soundly testing the model. Thus, this phase plays
no role in arguing the target-based risk acceptance
criterion. However, considering quality assurance measures
during construction is important to argue ALARP. The
applied quality assurance measures should be guided by the
safety target, but also by other quality requirements and
constraints identified during the specification. Commonly, it is
not possible to define fixed success criteria for the different
quality measures. For example, in most cases, it would not be
reasonable to enforce a specific type of model or topology, or
request a maximum batch size m and run at least e epochs.
Instead, we propose analyzing and monitoring the efficiency
of the measures carried out and stopping in accordance with
ALARP if a reasonable saturation is achieved. For example,
if performing a random search on appropriate hyperparameter
values, the search shall continue as long as the model shows
reasonable improvements.</p>
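      <p>The saturation-based stop criterion for a random hyperparameter search can be sketched as follows. This is only an illustration under stated assumptions: the function name, the thresholds min_gain and patience, and the toy objective are hypothetical placeholders, and in practice "reasonable improvement" would have to be justified by the business case and the safety target.</p>
      <preformat>
```python
import random


def random_search_until_saturation(evaluate, sample_config, min_gain=0.001,
                                   patience=20, max_trials=1000, seed=0):
    """Random search that stops once improvements saturate.

    Stops when `patience` consecutive trials each improve the best score by
    less than `min_gain`, or when `max_trials` is reached. All thresholds are
    illustrative placeholders, not values from the paper.
    """
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    trials, trials_without_gain = 0, 0
    while max_trials > trials and patience > trials_without_gain:
        config = sample_config(rng)
        score = evaluate(config)
        trials += 1
        gain = score - best_score
        if score > best_score:
            best_score, best_config = score, config
        if gain >= min_gain:
            trials_without_gain = 0   # still a reasonable improvement
        else:
            trials_without_gain += 1  # evidence of saturation
    return best_config, best_score, trials


# Toy stand-ins: a one-parameter "model" whose validation score peaks at lr = 0.1.
def toy_sample_config(rng):
    return {"lr": rng.uniform(0.0, 1.0)}


def toy_evaluate(config):
    return 1.0 - (config["lr"] - 0.1) ** 2


best_config, best_score, trials_used = random_search_until_saturation(
    toy_evaluate, toy_sample_config)
print(best_config, round(best_score, 4), trials_used)
```
      </preformat>
      <p>Logging the gain per trial together with the stop decision would give a record of the observed cost-benefit trend that could be referenced as evidence in the ALARP argument.</p>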
      <p>Analysis is also an AI-specific phase that is performed in
a close feedback loop with construction to provide guardrails
for improving construction and indicating the achievement of
saturation for constructive quality assurance measures.
Besides means for explainability, analysis also comprises “testing”
the model on validation data to estimate and monitor the
model performance with respect to the safety target.
However, although techniques are applied that are similar to the
techniques applied in the testing phase, the analysis phase
differs from the testing phase in that its objective is to gather
insights to further improve the AI model rather than provide
evidence for the achievement of the specified safety target.
Therefore, the quality assurance measures in the analysis
phase help to argue ALARP but do not contribute to arguing
the target-based risk acceptance criterion. In analogy
to the construction phase, it is difficult to define a priori
targets for most quality measures in the analysis phase. Rather,
their effect and thus their potential contribution to the safety
target must be monitored and continuously evaluated.</p>
      <p>Testing is also commonly considered to be AI-specific.
Unlike analysis, the objective of the testing phase is to
generate evidence for the achievement of the quantitative safety
target. In providing this evidence, testing depends on the
specification, including the definition of the AI task and the
target application scope. Moreover, it relies on specific
qualities of the test data that are less relevant, for example, for
training data, such as fulfilling
representativeness criteria and not having been used previously during
construction or analysis. Since a test dataset can always
provide only a sample of all possible cases in the target
application scope, we need to underpin the evidence on satisfying
the safety target with some statistical confidence (cf. Fig. 2)
[Kläs and Sembach, 2019]. The confidence level cl, which is
independent of the target, may be set based on criticality or
requested integrity. For example, we might request that the
probability that we falsely confirm our target ‘The AI must
not falsely detect a signal for the right of way that was not
actually given in more than one of N cases in its previously
defined target application scope.’ is less than 1-cl = 0.0001.</p>
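      <p>One common way to attach such a confidence statement to a test result is a one-sided Clopper-Pearson upper bound on the failure rate. The sketch below is an illustration under assumptions, not necessarily the procedure behind Fig. 2: the choice of bound, the function names, and the value N = 1000 used in the note after the code are introduced here, and the bound presumes that test cases are independent samples from the target application scope.</p>
      <preformat>
```python
import math


def binom_cdf(k, n, p):
    """P(X is at most k) for X ~ Binomial(n, p), with p strictly between 0 and 1."""
    total = 0.0
    for i in range(k + 1):
        # Each term is computed in log space to avoid overflow for large n.
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def upper_bound(failures, n, cl):
    """One-sided Clopper-Pearson upper bound on the true failure rate."""
    if failures >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection: the CDF decreases as p grows
        mid = (lo + hi) / 2.0
        if binom_cdf(failures, n, mid) > 1.0 - cl:
            lo = mid  # mid is still plausible, so the bound lies above it
        else:
            hi = mid
    return hi


def target_confirmed(failures, n, n_cases, cl):
    """True if the test result supports a failure rate of at most 1/n_cases."""
    return 1.0 / n_cases >= upper_bound(failures, n, cl)


def min_failure_free_tests(n_cases, cl):
    """Smallest number of failure-free test cases that confirms the target."""
    return math.ceil(math.log(1.0 - cl) / math.log1p(-1.0 / n_cases))
```
      </preformat>
      <p>Under these assumptions, with the illustrative values N = 1000 and cl = 0.9999, min_failure_free_tests(1000, 0.9999) evaluates to 9206, and a single observed failure in a test set of that size is already enough for target_confirmed to return False. This shows why demanding targets combined with high confidence levels require large, representative test datasets.</p>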
      <p>Moreover, it is important to understand that quality
assurance measures in the testing phase are not applied to further
improve the AI solution and thus do not provide evidence
to argue ALARP. Instead, they help argue that we are
confident that we have met the quantitative safety target.</p>
      <p>Operation in the sense considered here comprises
deployment, usage, maintenance, and retirement. Although most
aspects are not AI-specific, some are and need to be addressed
with appropriate safety measures. On the one hand, measures
for assuring ALARP include the collection of relevant
information during operation to further improve the AI solution as
part of maintenance. Moreover, situations have to be detected
in which the AI solution can only provide outcomes with high
uncertainty, in order to allow for appropriate
countermeasures to be taken on the system level to improve the overall
safety. Such situations may include settings where lighting
conditions make falsely detecting a signal for the right of way
much more likely. On the other hand, evidence on satisfying
the safety target obtained from testing strongly relies on
assumptions regarding the target application scope; if the AI
solution is applied in a different setting or relevant
characteristics of the application change, this evidence is no longer
valid. Therefore, safety measures have to be taken during
operation to detect such deviations between the target application
scope and the actual application scope.</p>
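      <p>A very simple form of such a runtime check compares observed operating conditions against the ranges assumed in the specification of the target application scope. The following sketch is only an illustration: the class FactorRange, the function out_of_scope_factors, and the factor names used in the note after the code are hypothetical, and a real monitor would typically also need to detect gradual distribution shifts rather than only range violations.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorRange:
    """Closed interval of values covered by the target application scope."""
    low: float
    high: float

    def contains(self, value):
        return self.high >= value >= self.low


def out_of_scope_factors(observation, scope):
    """Return the names of scope factors that are missing or outside their range.

    observation maps factor names to measured values.
    scope maps factor names to the FactorRange assumed during testing.
    """
    violations = []
    for name, allowed in scope.items():
        value = observation.get(name)
        if value is None or not allowed.contains(value):
            violations.append(name)
    return violations
```
      </preformat>
      <p>For example, with a scope that assumes an illuminance between 50 and 2000 lux and a vehicle speed of at most 1.5 m/s (both values invented for illustration), an observation with 12 lux would be reported as out of scope for the illuminance factor, which could then trigger a countermeasure on the system level.</p>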
    </sec>
    <sec id="sec-9">
      <title>Discussion</title>
      <p>We proposed a strategy for arguing the safety of an AI-based
safety function that combines two risk acceptance criteria. The
structure can help to build a sound argument, but there
are ways in which this argument could be attacked. A possible
attack on the ALARP argument is that the body of knowledge
concerning the effectiveness and the best combinations of
measures is not mature enough. A possible attack on the
quantitative claim based on PRB or MEM is that there is not
enough practical experience and empirical evidence. A
possible response to this attack is to collect data during operation
and to use market monitoring to strengthen the argument.
This approach is already described by Safety Performance
Indicators [Koopman and Wagner, 2019] or GQM+Strategies
[Basili et al., 2010], but it needs to be tailored to the
argument for AI. By evaluating the reasoning with data, a
mature body of knowledge can be developed over time and
reflected in safety standards for AI.</p>
      <p>Considering standardization, we see three options for using
ACs. The first is to demand in a safety standard the
development of an AC for the considered product. The second is to
describe in product- or domain-specific safety standards a
generic AC that shall be instantiated. The third is to develop a
product- or domain-specific AC and use this AC to develop a
checklist-based safety standard where safety measures are
chosen depending on the specific criticality/integrity level.</p>
      <p>Considering certification, we see two main aspects. The
first is that the AC needs to comply with the standard
describing what the AC should look like. The second and more
important aspect is that the AC itself needs to be sound, so that
it can be accepted by the certification body. The challenge
here is that reviewing the AC easily becomes more
elaborate than a checklist-based approach, meaning the
certification body needs much greater expertise. Furthermore, the
certification body can no longer give up responsibility for the
safety of the system by saying that it is only responsible for
compliance with standards but not for system safety.
However, this aspect is not specific to AI and is generally true for
the certification of complex systems by means of ACs.</p>
    </sec>
    <sec id="sec-10">
      <title>Conclusion</title>
      <p>We conclude that ACs have the potential to justify the usage
of AI in safety-critical systems. A prerequisite is, however,
that they argue that risks are as low as reasonably practicable
(ALARP) and that a reasonable target based on a quantitative
risk acceptance criterion has been chosen and is fulfilled. We
presented the first approach for explicitly arguing the
achievement of these complementary objectives for AI.</p>
      <p>We also see the potential of the proposed structure for
traditional software as it would enforce claims about the
effectiveness of safety measures. It would put into question
whether one is really following the ALARP principle when
choosing safety measures according to recommendations
given by safety standards. It would also raise the question of
how effective software safety measures are and call for
empirical evidence of their effectiveness.</p>
      <p>Last but not least, we advocate that the concept of ACs
from the safety community should be carried over to the AI
community. In particular, researchers with a background in
empirical studies and data quality need to be involved in the
development and review of AI-related ACs.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgments</title>
      <p>Parts of this work have been funded by the Observatory for
Artificial Intelligence in Work and Society (KIO) of the
Denkfabrik Digitale Arbeitsgesellschaft in the project "KI
Testing &amp; Auditing". We would also like to thank Sonnhild
Namingha for an initial review of this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Adelard LLP,
          <year>2021</year>
          ]
          Adelard LLP
          .
          <source>CAE FRAMEWORK</source>
          ,
          <year>2021</year>
          , https://claimsargumentsevidence.org/. Accessed 10 May
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Adler et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Akram</surname>
          </string-name>
          , P. Bauer, P. Feth,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gerber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jedlitschka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jöckel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kläs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schneider</surname>
          </string-name>
          .
          <article-title>Hardening of Artificial Neural Networks for Use in Safety-Critical Applications - A Mapping Study</article-title>
          ,
          <year>2019</year>
          . https://arxiv.org/abs/1909.03036.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Ashmore et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ashmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Calinescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Paterson</surname>
          </string-name>
          .
          <article-title>Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Basili et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Basili</surname>
          </string-name>
          , et al.
          <article-title>Linking Software Development and Business Strategy Through Measurement</article-title>
          .
          <source>Computer</source>
          ,
          <volume>43</volume>
          (
          <issue>4</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>65</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[BSI</source>
          ,
          <year>2021</year>
          ] BSI,
          <string-name>
            <surname>Fraunhofer</surname>
            <given-names>HHI</given-names>
          </string-name>
          ,
          <article-title>Verband der TÜV</article-title>
          .
          <source>Towards Auditable AI Systems</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[DIN</source>
          ,
          <year>2017</year>
          ]
          <source>DIN EN ISO 3691-1</source>
          :
          <fpage>2017</fpage>
          -
          <article-title>Industrial trucks - Safety requirements</article-title>
          and verification,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[European Commission</source>
          ,
          <year>2021</year>
          ]
          <string-name>
            <given-names>European</given-names>
            <surname>Commission</surname>
          </string-name>
          .
          <source>Proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)</source>
          ,
          <year>2021</year>
          . https://ec.europa.eu/newsroom/dae/redirection/item/709090.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Felderer and Ramler</source>
          , 2021]
          <string-name>
            <given-names>M.</given-names>
            <surname>Felderer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramler</surname>
          </string-name>
          .
          <article-title>Quality Assurance for AI-based Systems: Overview and Challenges</article-title>
          .
          <source>In Software Quality: Future Perspectives on Software Engineering Quality</source>
          .
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Gauerhof et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gauerhof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Picardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Paterson</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Habli</surname>
          </string-name>
          .
          <article-title>Assuring the Safety of Machine Learning for Pedestrian Detection at Crossings</article-title>
          .
          <source>In Proc. of SAFECOMP 2020</source>
          . Springer, pp.
          <fpage>197</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Gauerhof et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gauerhof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Munk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Burton</surname>
          </string-name>
          .
          <article-title>Structuring Validation Targets of a Machine Learning Function Applied to Automated Driving</article-title>
          .
          <source>In Proc. of SAFECOMP</source>
          <year>2018</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Hauer et al.,
          <year>2021</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Adler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Zweig</surname>
          </string-name>
          .
          <article-title>Assuring Fairness of Algorithmic Decision Making (ITEQS 2021)</article-title>
          .
          <source>In Proc. of Int. Conf. on Software Testing</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Hawkins et al.,
          <year>2021</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Paterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Picardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Calinescu</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Habli</surname>
          </string-name>
          .
          <source>Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [IEC,
          <year>2010</year>
          ]
          <source>IEC 61508-5:2010 - Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[INI</source>
          ,
          <year>2019</year>
          ]
          <article-title>Institut für Neuroinformatik</article-title>
          .
          <source>German Traffic Sign Benchmarks</source>
          ,
          <year>2019</year>
          . https://benchmark.ini.rub.de/gtsrb_results.html. Accessed 10 May
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [ISO/TR,
          <year>2020</year>
          ] ISO/TR 4804:
          <fpage>2020</fpage>
          -
          <article-title>Road vehicles - Safety and cybersecurity for automated driving systems - Design, verification</article-title>
          and validation,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>[ISO/IEC</source>
          , 2021
          <source>] ISO/IEC AWI TR 5469 - Artificial intelligence - Functional safety and AI systems</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [ISO/IEC/IEEE, 2019] ISO/IEC/IEEE 15026-
          <fpage>1</fpage>
          :
          <fpage>2019</fpage>
          -
          <article-title>Systems and software engineering - Systems and software assurance - Part 1: Concepts and vocabulary</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Kläs and Sembach</source>
          , 2019]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kläs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Sembach</surname>
          </string-name>
          .
          <article-title>Uncertainty Wrappers for Data-Driven Models</article-title>
          .
          <source>In Proc. of SAFECOMP 2019</source>
          . Springer, pp.
          <fpage>358</fpage>
          -
          <lpage>364</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>[Kläs and Vollmer</source>
          , 2018]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kläs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.M.</given-names>
            <surname>Vollmer</surname>
          </string-name>
          .
          <article-title>Uncertainty in Machine Learning Applications: A Practice-Driven Classification of Uncertainty</article-title>
          .
          <source>In Proc. of SAFECOMP</source>
          <year>2019</year>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Mayr et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mayr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Plösch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kläs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lampasona</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Saft</surname>
          </string-name>
          .
          <article-title>A Comprehensive Code-Based Quality Model for Embedded Systems: Systematic Development and Validation by Industrial Projects</article-title>
          .
          <source>In ISSRE 2012</source>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>290</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>[OMG</source>
          ,
          <year>2020</year>
          ] Object Management Group.
          <source>About the Structured Assurance Case Metamodel Specification Version 2.1</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <source>[Koopman and Wagner</source>
          , 2019]
          <string-name>
            <given-names>P.</given-names>
            <surname>Koopman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <article-title>Positive Trust Balance for Self-driving Car Deployment</article-title>
          .
          <source>In Proc. of SAFECOMP 2020 Workshops</source>
          , pp.
          <fpage>351</fpage>
          -
          <lpage>357</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Picardi et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Picardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Paterson</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Habli</surname>
          </string-name>
          .
          <article-title>A Pattern for Arguing the Assurance of Machine Learning in Medical Diagnosis Systems</article-title>
          .
          <source>In Proc. of SAFECOMP</source>
          <year>2019</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>179</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Picardi et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Picardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Paterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Calinescu</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Habli</surname>
          </string-name>
          .
          <article-title>Assurance Argument Patterns and Processes for Machine Learning in Safety-Related Systems</article-title>
          .
          <source>In Proc. of SafeAI</source>
          <year>2020</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>[Pietsch</source>
          , 2021]
          <string-name>
            <given-names>B.</given-names>
            <surname>Pietsch</surname>
          </string-name>
          .
          <article-title>2 Killed in Driverless Tesla Car Crash, Officials Say</article-title>
          . The New York Times,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>[SCSC</source>
          ,
          <year>2018</year>
          ]
          <article-title>Safety-Critical Systems Club</article-title>
          .
          <source>GSN Community Standard Version 2 Draft 1</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <source>[Salay and Czarnecki</source>
          , 2018]
          <string-name>
            <given-names>R.</given-names>
            <surname>Salay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          .
          <source>Using Machine Learning Safely in Automotive Software: An Assessment and Adaption of Software Process Requirements in ISO 26262</source>
          ,
          <year>2018</year>
          . https://arxiv.org/abs/1808.01614.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Sämann et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sämann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schlicht</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hüger</surname>
          </string-name>
          .
          <article-title>Strategy to Increase the Safety of a DNN-based Perception for HAD Systems</article-title>
          ,
          <year>2020</year>
          . https://arxiv.org/abs/2002.08935.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Schwalbe et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schwalbe</surname>
          </string-name>
          , et al.
          <article-title>Structuring the Safety Argumentation for Deep Neural Network Based Perception in Automotive Applications</article-title>
          .
          <source>In Proc. of SAFECOMP</source>
          <year>2020</year>
          , pp.
          <fpage>383</fpage>
          -
          <lpage>394</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>[Schwalbe and Schels</source>
          , 2020]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schwalbe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schels</surname>
          </string-name>
          .
          <article-title>A Survey on Methods for the Safety Assurance of Machine Learning Based Systems</article-title>
          .
          <source>In: Proc. of European Congress on Embedded Real Time Software and Systems</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <source>[Schwalbe and Schels</source>
          , 2019]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schwalbe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schels</surname>
          </string-name>
          .
          <article-title>Strategies for Safety Goal Decomposition for Neural Networks</article-title>
          .
          <source>In Proc. of ACM Computer Science in Cars Symposium</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Siebert et al.,
          <year>2021</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Siebert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Joeckel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heidrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trendowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nakamichi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ohashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Namba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Aoyama</surname>
          </string-name>
          .
          <article-title>Construction of a Quality Model for Machine Learning Systems</article-title>
          ,
          <source>Software Quality Journal. Special Issue Information Systems Quality</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>[UL</source>
          ,
          <year>2021</year>
          ]
          <string-name>
            <given-names>Underwriters</given-names>
            <surname>Laboratories</surname>
          </string-name>
          .
          <article-title>Presenting the Standard for Safety for the Evaluation of Autonomous Vehicles and Other Products</article-title>
          . https://ul.org/UL4600. Accessed 10 May
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <source>[VDE</source>
          ,
          <year>2020</year>
          ]
          <source>VDE-AR-E 2842-61-1:2020-07 - Development and trustworthiness of autonomous/cognitive systems</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [Wakabayashi,
          <year>2018</year>
          ]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wakabayashi</surname>
          </string-name>
          .
          <article-title>Self-Driving Uber Car Kills Pedestrian in Arizona</article-title>
          .
          <source>The New York Times</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [Willers et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>O.</given-names>
            <surname>Willers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudholt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raafatnia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Abrecht</surname>
          </string-name>
          .
          <article-title>Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks</article-title>
          .
          <source>In Proc. of SAFECOMP 2020</source>
          , pp.
          <fpage>336</fpage>
          -
          <lpage>350</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [Wozniak et al.,
          <year>2020</year>
          ]
          <string-name>
            <given-names>E.</given-names>
            <surname>Wozniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cârlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Acar-Celik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Putzer</surname>
          </string-name>
          .
          <article-title>A Safety Case Pattern for Systems with Machine Learning Components</article-title>
          .
          <source>In Proc. of SAFECOMP 2020</source>
          . Springer, pp.
          <fpage>370</fpage>
          -
          <lpage>382</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [Zenzic,
          <year>2020</year>
          ]
          <collab>Zenzic-UK Ltd</collab>
          .
          <article-title>Zenzic-Safety-Framework-Report2.0-final</article-title>
          ,
          <year>2020</year>
          . https://zenzic.io/reports-and-resources/safetycase-framework/
          . Accessed 10 May
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>