<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Monitoring of AI Models:</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fateh Kaakai</string-name>
          <email>fateh.kaakai@thalesgroup.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul-Marie Raffi</string-name>
          <email>paul-marie.raffi@ext.irt-systemx.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRT SystemX</institution>
          ,
          <addr-line>Nano-INNOV - Bât 863, 2, Boulevard Thomas Gobert, 91120, Palaiseau</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SafeAI2023: The AAAI's Workshop on Artificial Intelligence Safety</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Thales Research &amp; Technology France - 1</institution>
          ,
          <addr-line>avenue Augustin Fresnel Palaiseau 91767 Cedex</addr-line>
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>The paper has been entirely written by Fateh KAAKAI. Paul-Marie RAFFI has contributed to the online monitor development and the use case study</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Online monitoring is an architectural pattern well-known to safety engineers, but it had to be adapted to AI technologies. In this paper, an innovative multi-timescale online monitoring architecture is presented. The main idea is to combine several monitoring timescales - Present-Time Monitoring (PTM), Near-Past Monitoring (NPM), and Near-Future Monitoring (NFM) - on different monitoring assets (inputs, internal states, and outputs of the AI model) to ensure a high anomaly detection rate by design of the online monitor.</p>
      </abstract>
      <kwd-group>
        <kwd>Online monitoring</kwd>
        <kwd>Multi-timescale</kwd>
        <kwd>AI</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Model</kwd>
        <kwd>Safety</kwd>
        <kwd>Anomaly Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In industry, it is commonly accepted that
the main objective of online monitoring of AI
models (also called Run Time Assurance in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
or safety control structure in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]) is to detect (i)
any deviation of the AI component deployed in
production from the expected behavior (i.e., intent
specified at the system level and allocated to the
AI model), and (ii) precursors of the occurrence
of failure conditions (i.e., feared events at the
system boundaries) based on a predefined set of
safety properties. Deploying a monitoring
component running in parallel with the AI model
is a practical way to manage the risk induced by a
model for which it is not possible or feasible to
formally demonstrate the achievement of the
performance and the safety objectives resulting
from the system analyses. Online monitoring is an
architectural pattern well-known to safety
engineers, but it had to be adapted to AI
technologies. In an ideal world, the AI model can
perform its prediction over its entire Operational
Design Domain (ODD) with the expected level of
performance (e.g., 99.9% correct predictions, and
this accuracy is maintained over time in
operation). However, in practice, if we consider
for example machine learning models in many
recent papers, most of the time it is very difficult
to achieve more than 99% accuracy (see for
example the tables of results in [
        <xref ref-type="bibr" rid="ref22 ref23 ref24">22, 23, 24</xref>
        ]),
i.e., an average of one wrong prediction per
100 inferences in production. But should we
hastily conclude that 1% of bad predictions
systematically triggers unexpected behavior
leading to a system failure condition? In practice,
from the industrial experience of the authors and
for a wide range of industrial applications, a single
error does not directly lead to hazardous or
catastrophic events, because the system design
has eliminated Single Points Of Failure (SPOF)
(e.g., application of the following guidelines
ARP4754A [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and ARP4761 [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] in the
aeronautical domain). Therefore, under this
assumption (no SPOF in the system), a single
failure of an AI component (i.e., an
incorrect prediction at a given time) cannot
directly lead to hazardous or catastrophic events.
However, what about the case where the AI
component has persistent failures (i.e., the model
fails to infer the correct prediction at a given point
in time, and continues to fail during a subsequent
time interval)? This could increase the residual
risk due to a higher probability of having (during
that time interval where the AI component
continues to fail) a combination of multiple
internal failures in the system leading to a system
failure condition. To detect persistent failures,
considering the dynamics of the system and thus
the "time" variable is a key challenge. In this paper,
an innovative multi-timescale online monitoring
architecture is presented. The main idea is to
combine several monitoring timescales -
Present-Time Monitoring (PTM), Near-Past Monitoring
(NPM), and Near-Future Monitoring (NFM) - on
different monitoring assets (inputs, internal states,
and outputs of the AI model) to ensure a high
anomaly detection rate by design of the online
monitor.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Summary of Related Works</title>
      <p>
        To tackle the topic of monitoring AI models,
some works started to define a taxonomy of
anomalies that are specific to AI technologies [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,
2, 3</xref>
        ] but, to the best of our knowledge, no
taxonomy has emerged as an undisputed reference. Other works
have explored runtime verification approaches,
such as Monitoring Based on Past Experiences [
        <xref ref-type="bibr" rid="ref4 ref5">4,
5</xref>
        ], and Monitoring Based on Inconsistencies
During Inference [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We can also find many
papers on Out of Distribution Detection using
either Data-Driven Out-of-Distribution Detection
[
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], Detection by reconstruction [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ],
Detection by test-time adversarial attacks [
        <xref ref-type="bibr" rid="ref11 ref12">11,
12</xref>
        ], or Anomaly Detection for Time Series [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Another group of works is dedicated to Uncertainty
Prediction, including Bayesian Neural Networks
[
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ], MC Dropout [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Ensemble Methods
[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and Single-forward uncertainty estimation
[
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Multi-timescale Monitoring</title>
      <p>The context of the online monitoring function is
described in Figure 1 below.</p>
      <p>The AI-based product consisting of one or several
integrated AI models is depicted in the black oval.
This product can be an item (i.e., a component),
or a subsystem of an entire system. The generic
term "product" is used in the following. The
product receives at inference time operational
data from sensors.
Above the product, in a white oval, the online
monitoring item receives both the external inputs
and outputs of the product, as well as some
information about the internal states of the
product using pre-designed probes placed in the
product's software code or hardware. At the top of
Figure 1, an item in charge of continuously
collecting all relevant operational data is usually
required to feed a complementary offline
monitoring function. The offline monitoring
function may have several objectives according to
the use case, such as (but not limited to): (i)
calculating offline metrics, (ii) fine-tuning some of
the online monitor parameters, (iii) detecting data
and concept drifts, and (iv) acting as a hypervisor of
the online monitor. At the bottom right-hand side
of Figure 1, a controller is responsible for
synthesizing the output produced by the product
and the verdict of the monitor to compute, based
on business logic that is in general
specific to the use case, the final output, the
so-called “safe output”.</p>
      <p>To illustrate the product to be monitored, consider
the very simplified didactic example in Figure 2,
which represents a linear physical phenomenon
y = f(t) = a·t + b to be approximated by an AI
model ŷ = f̂(tk), where tk is the system clock
which also clocks the monitoring item (at each
time tk the device acquires data to produce a
verdict; t0 is the product start-up time).
Let’s also assume that system requirements
specify a set of properties to be satisfied by the
product in operation. These properties are
business-driven and thus specific to each use case.
In general, these properties may be (but are not
limited to): (i) functional properties related to the
nominal expected behavior of the product, like
performance requirements; (ii) safety properties
identified by safety risk analyses; (iii) security
properties determined by security risk analyses;
and (iv) explainability properties coming from human
factor analyses. To keep the logic of a simplified
didactic example, consider that there is only one
general property, materialized by robustness
bounds depicted by the green dashed segments in
Figure 2. The area bounded by these two green
dashed segments defines the validity domain ℧ of
the product output ŷ. The very simplified general
safety property can therefore be expressed as
follows:
∀ tk ≥ t0, ŷ(tk) ∈ ℧
(1)
Regarding the design of the product, since
sufficient data were collected and are available to
characterize the physical phenomenon to be
modeled, it has been decided to use Machine
Learning (ML) technology to design the product
(e.g., using an artificial neural network). The ML
model is obtained after several iterations of
learning and is depicted by the blue curve in
Figure 2. It is deliberately not perfect. Indeed, it is
possible to observe that several operating points of the
ML model output ŷ fall outside the validity
domain ℧ and do not satisfy the safety property
(1). The online monitoring function aims to detect
all operating points that violate the system
properties – let’s call them by the generic term
"anomaly" in the rest of this paper. To be efficient,
the online monitor should ensure a sufficient
anomaly detection rate, and this is precisely the
ultimate goal of the multi-scale monitoring
framework which is the main contribution of this
paper.</p>
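      <p>As a minimal sketch (our illustration, not part of the original system), a Present-Time Monitoring check of property (1) reduces to a bounds test at each clock tick; the linear phenomenon and the robustness margin below are hypothetical values chosen to mirror the didactic example of Figure 2:</p>

```python
# Hypothetical linear phenomenon y = A*t + B with a robustness margin;
# the real bounds are the green dashed segments of Figure 2.
A, B = 2.0, 1.0
MARGIN = 0.5

def lower_bound(t: float) -> float:
    return A * t + B - MARGIN

def upper_bound(t: float) -> float:
    return A * t + B + MARGIN

def ptm_verdict(t: float, y_hat: float) -> bool:
    """True iff property (1) holds at time t: y_hat(t) is inside the validity domain."""
    return lower_bound(t) <= y_hat <= upper_bound(t)

print(ptm_verdict(1.0, 3.2))   # inside [2.5, 3.5] -> True
print(ptm_verdict(1.0, 4.0))   # outside          -> False
```

      <p>Any operating point for which this verdict is False is an anomaly in the sense defined above.</p>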
      <p>The principle of multi-scale monitoring is
described in Figure 3. It consists in combining
several monitoring timescales: monitoring of the
product at the present time, monitoring over a
configurable time window in the near past, and
monitoring over a configurable time window in
the near future. (In section 5 (Application), an
industrial use case is presented with more
complex properties to be monitored.)
To illustrate the combination of these three
different timescale monitoring functions, let us
continue the discussion on the trivial example of
Figure 4. At time tx, the ML model output ŷ
overpasses the robustness boundaries, and it is
expected that the Present-Time Monitoring (PTM)
will be able to detect such abnormal behavior.
Between times tx and ty, it is possible to observe
that ŷ starts to oscillate unexpectedly. This is an
unintended behavior that could be a precursor of
a failure of the ML model. Since this oscillation
phenomenon should be observed and confirmed
over several clock cycles, it is expected that the
Near-Past Monitoring (NPM) will be the
appropriate monitoring timescale to detect such an
oscillation anomaly.
At time tz, ŷ has an abrupt trend that will make it
overpass the robustness boundaries at the next
clock cycles. Here, the Near-Future Monitoring
(NFM) is the most appropriate monitoring
function to detect such potentially abnormal
behavior since it is based on trend
analysis. Through this didactic example, one can
observe that an efficient combination of these
three different monitoring timescales – NPM,
PTM, and NFM – allows one to detect several
classes of anomalies and to achieve, by
design, a high online detection rate when the AI
model is in production.</p>
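      <p>To make the combination concrete, here is a small illustrative sketch (our own, with hypothetical window sizes and thresholds, not the monitor developed in the Confiance.ai Program): PTM as a bounds check at the current tick, NPM as an oscillation counter over a near-past window, and NFM as a crude linear extrapolation over a near-future horizon:</p>

```python
from collections import deque

class MultiTimescaleMonitor:
    """Illustrative sketch combining the three monitoring timescales on
    the model output: PTM (validity-domain check at the present tick),
    NPM (oscillation count over a near-past window), and NFM (linear
    trend extrapolated a few ticks into the near future)."""

    def __init__(self, lo, hi, window=8, max_sign_changes=4, horizon=3):
        self.lo, self.hi = lo, hi            # validity domain bounds (hypothetical)
        self.history = deque(maxlen=window)  # near-past window of outputs
        self.max_sign_changes = max_sign_changes
        self.horizon = horizon               # near-future ticks to look ahead

    def ptm(self, y):
        # Present-Time Monitoring: is the current output inside the domain?
        return self.lo <= y <= self.hi

    def npm(self):
        # Near-Past Monitoring: count slope sign changes over the window;
        # too many flips indicates an unexpected oscillation.
        h = list(self.history)
        diffs = [b - a for a, b in zip(h, h[1:])]
        flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
        return flips <= self.max_sign_changes

    def nfm(self):
        # Near-Future Monitoring: extrapolate the last local slope and
        # check the forecast against the validity domain.
        if len(self.history) < 2:
            return True
        slope = self.history[-1] - self.history[-2]
        forecast = self.history[-1] + slope * self.horizon
        return self.lo <= forecast <= self.hi

    def step(self, y):
        # One clock tick: record the output and produce the three verdicts.
        self.history.append(y)
        return {"PTM": self.ptm(y), "NPM": self.npm(), "NFM": self.nfm()}
```

      <p>On a steady in-bounds signal all three verdicts pass; on a steep ramp the NFM verdict fails several ticks before the PTM bound is actually crossed, and a rapidly alternating signal trips NPM while each individual sample still passes PTM – which is exactly the intended division of labor between the three timescales.</p>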
    </sec>
    <sec id="sec-4">
      <title>4. Industrial Design Principles</title>
      <p>In the previous sections, a first design principle
has been presented through the new
multi-timescale monitoring framework that aims at
increasing the anomaly detection rate by design.
However, there are many other design principles
of online monitors that are important as well.
Below is a synthesis of the main industrial design
principles collected and formalized by major
international industrial groups within the frame of
the French research program Confiance.ai (see
www.confiance.ai, involving more than 40 partners
including large industrial groups such as Airbus,
Air Liquide, Atos, Naval Group, Renault Group,
Safran, Sopra Steria, Thales, Valeo, and others;
full list on the web site). These design principles are
not all detailed in this paper, since each of them would
require a full technical paper to be comprehensively
presented.
- Design Principle 1: The monitoring
function should by design ensure completeness
of anomaly detection while minimizing false
alarms.
- Design Principle 2: The sophistication of
the monitoring function should be
proportionate to the criticality level of the AI
function.
- Design Principle 3: The monitoring
function should be smart in order to manage
complexity and performance issues.
- Design Principle 4: The monitoring
function should not have any safety-adverse
common mode of failure with the monitored
AI function.
- Design Principle 5: The monitoring
function itself should not have an unacceptable
impact on the system safety and security
(innocuity).</p>
      <p>In the next section, an industrial use case from
the automotive domain is presented to illustrate
some of the concepts presented earlier.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Application</title>
      <p>The application used to present some results
related to multi-timescale online monitoring is
called the Renault Welding Use Case.</p>
      <p>The industrial context is a plant producing
mechanical components used for the ground
connection of motor vehicles; the mechanical
parts of interest in this use case are only those
of the rear axle. During the manufacturing process
shown at the top of Figure 5 (see Operational
Platform (OP) #120), metal parts are welded
together. The mechanical quality of the final
component depends on the quality of the weld.</p>
      <p>Until now, a systematic inspection of the weld
has been carried out by a specialized human operator on
a screen like the one at the bottom of Figure 5 (see
Display OP#120). The screen displays different
photos of the same weld taken by different
cameras from different angles. Based on their
experience, the human operator classifies the weld
as “compliant”, “not compliant” or “unknown”
(see examples of welds in Figure 6). This last
status, "unknown", leads to a further, deeper
technical evaluation of the manufacturing
process.</p>
      <p>Figure 6: (a) Compliant Weld;
(b) Non-Compliant Weld</p>
      <p>In practice, the overwhelming majority of
welds are compliant (robotized welding), and
from a human-factor perspective, this situation is
likely to decrease the attention of the operator in
charge of quality control. To mitigate this risk,
Renault launched a project to develop an AI-based
system to assist the operator in charge of the
quality control of welds, as depicted in Figure 7.
The AI-based product in Figure 7, called the
Welding Classification Model (WCM),
performs an automated preliminary conformity
assessment of the weld quality. The WCM is
developed using supervised ML technology based
on labeled datasets containing historical data of
compliant and non-compliant welds. The design
details of the WCM are not important in this paper,
since it is considered a black box by the online
monitor, which only looks at its inputs and its outputs
and not at its internal states (as shown in Figure 7).
The WCM provided by Renault reaches very good
performance (measured with an f1 score), but only
on a given domain, called the ODD, that is
characterized according to operational
parameters. Based on a dedicated study of the
ODD done with Renault representatives, two
operational parameters have been considered in
this study as those most impacting the performance
of the ML model (and therefore the correctness of the
weld conformity classification): (i) image
brightness and (ii) image blur. Thus, the two
properties to be monitored are expressed as
follows:
∀ tk ≥ t0, Brightness(Img(tk)) ∈ ℧Brightness
(2)
∀ tk ≥ t0, Blur(Img(tk)) ∈ ℧Blur
(3)</p>
      <p>where Img(tk) is the image received by the WCM at
time tk, and ℧Brightness and ℧Blur are respectively
projections of the full WCM ODD on the two
targeted operational parameters – i.e., image
brightness and image blur. Moreover, ℧Brightness
and ℧Blur are not calculated theoretically; they
are determined based on test campaigns with
augmented data, as illustrated for the brightness
ODD in Figure 8.</p>
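      <p>A rule-based PTM check of properties (2) and (3) can be sketched as follows. The interval bounds standing in for ℧Brightness and ℧Blur, and the brightness and blur estimators (mean gray level, variance of a discrete Laplacian), are our own illustrative assumptions, not the values or rules determined by the test campaigns:</p>

```python
import numpy as np

# Hypothetical interval bounds standing in for the projections of the
# WCM ODD; the real bounds come from test campaigns (Figure 8).
U_BRIGHTNESS = (60.0, 190.0)   # bounds on mean gray level
BLUR_MIN = 100.0               # minimum sharpness (Laplacian variance)

def brightness(img: np.ndarray) -> float:
    # Mean gray level as a simple brightness proxy.
    return float(img.mean())

def blur_score(img: np.ndarray) -> float:
    # Variance of a discrete Laplacian: low variance suggests a blurry image.
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    return float(lap.var())

def in_odd(img: np.ndarray) -> bool:
    """True if the image satisfies both monitored properties (2) and (3)."""
    b_ok = U_BRIGHTNESS[0] <= brightness(img) <= U_BRIGHTNESS[1]
    s_ok = blur_score(img) >= BLUR_MIN
    return b_ok and s_ok
```

      <p>An image failing either rule is flagged as out-of-distribution and an alarm is raised to the controller.</p>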
      <p>Once ℧Brightness and ℧Blur are determined, it
is possible to develop dedicated monitoring
functions to detect Out-Of-Distribution (OOD)
input images, since we can observe in Figure 8 that
the f1 performance score of the WCM drops sharply
outside ℧Brightness (depicted with the white arrow
at the top of Figure 8). The same holds for the
blur (not represented in the paper to avoid
information overload). The rule-based design
of the online PTM monitoring functions for
brightness and blur OOD detection is detailed in
Tables 1, 2, and 3. Examples of anomalies
detected in the Welding use case are represented
in Figure 10 (brightness anomalies) and Figure 9
(blur anomalies).
The performance of the developed PTM
monitoring functions has been evaluated by the
LNE (https://www.lne.fr/en), an independent partner of the
Confiance.ai Program specializing in calibration,
testing, and certification under the trusteeship of
the French Ministry for the Economy and Finance
with oversight for Industry.</p>
      <p>LNE randomly selected 11,000 images for the
evaluation set, and identified the following
evaluation metrics:
- Analysis of the monitor's classification
compared to the noise level of each image:
  - True positive: the image has medium
or important noise, and the
monitor raised an alarm
  - True negative: the image has slight
noise, or no noise, and the monitor did
not raise an alarm
  - False positive: the image has slight
noise, or no noise, and the monitor
raised an alarm
  - False negative: the image has
medium or important noise, and
the monitor did not raise an alarm
- Using these four values, the precision,
recall, and f-measure are computed:
  - Precision: the total of true positives divided by the
total of detected positives (true and
false)
  - Recall: the total of true positives divided by
the total of real positives (true
positives and false negatives)
  - F-measure: harmonic mean of the
precision and recall</p>
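      <p>These three metrics follow directly from the four counts; the counts used below are illustrative placeholders, not LNE's evaluation figures:</p>

```python
# Illustrative computation of the three evaluation metrics from the
# four counts; the counts are hypothetical placeholders.

def precision(tp: int, fp: int) -> float:
    # True positives over all detected positives (true and false).
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # True positives over all real positives (true positives + false negatives).
    return tp / (tp + fn)

def f_measure(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

p = precision(tp=90, fp=30)   # 90 / 120 = 0.75
r = recall(tp=90, fn=10)      # 90 / 100 = 0.90
f = f_measure(p, r)           # about 0.818
```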
      <p>The recall metric is very important in domains
such as automotive quality control, where one
wants to minimize the chance of missing a positive
(i.e., failing to detect a non-compliant weld) by
predicting false negatives (i.e., a non-compliant
weld is predicted as a compliant one and there is
no alarm sent by the monitor). These are typically
cases where missing a positive case has a much
bigger safety impact than wrongly classifying
something as positive.</p>
      <p>These evaluation results show that the
rule-based PTM OOD functions have a good or very
good recall (90% ≤ recall ≤ 100%). However, the
precision results (and thus the f-measure scores)
show that the number of false-positive alarms is
still high and needs to be reduced in a future
version of the monitoring functions.</p>
      <p>No results are presented in this paper on the
NPM and NFM functions, since the development
of these monitoring functions is in progress
within the Confiance.ai Program. The results of
ongoing research on NPM and NFM will be
published in future papers, as well as the results of
the integrated multi-timescale monitor combining
PTM, NPM, and NFM into a single monitoring
item.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work has been supported by the French
government under the "France 2030” program, as
part of the SystemX Technological Research
Institute within the Confiance.ai Program
(www.confiance.ai).</p>
      <p>A special thanks to Guillaume BERNARD
from LNE who contributed to the independent
evaluation of the monitor, and to Dominique
TACHET &amp; Meriem LAFOU from Renault who
provided valuable support on the Renault
Welding use case.</p>
    </sec>
    <sec id="sec-7">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ruff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kauffmann</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandermeulen</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montavon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kloft</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietterich</surname>
            ,
            <given-names>T. G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>K.-R.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>A unifying review of deep and shallow anomaly detection</article-title>
          .
          <source>Proceedings of the IEEE.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Meinke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Towards neural networks that provably know when they don't know</article-title>
          .
          <source>In ICLR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Courville</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Detecting semantic anomalies</article-title>
          .
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          , pages
          <fpage>3154</fpage>
          -
          <lpage>3162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Mohseni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pitale</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Practical solutions for machine learning safety in autonomous vehicles</article-title>
          . arXiv preprint arXiv:1912.09630.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hecker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Failure prediction for autonomous driving</article-title>
          .
          <source>In 2018 IEEE Intelligent Vehicles Symposium (IV)</source>
          , pages
          <fpage>1792</fpage>
          -
          <lpage>1799</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berrio</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Worrall</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nebot</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Automated evaluation of semantic segmentation robustness for autonomous driving</article-title>
          .
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          ,
          <volume>21</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1951</fpage>
          -
          <lpage>1963</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Mohammadi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fathy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sabokrou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Image/video deep anomaly detection: A survey</article-title>
          .
          <source>arXiv preprint arXiv:2103</source>
          .
          <fpage>01739</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Chalapathy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chawla</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Deep learning for anomaly detection: A survey</article-title>
          . arXiv preprint arXiv:1901.03407.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Daxberger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hernández-Lobato</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Bayesian variational autoencoders for unsupervised out-of-distribution detection</article-title>
          . arXiv preprint arXiv:1912.05651.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Yuille</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Synthesize then compare: Detecting failures and anomalies for semantic segmentation</article-title>
          .
          <source>In ECCV.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Explaining and harnessing adversarial examples</article-title>
          .
          <source>arXiv preprint arXiv:1412.6572</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>A simple unified framework for detecting out-of-distribution samples and adversarial attacks</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>7167</fpage>
          -
          <lpage>7177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Box</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenkins</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reinsel</surname>
            ,
            <given-names>G. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ljung</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Time series analysis: forecasting and control</article-title>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Beggel</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kausler</surname>
            ,
            <given-names>B. X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiegg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfeiffer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bischl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Time series anomaly detection based on shapelet learning</article-title>
          .
          <source>Computational Statistics</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>945</fpage>
          -
          <lpage>976</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Practical variational inference for neural networks</article-title>
          .
          <source>In NeurIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>MacKay</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          (
          <year>1992</year>
          ).
          <article-title>A practical Bayesian framework for backpropagation networks</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Gal</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Dropout as a bayesian approximation: Representing model uncertainty in deep learning</article-title>
          .
          <source>In ICML.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Lakshminarayanan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pritzel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Blundell</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Simple and scalable predictive uncertainty estimation using deep ensembles</article-title>
          .
          <source>In NeurIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Malinin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gales</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Predictive uncertainty estimation via prior networks</article-title>
          .
          <source>In NeurIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Sensoy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>L. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kandemir</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Evidential deep learning to quantify classification uncertainty</article-title>
          .
          <source>In NeurIPS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Peterson</surname>
            ,
            <given-names>E. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeVore</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Carr</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Run Time Assurance as an Alternate Concept to Contemporary Development Assurance Processes</article-title>
          ,
          <source>NASA report NASA/CR-2020-220586</source>
          , https://ntrs.nasa.gov/citations/20200003114.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hashmi</surname>
            ,
            <given-names>K. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pagani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liwicki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stricker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Afzal</surname>
            ,
            <given-names>M. Z.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Survey and Performance Analysis of Deep Learning Based Object Detection in Challenging Environments</article-title>
          . In
          <source>Sensors</source>
          . https://pdfs.semanticscholar.org/5040/0b478dda3eebb966f71e8d8f90718a0e2854.pdf
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Schmarje</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santarossa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schröder</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Koch</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>A Survey on Semi, Self- and Unsupervised Learning in Image Classification</article-title>
          . In
          <source>IEEE Access</source>
          . https://arxiv.org/pdf/2002.08721.pdf
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Himabindu</surname>
            ,
            <given-names>D. Dakshayani</given-names>
          </string-name>
          and
          <string-name>
            <surname>Praveen Kumar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Survey on Computer Vision Architectures for Large Scale Image Classification using Deep Learning</article-title>
          . In
          <source>International Journal of Advanced Computer Science and Applications</source>
          . https://pdfs.semanticscholar.org/03e1/3f250da93bcaf1d760fe40f97e465e5083fa.pdf
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Dobbe</surname>
            ,
            <given-names>R. I. J.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>System Safety and Artificial Intelligence</article-title>
          . In
          <source>The Oxford Handbook of AI Governance</source>
          . https://academic.oup.com/editedvolume/41989/chapter/377785597
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <collab>SAE International</collab>
          (
          <year>2010</year>
          ).
          <article-title>Guidelines for Development of Civil Aircraft and Systems</article-title>
          , ARP4754A.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <collab>SAE International</collab>
          (
          <year>1996</year>
          ).
          <article-title>Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment</article-title>
          , ARP4761.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>