                                Developing safe and explainable autonomous agents:
                                from simulation to the real world
                                Federico Bianchi1,† , Alberto Castellini1,† , Alessandro Farinelli1,† , Luca Marzari1,† ,
                                Daniele Meli1,*,† , Francesco Trotti1,† and Celeste Veronese1,†
¹ University of Verona, strada Le Grazie 15, 37135, Verona, Italy


                                                 Abstract
Responsible artificial intelligence is the next research challenge for fostering the deployment of autonomous systems in the real world. In this paper, we focus on the safe and explainable design and deployment of autonomous agents, e.g., robots. In particular, we present our recent contributions to: i) safe and explainable planning, leveraging safe Reinforcement Learning (RL) and neurosymbolic planning; ii) effective deployment of RL policies via model-based control; iii) formal verification of the safety of deep RL policies; and iv) explainable anomaly detection in complex real systems.

                                                 Keywords
                                                 Safe Reinforcement Learning, Formal verification of neural networks, Neurosymbolic AI, Planning under uncertainty



1. Introduction

Artificial Intelligence (AI) and robotics are pervading everyday activities, from industrial automation [1] to environmental monitoring [2]. As more and more sophisticated autonomous cognitive systems interact with humans in complex scenarios, the development of responsible AI solutions [3] becomes a fundamental design requirement, as prescribed also by the latest international regulations¹. Responsible AI involves several aspects, including safety, transparency and trustability [4]. Safety regards providing guarantees about the behavior of AI systems, e.g., autonomous robotic systems, in terms of performance and potential harm to the surrounding environment or humans. Transparency and trustability are related to the perception of the humans interacting with the AI system, e.g., the explainability of the system's behaviour and its compliance with human expectations from a moral or rational perspective [5].

In this paper, we summarize our main contributions in the field of responsible AI. We focus on autonomous agents, e.g., robots, and present our approach to responsible autonomy at different developmental stages. We first describe our solutions for safe and explainable planning in autonomous agents, via safe Reinforcement Learning (RL) and neurosymbolic approaches. We also analyze the problem of safe and compliant transfer of a planned policy to a physical robotic system, combining RL with model-based control. We then investigate how to provide formal guarantees of safety for black-box policies, e.g., from deep RL, via formal verification. Finally, we present solutions for efficient and explainable anomaly detection in autonomous systems.

2. Safe and explainable planning

We assume the autonomous agent and the environment are represented as a Markov Decision Process (MDP) 𝑀 = ⟨𝑆, 𝐴, 𝑇, 𝑅⟩, defining respectively the state space, the action space, the transition map, and the reward map. The first approach is based on Safe Policy Improvement (SPI) [6] and Monte Carlo Tree Search (MCTS) [7], which performs simulations in a model of the real environment to estimate the optimal policy online. The second solution combines MCTS with symbolic and logical reasoning, to guide the exploration of the RL agent towards better pathways.

2.1. Safe Policy Improvement with MCTS

Safe RL [9] investigates how to learn policies that maximize the performance of the agent, while respecting safety constraints during learning. One popular approach is Safe Policy Improvement with Baseline Bootstrapping (SPIBB) [10]. SPIBB starts from a baseline policy 𝜋₀ (e.g., a sub-optimal expert-designed policy). The algorithm then collects a batch dataset of trajectories (i.e., state-action pairs), and uses the baseline policy on less frequent state-action pairs. However, it does not scale to large state and action spaces.

To improve scalability, we recently introduced Monte Carlo Tree Search Safe Policy Improvement with Baseline Bootstrapping (MCTS-SPIBB) [8]. The algorithm exploits MCTS to estimate the improved policy 𝜋𝐼 online, hence it can scale to large domains, while keeping the asymptotic convergence guarantees of SPIBB [8]. In [8] we compared MCTS-SPIBB

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† Authors are in alphabetical order.
Email: daniele.meli@univr.it (D. Meli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
¹ European AI Act
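As an illustration, the baseline-bootstrapping rule of SPIBB can be sketched as follows. The data structures and the count threshold `n_min` are hypothetical simplifications for a tabular setting; MCTS-SPIBB [8] embeds this rule inside the simulation phase of tree search rather than in a lookup table.

```python
from collections import Counter

def spibb_action(state, baseline_policy, improved_q, counts, n_min=20):
    """Bootstrapped action selection: deviate from the baseline policy
    only on state-action pairs observed at least n_min times in the batch."""
    actions = list(improved_q[state].keys())
    # Actions with enough support in the dataset may be improved upon.
    well_known = [a for a in actions if counts[(state, a)] >= n_min]
    if not well_known:
        return baseline_policy[state]  # too little data: stay with the baseline
    # Among well-supported actions, pick the one with the best estimated value.
    return max(well_known, key=lambda a: improved_q[state][a])

# Toy usage: two actions, one of which is under-sampled in the batch.
counts = Counter({("s0", "left"): 50, ("s0", "right"): 3})
q = {"s0": {"left": 1.0, "right": 5.0}}
pi0 = {"s0": "left"}
print(spibb_action("s0", pi0, q, counts))  # prints "left": "right" lacks support
```

Note how the seemingly better action "right" is rejected because its value estimate rests on only 3 samples; lowering `n_min` trades safety for improvement.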




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Figure 1: Safe Policy Improvement: a) comparison of performance among SPI algorithms; b) scalability comparison between MCTS-SPIBB and SPIBB [8].

with several state-of-the-art SPI algorithms on benchmark domains (see Figure 1.a). Furthermore, we showed that on very large state spaces, such as the standard SysAdmin benchmark² with up to 35 machines, MCTS-SPIBB is the only SPI algorithm capable of computing improved policies (see Figure 1.b).

² SysAdmin: https://jair.org/index.php/jair/article/view/10341/24723

2.2. Planning with logics in MCTS

MCTS may require a large number of online simulations when the state and action spaces are large. This becomes even more critical in Partially Observable MDPs (POMDPs), where part of the state is uncertain, hence a particle filter must be used to sample and estimate the actual state of the system, starting from a probability distribution called the belief. Recent online solvers for POMDPs, e.g., Partially Observable Monte Carlo Planning (POMCP) [11] and Determinized Sparse Partially Observable Trees (DESPOT) [12], require the definition of task-specific policy heuristics, in order to efficiently bias the exploration towards the most fruitful policies. Moreover, it is essential to guarantee the exploration of only safe policies.

To this aim, in [13] we proposed an approach based on maximum satisfiability modulo theory [14] to probabilistically verify the adherence of the policy computed by POMCP to a set of user-defined specifications, expressed in a fragment of first-order logic. In this way, we can shield undesired actions in MCTS simulations, and increase the explainability of the generated policy thanks to the logic formalism. However, defining the logical policy specifications may be tedious and error-prone in realistic complex domains. For this reason, in [15, 16] we proposed an approach based on inductive logic programming [17] to learn logical policy heuristics from trajectories (belief-action pairs) of POMDP executions collected offline. Specifically, given a set of task-related concepts 𝐹 provided by the user to describe the belief space, offline trajectories are converted to a logical formalism, where logical predicates encode the concepts in 𝐹. As an example, consider the paradigmatic POMDP rocksample scenario depicted in Figure 2a, where a robotic agent must collect valuable rocks (green dots) while avoiding worthless ones (red dots) in a grid world. The state of the POMDP includes information about the positions of agents and rocks, and the probability (belief) of rocks being valuable. The state can be translated to a logical representation in terms of the following concepts in 𝐹: the Manhattan distance D between the agent and each rock R, dist(R,D), and the probability P of a rock R being valuable, guess(R,P). Defining semantic concepts about the domain is easier than directly defining policy specifications, since it simply involves a re-interpretation of the state formalization.

We preliminarily learn policy specifications from trajectories collected from a rocksample agent operating in a 12×12 grid with 4 rocks. We adopt the logical formalism of Answer Set Programming (ASP) [18], which represents the state of the art for planning in first-order logic [19]. Our approach requires relatively few training trajectories (less than 800 in rocksample) to learn interpretable, transparent policy specifications. Moreover, the learned heuristics allow POMCP to use significantly fewer online simulations per step of execution (Figure 2b), achieving comparable performance with respect to expert-designed specifications (pref). Finally, the heuristics generalize to unseen problem instances, e.g., enhancing scalability to larger grid sizes (Figure 2c), which require a longer planning horizon, typically challenging for MCTS-based solvers. In [20], we also showed that this approach can be used to derive policy explanations of black-box model-free RL agents, in the context of autonomous driving.

3. Safe deployment in the real world

The policy computed by an RL-based planner, e.g., POMCP for POMDPs, cannot always be effectively and safely deployed on a real robotic system. Indeed, MCTS-based planners perform online simulations based on a model of the environment, but the chosen policy must be adapted to the inevitable unmodeled inaccuracies and non-linearities of the physical plant. To overcome this problem, in [21] we implemented the two-layer architecture depicted in Figure 3, combining a high-level controller based on POMCP with a low-level model-based controller. The low-level controller is designed using the inverse dynamics technique [22, 23], which allows us to linearize the system via feedback. In particular, let
Figure 2: a) Rocksample setup; b) results of [16] with few online simulations; c) results on larger grids.
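The translation from a rocksample state to the concepts in 𝐹 can be sketched as follows. The state layout and the textual spelling of the predicates are illustrative assumptions; the actual encoding in [15, 16] is expressed in ASP.

```python
def state_to_predicates(agent_pos, rocks):
    """Map a rocksample state to logical facts dist(R,D) and guess(R,P),
    where D is the Manhattan distance to rock R and P its value belief."""
    facts = []
    ax, ay = agent_pos
    for rock_id, (rx, ry, belief) in rocks.items():
        d = abs(ax - rx) + abs(ay - ry)  # Manhattan distance to the rock
        facts.append(f"dist({rock_id},{d})")
        facts.append(f"guess({rock_id},{int(100 * belief)})")  # belief as %
    return facts

# Toy 12x12 instance with two rocks: (x, y, belief of being valuable).
rocks = {"r1": (3, 4, 0.8), "r2": (10, 1, 0.2)}
print(state_to_predicates((1, 1), rocks))
# prints ['dist(r1,5)', 'guess(r1,80)', 'dist(r2,9)', 'guess(r2,20)']
```

Facts of this form, extracted from each step of the offline trajectories, are the input from which inductive logic programming learns the policy heuristics.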


Figure 3: Block diagram of the two-layer architecture: the high-level POMDP controller provides reference values 𝑢_ref to the low-level controller, which computes the command 𝑢 for the agent; the state 𝑥 closes the low-level loop, and the observations 𝑜 feed the belief update of the POMDP.
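A minimal sketch of the two-rate scheme of Figure 3 on a scalar toy plant follows. The plant 𝑓, 𝑔, the gains, and the proportional high-level law standing in for POMCP are all hypothetical; only the inverse-dynamics structure of the low-level loop reflects the design in [21].

```python
import math

def f(x): return -math.sin(x)  # hypothetical drift term of the plant
def g(x): return 2.0           # hypothetical (nonzero) input gain

def low_level(x, v):
    """Feedback linearization: u = (1/g(x)) * (-f(x) + v),
    so the closed loop behaves like the linear system x_dot = v."""
    return (-f(x) + v) / g(x)

def high_level(x, x_ref):
    """Stand-in for the POMCP planner: a simple proportional law
    producing the auxiliary reference signal v."""
    return 1.5 * (x_ref - x)

# Simulate: a slow outer loop (every 10 steps) drives a fast inner loop.
x, x_ref, dt = 0.0, 1.0, 0.01
v = high_level(x, x_ref)
for step in range(1000):
    if step % 10 == 0:             # high level runs at a slower rate
        v = high_level(x, x_ref)
    u = low_level(x, v)            # fast inner loop computes the command
    x += dt * (f(x) + g(x) * u)    # integrate the true nonlinear plant
print(round(x, 3))                 # prints 1.0: the state converges to x_ref
```

The two sample rates mirror the text: the inner loop must run fast enough to stabilize the plant, while the planner only refreshes the reference it tracks.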


𝑥̇ = 𝑓(𝑥) + 𝑔(𝑥)𝑢 be a non-linear dynamical system depending on the state 𝑥 and the command action 𝑢. The controller is obtained as 𝑢 = (1/𝑔(𝑥))(−𝑓(𝑥) + 𝑣), where 𝑣 is an auxiliary control signal. Therefore, the low-level controller exploits the auxiliary control signal 𝑣, which is mapped to reference values for the controller, to compute the command 𝑢. The high-level controller is formalized as a POMDP that exploits the linearized closed-loop model to select the best local action 𝑢 for the agents. In particular, POMCP provides the sub-optimal reference values for the low-level controller, optimizing user-defined objectives encoded in the reward function. Note that the two layers have different control loop sample rates: the low-level controller has to be fast, since it provides the commands to the agent, while the high-level controller can be slower, since it only generates the reference values for the low-level one.

The two-layer approach is tested in a scenario where an aerial drone has to reach a target area, avoiding some no-fly zones and minimizing fuel consumption or attitude error. Therefore, the reward function is composed of four contributions: an attractive potential component to reach the target, a repulsive component to avoid the no-fly zones, the fuel consumption and the heading error. The last two components are weighted to rank between different objectives. Figure 4 shows the trajectories followed by the drone when optimizing only the fuel consumption (black line), both fuel and attitude (red dashed line), and only the attitude error (green dotted line). The black line follows the shortest path to minimize the fuel; the red line also follows the shortest path, but near the target position the attitude error component increases to align the drone with the desired attitude (black arrow). The green line follows the optimal path to minimize only the attitude error.

Figure 4: Drone paths. The black and blue arrows are, respectively, the desired yaw angle and the initial yaw angle of the drone.

Figure 5: Realistic robotic scenarios: a) robotic mapless navigation; b) autonomous colonoscopy navigation.

4. Formal verification of deep RL

Trained RL policies, especially model-free RL policies encoded in a Deep Neural Network (DNN), are not guaranteed to provably meet the safety standards required in the real world. For instance, DNNs are vulnerable to so-called adversarial inputs, i.e., minimal input variations that fool the system into outputting an undesired value (or action) [24]. Consequently, in recent years, Formal Verification (FV) of DNNs (aka DNN-Verification) has been developed to provide formal guarantees on the behavior of these systems [25]. In particular, given a predefined safety property, the goal of DNN-Verification is to assert whether at least one input configuration exists that violates the property. However, given the non-convex and non-linear nature of DNNs, verifying safety properties in the worst case has been shown to be an NP-complete problem [26]. Moreover, the standard binary response of DNN-Verification (safe vs. unsafe) does not provide sufficient information to compare the safety of different DNNs.
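The decision problem just described can be illustrated with a naive random-search falsifier over a box input region. This is only a sketch of what a verifier must decide: sampling can find counterexamples (proving UNSAFE) but can never prove safety, unlike the sound verifiers discussed in [25, 26]. The network, the property, and the bounds below are all hypothetical.

```python
import random

def policy_net(x):
    """Stand-in for a trained DNN policy: two inputs -> one action score."""
    return 0.9 * x[0] - 1.1 * x[1] + 0.05

def violates(x):
    # Hypothetical safety property: over the region of interest the
    # network output must stay non-negative (e.g. "never steer left").
    return policy_net(x) < 0.0

def falsify(region, n_samples=10_000, seed=0):
    """Return a counterexample input if one is found (UNSAFE), else None.
    Finding None is only empirical evidence, not a formal safety proof."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in region]
        if violates(x):
            return x
    return None

cex = falsify([(0.0, 1.0), (0.0, 1.0)])
print("counterexample found" if cex else "no violation sampled")
# prints: counterexample found
```

Counting how often such violations occur over the region, rather than stopping at the first one, is the intuition behind the quantitative formulation discussed next.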
Table 1
Results of model selection. SAT indicates a property violation. The Θ's denote the safety properties of not touching the colon wall in each cardinal direction.

                  Safety Properties
           Θ↑     Θ↓     Θ←     Θ→     FV selection
 Method    SAT    SAT    SAT    SAT    Safe models
 PPO       300    246    80     167    0
 L-PPO     221    198    53     161    3

To overcome these limitations, in [27] we proposed a novel quantitative formulation of the DNN-Verification problem, allowing us to enumerate all unsafe regions for a given domain of interest and thus rank the models by the portion of unsafe regions they may have. However, we showed that this problem turns out to be #P-hard. Hence, in [28] we proposed 𝜖-ProVe. Exploiting a controllable underestimation of the output reachable sets obtained via statistical prediction of tolerance limits [29], the algorithm provides a tight lower estimate of the (un)safe areas, with provable probabilistic guarantees.

We validated DNN-Verification in realistic safety-critical robotic scenarios. In particular, in [30], we showed that DNN-Verification can be used to rank different successful DNN models according to their level of safety, verifying collision avoidance in robotic mapless navigation. We then applied a similar pipeline in a more safety-critical domain, namely autonomous colonoscopy navigation for colorectal detection with deep RL [31] (Figure 5). In particular, we trained an agent to navigate the endoscope in patient-specific colon models based on endoscopic images, using Constrained RL (CRL) to impose, at the training stage, a safety cost whenever the agent touches the colon walls. Nevertheless, due to the Lagrangian relaxation implemented by CRL to perform constrained optimization, safety may not be guaranteed. Hence, we adopted a model selection strategy that harnesses FV to evaluate the safety of a vast pool of trained policies and select the one that meets all the specified behavioral preferences. The results of our study over 300 trained models are reported in Table 1, finding 3 completely safe models that provably meet the safety requirements.

Finally, to address the necessity of running the FV process only after training due to its computational complexity, in [32] we proposed an unconstrained DRL framework that leverages a novel sample-based method to approximate local violations of input-output conditions, fostering the learning of safer behaviors inside the training loop. However, such conditions are typically hard-coded and require task-level knowledge, making their application intractable in challenging safety-critical tasks. To this end, in [33] we introduced the Collection and Refinement of Online Properties (CROP) framework to collect and refine safety properties during training. The combination of CROP with approximate violation inside the training loop allowed us to obtain a more robust approach with respect to other existing Safe DRL methodologies in the context of autonomous navigation, promoting safer behaviors while maintaining similar or better returns.

5. Explainable and data-efficient anomaly detection

Autonomous systems operating in the real world are required to work reliably over long periods of time (Long Term Autonomy, LTA) under changing and unpredictable environmental conditions. In this context, anomaly detection is crucial to promptly identify situations that diverge from the desired behaviour. Specifically, unsupervised anomaly detection aims to identify anomalies related to the global behavior of the system [34, 35, 36], monitoring multivariate time series generated from sensors and actuators and starting only from knowledge of the nominal (i.e., anomaly-free) behavior.

We recently proposed two contributions in this area, namely an online approach for detecting anomalous behaviors of robotic systems involved in complex LTA scenarios (HHAD) [37], and an adversarial data augmentation and retraining approach (HHAD-AUG) [38]. In HHAD [37], we use Hidden Markov Models (HMMs) to represent the nominal behavior of a robot. We then evaluate online the dissimilarity between the probability distribution of multivariate sensor time series in a sliding window and the emission probability of the related HMM hidden states. We adopt the Hellinger distance [39] as a distance measure, since it is bounded (thus it lends itself to simpler interpretation and thresholding) and it is less noisy, hence more informative and discriminative.

In HHAD-AUG [38], we address the usual lack (or paucity) of anomalous examples and the noise that characterizes time series of real systems. We propose a data augmentation method based on perturbed (adversarial) time series [40], which has the advantage of not requiring any prior knowledge about the application domain and data conformation. We generate adversarial examples only for nominal points, optimizing a loss function based on the Hellinger distance between the observed and the expected data distributions.

We evaluate our data augmentation and re-training approach on several public datasets, plus one collected from our aquatic drones developed in the EU H2020 project INTCATCH [41]. Results show that (i) the adversarial generation algorithms can generate meaningful adversarial examples for HHAD, employing them to significantly improve the performance of HHAD; (ii) our data augmentation method yields higher performance than examples generated by state-of-the-art augmentation methods; (iii) adversarial examples generated considering the Hellinger distance yield higher improvement than examples gen-
Figure 6: Average F1-score for the original detector HHAD and augmented detectors [38]: H-AUG (ours, based on Hellinger distance), L-AUG (ours, based on log-likelihood), R-AUG (random-based baseline), D-AUG (drift-based baseline), G-AUG (Gaussian-based baseline), and S-AUG (SMOTE-based baseline) on different training set sizes in the INTCATCH dataset. Averages are computed over 30 datasets for each dataset size.

erated considering standard log-likelihood; (v) the low computational complexity and high parallelizability of the proposed method allow for fast data augmentation and retraining of HHAD. Figure 6 shows the results on the INTCATCH dataset [41].
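For reference, the Hellinger distance [39], on which H-AUG is based, can be sketched for two discrete distributions as follows. This is an illustrative helper, not the implementation used in [38]:

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as aligned lists."""
    assert abs(sum(p) - 1.0) < 1e-9 and abs(sum(q) - 1.0) < 1e-9, "inputs must be distributions"
    # H(P, Q) = sqrt( (1/2) * sum_i (sqrt(p_i) - sqrt(q_i))^2 ), bounded in [0, 1]
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))
```

Identical distributions are at distance 0 and distributions with disjoint support at distance 1, which makes the distance a convenient knob for controlling how far augmented samples lie from the nominal data.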
   Finally, we have recently addressed the problem of explainable anomaly detection, in order to provide useful information about the source of an anomaly and ease its repair. To this aim, in [42] we showed that causal discovery based on Conditional Mutual Information (CMI) between time series can achieve higher performance than standard deep learning anomaly detectors on a benchmark robotic dataset of the Pepper service robot3. Our methodology evaluates the variation of CMI between time series, thus providing a useful hint about the root cause of an anomaly. Moreover, it builds a nominal model of the real physical relations between the variables of the system, resulting in higher robustness and more accurate anomaly detection than DNN methods (95% vs. 90% F1-score, with 100% precision).
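As an illustrative sketch of the core quantity (not the implementation in [42]), a plug-in estimate of the conditional mutual information I(X; Y | Z) between discretized time series, and its variation between nominal data and a runtime window, can be computed as follows. The `discretize` helper and the anomaly-score rule are assumptions for illustration:

```python
import math
from collections import Counter

def cmi(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in nats from three aligned discrete series."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    total = 0.0
    for (x, y, z), c in pxyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], with counts in place of probabilities
        total += (c / n) * math.log((c * pz[z]) / (pxz[(x, z)] * pyz[(y, z)]))
    return total

def discretize(series, bins=4):
    """Map a real-valued series to equal-width bin indices."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in series]

def anomaly_score(nominal, window, triple):
    """Variation of CMI for one (X, Y | Z) triple between nominal data and a runtime window."""
    i, j, k = triple
    ref = cmi(discretize(nominal[i]), discretize(nominal[j]), discretize(nominal[k]))
    cur = cmi(discretize(window[i]), discretize(window[j]), discretize(window[k]))
    return abs(cur - ref)
```

A large variation of CMI for a given triple of variables then points at the physical relation that no longer holds, hinting at the root cause of the anomaly.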
6. Conclusion and future works

Our methodologies aim at increasing transparency and safety at different development levels, from planning to execution and verification. Our current research directions include the online integration of symbolic learning and formal verification approaches into RL, with a focus on the current scalability issues.

3 https://sites.google.com/diag.uniroma1.it/robsec-data

References

[1] Z. Jan, F. Ahamed, W. Mayer, N. Patel, G. Grossmann, M. Stumptner, A. Kuusk, Artificial intelligence for industry 4.0: Systematic review of applications, challenges, and opportunities, Expert Systems with Applications 216 (2023) 119456.
[2] M. Zuccotto, A. Castellini, D. L. Torre, L. Mola, A. Farinelli, Reinforcement learning applications in environmental sustainability: a review, Artificial Intelligence Review 57 (2024) 88.
[3] B. Shneiderman, Responsible AI: Bridging from ethics to practice, Communications of the ACM 64 (2021) 32–35.
[4] P. Mikalef, K. Conboy, J. E. Lundström, A. Popovič, Thinking responsibly about responsible AI and 'the dark side' of AI, 2022.
[5] I. Gabriel, Artificial intelligence, values, and alignment, Minds and Machines 30 (2020) 411–437.
[6] P. Scholl, F. Dietrich, C. Otte, S. Udluft, Safe policy improvement approaches and their limitations, in: International Conference on Agents and Artificial Intelligence, Springer, 2022, pp. 74–98.
[7] M. Świechowski, K. Godlewski, B. Sawicki, J. Mańdziuk, Monte Carlo tree search: A review of recent modifications and applications, Artificial Intelligence Review 56 (2023) 2497–2562.
[8] A. Castellini, F. Bianchi, E. Zorzi, T. D. Simão, A. Farinelli, M. T. J. Spaan, Scalable safe policy improvement via Monte Carlo tree search, in: Proceedings of the 40th International Conference on Machine Learning (ICML 2023), PMLR, 2023, pp. 3732–3756.
[9] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
[10] R. Laroche, P. Trichelair, R. Tachet Des Combes, Safe policy improvement with baseline bootstrapping, in: Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR, 2019, pp. 3652–3661.
[11] D. Silver, J. Veness, Monte-Carlo planning in large POMDPs, Advances in Neural Information Processing Systems 23 (2010).
[12] N. Ye, A. Somani, D. Hsu, W. S. Lee, DESPOT: Online POMDP planning with regularization, Journal of Artificial Intelligence Research 58 (2017) 231–266.
[13] G. Mazzi, A. Castellini, A. Farinelli, Risk-aware shielding of Partially Observable Monte Carlo Planning policies, Artificial Intelligence 324 (2023).
[14] C. Barrett, R. Sebastiani, S. A. Seshia, C. Tinelli, Satisfiability modulo theories, 2021.
[15] G. Mazzi, D. Meli, A. Castellini, A. Farinelli, Learning logic specifications for soft policy guidance in POMCP, in: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, AAMAS '23, IFAAMAS, 2023, pp. 373–381.
[16] D. Meli, A. Castellini, A. Farinelli, Learning logic specifications for policy guidance in POMDPs: an inductive logic programming approach, Journal of Artificial Intelligence Research 79 (2024) 725–776.
[17] S. Muggleton, L. De Raedt, Inductive logic programming: Theory and methods, The Journal of Logic Programming 19 (1994) 629–679.
[18] S. C. Tran, E. Pontelli, M. Balduccini, T. Schaub, Answer set planning: a survey, Theory and Practice of Logic Programming 23 (2023) 226–298.
[19] D. Meli, H. Nakawala, P. Fiorini, Logic programming for deliberative robotic task planning, Artificial Intelligence Review 56 (2023) 9011–9049.
[20] C. Veronese, D. Meli, F. Bistaffa, M. Rodríguez-Soto, A. Farinelli, J. A. Rodríguez-Aguilar, Inductive logic programming for transparent alignment with multiple moral values, in: CEUR Workshop Proceedings, 2023, pp. 84–88.
[21] F. Trotti, A. Farinelli, R. Muradore, An online path planner based on POMDP for UAVs, in: 2023 European Control Conference (ECC), IEEE, 2023.
[22] A. Isidori, Nonlinear Control Systems II, Springer, 2013.
[23] H. Khalil, Nonlinear Systems, Prentice Hall, 2002.
[24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv:1312.6199 (2013).
[25] C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, M. J. Kochenderfer, et al., Algorithms for verifying deep neural networks, Foundations and Trends® in Optimization 4 (2021) 244–404.
[26] G. Katz, C. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer, Reluplex: An efficient SMT solver for verifying deep neural networks, in: International Conference on Computer Aided Verification, Springer, 2017, pp. 97–117.
[27] L. Marzari, D. Corsi, F. Cicalese, A. Farinelli, The #DNN-Verification Problem: Counting Unsafe Inputs for Deep Neural Networks, in: International Joint Conference on Artificial Intelligence (IJCAI), 2023, pp. 217–224.
[28] L. Marzari, D. Corsi, E. Marchesini, A. Farinelli, F. Cicalese, Enumerating safe regions in deep neural networks with provable probabilistic guarantees, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2024, pp. 21387–21394.
[29] S. S. Wilks, Statistical prediction with special reference to the problem of tolerance limits, The Annals of Mathematical Statistics 13 (1942) 400–409.
[30] G. Amir, D. Corsi, R. Yerushalmi, L. Marzari, D. Harel, A. Farinelli, G. Katz, Verifying learning-based robotic navigation systems, in: 29th International Conference TACAS, Springer, 2023, pp. 607–627.
[31] D. Corsi, L. Marzari, A. Pore, A. Farinelli, A. Casals, P. Fiorini, D. Dall'Alba, Constrained reinforcement learning and formal verification for safe colonoscopy navigation, in: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2023, pp. 10289–10294.
[32] E. Marchesini, L. Marzari, A. Farinelli, C. Amato, Safe deep reinforcement learning by verifying task-level properties, AAMAS '23, International Foundation for Autonomous Agents and Multiagent Systems, 2023, pp. 1466–1475.
[33] L. Marzari, E. Marchesini, A. Farinelli, Online safety property collection and refinement for safe deep reinforcement learning in mapless navigation, in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023, pp. 7133–7139.
[34] A. Castellini, M. Bicego, F. Masillo, M. Zuccotto, A. Farinelli, Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework, Engineering Applications of Artificial Intelligence 90 (2020).
[35] A. Castellini, M. Bicego, D. Bloisi, J. Blum, F. Masillo, S. Peignier, A. Farinelli, Subspace clustering for situation assessment in aquatic drones: a sensitivity analysis for state-model improvement, Cybernetics and Systems 50 (2019) 658–671.
[36] A. Castellini, F. Masillo, M. Bicego, D. Bloisi, J. Blum, A. Farinelli, Subspace clustering for situation assessment in aquatic drones, in: Proc. 33rd ACM/SIGAPP Symposium on Applied Computing, SAC, 2019, pp. 930–937.
[37] D. Azzalini, A. Castellini, M. Luperto, A. Farinelli, F. Amigoni, et al., HMMs for anomaly detection in autonomous robots, in: Proc. AAMAS, 2020, pp. 105–113.
[38] A. Castellini, F. Masillo, D. Azzalini, F. Amigoni, A. Farinelli, Adversarial data augmentation for HMM-based anomaly detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023) 14131–14143.
[39] E. Hellinger, Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen, Journal für die reine und angewandte Mathematik 136 (1909) 210–271.
[40] F. Karim, S. Majumdar, H. Darabi, Adversarial attacks on time series, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2020) 3309–3320.
[41] A. Castellini, D. Bloisi, J. Blum, F. Masillo, A. Farinelli, Multivariate sensor signals collected by aquatic drones involved in water monitoring: A complete dataset, Data in Brief 30 (2020) 105436.
[42] D. Meli, Explainable online unsupervised anomaly detection for cyber-physical systems via causal discovery from time series, arXiv:2404.09871 (2024).