=Paper=
{{Paper
|id=Vol-3762/547
|storemode=property
|title=Developing safe and explainable autonomous agents: from simulation to the real world
|pdfUrl=https://ceur-ws.org/Vol-3762/547.pdf
|volume=Vol-3762
|authors=Federico Bianchi,Alberto Castellini,Alessandro Farinelli,Luca Marzari,Daniele Meli,Francesco Trotti,Celeste Veronese
|dblpUrl=https://dblp.org/rec/conf/ital-ia/0002CFMMTV24
}}
==Developing safe and explainable autonomous agents: from simulation to the real world==
Federico Bianchi, Alberto Castellini, Alessandro Farinelli, Luca Marzari, Daniele Meli*, Francesco Trotti and Celeste Veronese

University of Verona, strada Le Grazie 15, 37135, Verona, Italy
Abstract
Responsible artificial intelligence is the next research challenge for fostering the deployment of autonomous systems in the real world. In this paper, we focus on the safe and explainable design and deployment of autonomous agents, e.g., robots. In particular, we present our recent contributions to: i) safe and explainable planning, leveraging safe Reinforcement Learning (RL) and neurosymbolic planning; ii) effective deployment of RL policies via model-based control; iii) formal verification of the safety of deep RL policies; and iv) explainable anomaly detection in complex real systems.

Keywords
Safe Reinforcement Learning, Formal verification of neural networks, Neurosymbolic AI, Planning under uncertainty
1. Introduction

Artificial Intelligence (AI) and robotics are pervading everyday activities, from industrial automation [1] to environmental monitoring [2]. As more and more sophisticated autonomous cognitive systems interact with humans in complex scenarios, the development of responsible AI solutions [3] becomes a fundamental design requirement, as prescribed also by the latest international regulations (e.g., the European AI Act). Responsible AI involves several aspects, including safety, transparency and trustability [4]. Safety regards providing guarantees about the behavior of AI systems, e.g., autonomous robotic systems, in terms of performance and potential harm to the surrounding environment or humans. Transparency and trustability are related to the perception of the humans interacting with the AI system, e.g., the explainability of the system's behaviour and its compliance with human expectations from a moral or rational perspective [5].

In this paper, we summarize our main contributions in the field of responsible AI. We focus on autonomous agents, e.g., robots, and present our approach to responsible autonomy at different developmental stages. We first describe our solutions for safe and explainable planning in autonomous agents, via safe Reinforcement Learning (RL) and neurosymbolic approaches. We also analyze the problem of the safe and compliant transfer of a planned policy to a physical robotic system, combining RL with model-based control. We then investigate how to provide formal guarantees of safety for black-box policies, e.g., from deep RL, via formal verification. Finally, we present solutions for efficient and explainable anomaly detection in autonomous systems.

(Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy. Corresponding author: Daniele Meli, daniele.meli@univr.it. Authors are listed in alphabetical order. © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings, ceur-ws.org, ISSN 1613-0073.)

2. Safe and explainable planning

We assume that the autonomous agent and the environment are represented as a Markov Decision Process (MDP) 𝑀 = ⟨𝑆, 𝐴, 𝑇, 𝑅⟩, defining respectively the state space, the action space, the transition map, and the reward map. We developed two approaches in this setting. The first is based on Safe Policy Improvement (SPI) [6] and Monte Carlo Tree Search (MCTS) [7], and performs simulations in a model of the real environment to estimate the optimal policy online. The second combines MCTS with symbolic and logical reasoning, to guide the exploration of the RL agent towards better pathways.

2.1. Safe Policy Improvement with MCTS

Safe RL [9] investigates how to learn policies that maximize the performance of the agent while respecting safety constraints during learning. One popular approach is Safe Policy Improvement with Baseline Bootstrapping (SPIBB) [10]. SPIBB starts from a baseline policy 𝜋0 (e.g., a sub-optimal expert-designed policy). The algorithm then collects a batch dataset of trajectories (i.e., state-action pairs), and falls back on the baseline policy for the less frequent state-action pairs. However, it does not scale to large state and action spaces.

To improve scalability, we recently introduced Monte Carlo Tree Search Safe Policy Improvement with Baseline Bootstrapping (MCTS-SPIBB) [8]. The algorithm exploits MCTS to estimate the improved policy 𝜋𝐼 online, hence it can scale to large domains while keeping the asymptotic convergence guarantees of SPIBB [8]. In [8] we compared MCTS-SPIBB with several state-of-the-art SPI algorithms on benchmark domains (see Figure 1a). Furthermore, we showed that on very large state spaces, such as the standard SysAdmin benchmark (https://jair.org/index.php/jair/article/view/10341/24723) with up to 35 machines, MCTS-SPIBB is the only SPI algorithm capable of computing improved policies (see Figure 1b).
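The bootstrapping rule at the heart of SPIBB can be sketched in a few lines: keep the baseline's probability mass on state-action pairs that are rarely observed in the batch, and reassign the remaining mass greedily with respect to the batch value estimates. The following tabular sketch is purely illustrative (the function name, threshold rule and greedy reassignment are our simplification, not the exact algorithms of [10] or [8]):

```python
import numpy as np

def spibb_policy(pi_b, Q, counts, n_min):
    """Illustrative SPIBB-style improvement step (tabular case).

    pi_b:   baseline policy, shape (S, A), rows sum to 1
    Q:      action-value estimates from the batch, shape (S, A)
    counts: state-action visit counts in the batch, shape (S, A)
    n_min:  bootstrapping threshold on the counts
    """
    pi = np.zeros_like(pi_b)
    for s in range(pi_b.shape[0]):
        rare = counts[s] < n_min           # poorly estimated actions
        pi[s, rare] = pi_b[s, rare]        # keep the baseline mass there
        free = (~rare).nonzero()[0]
        if free.size > 0:
            # reassign the remaining probability mass greedily w.r.t. Q
            best = free[np.argmax(Q[s, free])]
            pi[s, best] = 1.0 - pi[s, rare].sum()
    return pi

pi_b = np.full((2, 2), 0.5)
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
counts = np.array([[10, 10], [10, 1]])
pi = spibb_policy(pi_b, Q, counts, n_min=5)
# state 0: both actions well estimated -> fully greedy; state 1: action 1 is
# rare, so its baseline probability 0.5 is preserved
```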
Figure 1: Safe Policy Improvement: a) comparison of performance among SPI algorithms; b) scalability comparison between MCTS-SPIBB and SPIBB [8].

2.2. Planning with logics in MCTS

MCTS may require a large number of online simulations when the state and action spaces are large. This becomes even more critical in Partially Observable MDPs (POMDPs), where part of the state is uncertain, hence a particle filter must be used to sample and estimate the actual state of the system, starting from a probability distribution called the belief. Recent online solvers for POMDPs, e.g., Partially Observable Monte Carlo Planning (POMCP) [11] and Determinized Sparse Partially Observable Trees (DESPOT) [12], require the definition of task-specific policy heuristics to efficiently bias the exploration towards the most fruitful policies. Moreover, it is essential to guarantee the exploration of only safe policies.

To this aim, in [13] we proposed an approach based on maximum satisfiability modulo theory [14] to probabilistically verify the adherence of the policy computed by POMCP to a set of user-defined specifications, expressed in a fragment of first-order logic. In this way, we can shield undesired actions in MCTS simulations, and increase the explainability of the generated policy thanks to the logic formalism. However, defining the logical policy specifications may be tedious and error-prone in realistic complex domains. For this reason, in [15, 16] we proposed an approach based on inductive logic programming [17] to learn logical policy heuristics from trajectories (belief-action pairs) of POMDP executions collected offline. Specifically, given a set of task-related concepts 𝐹, provided by the user to describe the belief space, offline trajectories are converted to a logical formalism where logical predicates encode the concepts in 𝐹.

As an example, consider the paradigmatic POMDP rocksample scenario depicted in Figure 2a, where a robotic agent must collect valuable rocks (green dots) while avoiding worthless ones (red dots) in a grid world. The state of the POMDP includes information about the position of agents and rocks, and the probability (belief) of rocks being valuable. The state can be translated to a logical representation in terms of the following concepts in 𝐹: the Manhattan distance D between the agent and each rock R, dist(R,D), and the probability P of a rock R being valuable, guess(R,P). Defining semantic concepts about the domain is easier than directly defining policy specifications, since it simply involves a re-interpretation of the state formalization.

We preliminarily learn policy specifications from trajectories collected from a rocksample agent operating in a 12×12 grid with 4 rocks. We adopt the logical formalism of Answer Set Programming (ASP) [18], which represents the state of the art for planning in first-order logic [19]. Our approach requires relatively few training trajectories (less than 800 in rocksample) to learn interpretable, transparent policy specifications. Moreover, the learned heuristics allow POMCP to use significantly fewer online simulations per step of execution (Figure 2b), achieving comparable performance with respect to expert-designed specifications (pref). Finally, the heuristics generalize to unseen problem instances, e.g., enhancing scalability to larger grid sizes (Figure 2c), which require a longer planning horizon, typically challenging for MCTS-based solvers. In [20], we also showed that this approach can be used to derive policy explanations of black-box model-free RL agents, in the context of autonomous driving.
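For concreteness, the mapping from a rocksample belief state to the logical concepts in 𝐹 can be sketched as follows. This is an illustrative encoding only: the function name and the discretization of the belief into a percentage are our assumptions, not the exact representation used in [15, 16].

```python
def to_predicates(agent_pos, rocks):
    """Map a rocksample belief state to logical atoms dist(R,D) and guess(R,P).

    agent_pos: (x, y) grid position of the agent
    rocks:     dict rock_id -> ((x, y), belief that the rock is valuable)
    Illustrative encoding; the actual feature set F is user-defined.
    """
    atoms = []
    ax, ay = agent_pos
    for rid, ((rx, ry), belief) in rocks.items():
        d = abs(ax - rx) + abs(ay - ry)                  # Manhattan distance
        atoms.append(f"dist({rid},{d})")
        atoms.append(f"guess({rid},{round(100 * belief)})")  # belief as a percentage
    return atoms

atoms = to_predicates((0, 0), {"r1": ((2, 3), 0.8), "r2": ((1, 0), 0.2)})
# atoms: ['dist(r1,5)', 'guess(r1,80)', 'dist(r2,1)', 'guess(r2,20)']
```

Atoms of this form can then serve as the grounding vocabulary over which inductive logic programming searches for policy rules.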
Figure 2: a) Rocksample setup; b) results of [16] with few simulations and c) on larger grids.

3. Safe deployment in the real world

The policy computed by an RL-based planner, e.g., POMCP for POMDPs, cannot always be effectively and safely deployed on a real robotic system. Indeed, MCTS-based planners perform online simulations based on a model of the environment, but the chosen policy must be adapted to the inevitable unmodeled inaccuracies and non-linearities of the physical plant. To overcome this problem, in [21] we implemented the two-layer architecture depicted in Figure 3, combining a high-level controller based on POMCP with a low-level model-based controller. The low-level controller is designed using the inverse dynamics technique [22, 23], which allows linearizing the system via feedback.
Figure 3: Block diagram of the two-layer architecture. The high-level POMDP controller provides the reference values 𝑢𝑟𝑒𝑓 to the low-level controller, which computes the command 𝑢 for the agent; the state 𝑥 and the observation 𝑜 feed the belief update.

Figure 4: Drone paths. The black and blue arrows are, respectively, the desired yaw angle and the drone's initial yaw angle.

Figure 5: Realistic robotic scenarios: a) robotic mapless navigation; b) autonomous colonoscopy navigation.

In particular, let 𝑥̇ = 𝑓(𝑥) + 𝑔(𝑥)𝑢 be a non-linear dynamical system depending on the state 𝑥 and the command action 𝑢. The controller is obtained as 𝑢 = 𝑔(𝑥)⁻¹(−𝑓(𝑥) + 𝑣), where 𝑣 is an auxiliary control signal. Therefore, the low-level controller exploits the auxiliary control signal 𝑣, which is mapped to reference values for the controller, to compute the command 𝑢. The high-level controller is formalized as a POMDP that exploits the linearized closed-loop model to select the best local action for the agent. In particular, POMCP provides the sub-optimal reference values for the low-level controller, optimizing user-defined objectives encoded in the reward function. Note that the two layers have different control-loop sample rates: the low-level controller has to be fast, since it provides the commands to the agent, while the high-level controller can be slower, since it only generates the reference values for the low-level one.
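The inverse-dynamics construction above can be illustrated on a toy scalar plant: substituting 𝑢 = 𝑔(𝑥)⁻¹(−𝑓(𝑥) + 𝑣) makes the closed loop behave as 𝑥̇ = 𝑣, so any simple tracking law on 𝑣 drives the state to a reference. This is a minimal sketch with invented dynamics and gains, not the controller of [21]:

```python
def feedback_linearizing_control(f, g, x, v):
    """Low-level command u = g(x)^-1 (-f(x) + v): the closed loop becomes x' = v."""
    return (-f(x) + v) / g(x)

# toy scalar plant x' = -x^3 + 2u; drive x toward x_ref with v = k * (x_ref - x)
f = lambda x: -x**3
g = lambda x: 2.0
x, x_ref, k, dt = 1.0, 0.5, 4.0, 0.01
for _ in range(1000):
    v = k * (x_ref - x)                      # auxiliary signal from the high level
    u = feedback_linearizing_control(f, g, x, v)
    x += dt * (f(x) + g(x) * u)              # plant step: exactly dt * v
# x converges to x_ref, since the linearized loop is x' = k * (x_ref - x)
```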
The two-layer approach is tested in a scenario where an aerial drone has to reach a target area, avoiding some no-fly zones and minimizing fuel consumption or attitude error. Therefore, the reward function is composed of four contributions: an attractive potential component to reach the target, a repulsive component to avoid the no-fly zones, the fuel consumption, and the heading error. The last two components are weighted to rank between the different objectives. Figure 4 shows the trajectory followed by the drone when optimizing only the fuel consumption (black line), both fuel and attitude (red dashed line), and only the attitude error (green dotted line). The black line follows the shortest path to minimize the fuel; the red line follows the shortest path, but near the target position the attitude error component increases to align the drone with the desired attitude (black arrow); the green line follows the optimal path to minimize only the attitude error.

4. Formal verification of deep RL

Trained RL policies, especially model-free RL policies encoded in a Deep Neural Network (DNN), are not guaranteed to provably meet the safety standards required in the real world. For instance, DNNs are vulnerable to so-called adversarial inputs, i.e., minimal input variations that fool the system into outputting an undesired value (or action) [24]. Consequently, in recent years, Formal Verification (FV) of DNNs (aka DNN-Verification) has been developed to provide formal guarantees on the behavior of these systems [25]. In particular, given a predefined safety property, the goal of DNN-Verification is to assert whether at least one input configuration exists that violates the property. However, given the non-convex and non-linear nature of DNNs, verifying safety properties in the worst case has been shown to be an NP-complete problem [26]. Moreover, the standard binary response of DNN-Verification (safe vs. unsafe) does not provide sufficient information to compare the safety of different DNNs.
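A minimal way to see how formal guarantees can be obtained despite this complexity is interval bound propagation, an incomplete but sound relaxation: propagate an input box through the network and check the property on the resulting output bounds. The snippet below is an illustrative toy, not the verification algorithms used in [25, 26, 27]; the network weights and the property threshold are invented:

```python
import numpy as np

def interval_bounds(layers, lo, hi):
    """Propagate an input box [lo, hi] through ReLU layers (W, b),
    yielding sound element-wise bounds on the output."""
    for i, (W, b) in enumerate(layers):
        c, r = (lo + hi) / 2.0, (hi - lo) / 2.0   # box center and radius
        c = W @ c + b
        r = np.abs(W) @ r
        lo, hi = c - r, c + r
        if i < len(layers) - 1:                   # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# toy 2-2-1 network; property: the output stays below 1 on the input box
layers = [(np.array([[1.0, -1.0], [0.5, 0.5]]), np.zeros(2)),
          (np.array([[0.3, 0.3]]), np.array([0.1]))]
lo, hi = interval_bounds(layers, np.array([-0.1, -0.1]), np.array([0.1, 0.1]))
safe = hi[0] < 1.0   # True here: the property provably holds on this box
```

Because the bounds are sound but loose, `safe == False` would not prove a violation; complete verifiers refine such relaxations, which is where the NP-completeness bites.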
Table 1
Results of model selection. SAT indicates property violation. The Θ's denote the safety property of not touching the colon wall in the corresponding cardinal direction.

Method   Θ↑ (SAT)   Θ↓ (SAT)   Θ← (SAT)   Θ→ (SAT)   Safe models (FV selection)
PPO      300        246        80         167        0
L-PPO    221        198        53         161        3

To overcome these limitations, in [27] we proposed a novel quantitative formulation of the DNN-Verification problem, allowing to enumerate all unsafe regions for a given domain of interest and thus rank the models by the portion of unsafe regions they may have. However, we showed that this problem turns out to be #P-hard. Hence, in [28] we proposed 𝜖-ProVe. Exploiting a controllable underestimation of the output reachable sets, obtained via statistical prediction of tolerance limits [29], the algorithm provides a tight lower estimate of the (un)safe areas, with provable probabilistic guarantees.

We validated DNN-Verification in realistic robotic safety-critical scenarios. In particular, in [30] we showed that DNN-Verification can be used to rank different successful DNN models according to their level of safety, verifying collision avoidance in robotic mapless navigation. We then applied a similar pipeline in a more safety-critical domain, namely autonomous colonoscopy navigation for colorectal detection with deep RL [31] (Figure 5). In particular, we trained an agent to navigate the endoscope in patient-specific colon models based on endoscopic images, using Constrained RL (CRL) to impose, at the training stage, a safety cost for the agent touching the colon walls. Nevertheless, due to the Lagrangian relaxation implemented by CRL to perform constrained optimization, safety may not be guaranteed. Hence, we adopted a model selection strategy that harnesses FV to evaluate the safety of a vast pool of trained policies and select the one that meets all the specified behavioral preferences. The results of our study over 300 trained models are reported in Table 1: we found 3 completely safe models that provably meet the safety requirements.

Finally, to address the necessity of running the FV process only after training, due to its computational complexity, in [32] we proposed an unconstrained DRL framework that leverages a novel sample-based method to approximate local violations of input-output conditions, fostering the learning of safer behaviors inside the training loop. However, such conditions are typically hard-coded and require task-level knowledge, making their application intractable in challenging safety-critical tasks. To this end, in [33] we introduced the Collection and Refinement of Online Properties (CROP) framework to collect and refine safety properties during training. The combination of CROP with approximate violation inside the training loop allowed us to obtain a more robust approach with respect to other existing Safe DRL methodologies in the context of autonomous navigation, promoting safer behaviors while maintaining similar or better returns.

5. Explainable and data-efficient anomaly detection

Autonomous systems operating in the real world are required to work reliably over long periods of time (Long-Term Autonomy, LTA) under changing and unpredictable environmental conditions. In this context, anomaly detection is crucial to promptly identify situations that diverge from the desired behaviour. Specifically, unsupervised anomaly detection aims to identify anomalies related to the global behavior of the system [34, 35, 36], monitoring multivariate time series generated from sensors and actuators and starting from the only knowledge of the nominal (i.e., anomaly-free) behavior.

We recently proposed two contributions in this area, namely an online approach for detecting anomalous behaviors of robotic systems involved in complex LTA scenarios (HHAD) [37], and an adversarial data augmentation and retraining approach (HHAD-AUG) [38]. In HHAD [37], we use Hidden Markov Models (HMMs) to represent the nominal behavior of a robot. We then evaluate online the dissimilarity between the probability distribution of multivariate sensor time series in a sliding window and the emission probability of the related HMM hidden states. We adopt the Hellinger distance [39] as a distance measure since it is bounded (thus it lends itself to simpler interpretation and thresholding) and it is less noisy, hence more informative and discriminative.

In HHAD-AUG [38], we address the usual lack (or paucity) of anomalous examples and the noise that characterizes time series of real systems. We propose a data augmentation method based on perturbed (adversarial) time series [40], which has the advantage of not requiring any prior knowledge about the application domain and data conformation. We generate adversarial examples only for nominal points, optimizing a loss function based on the Hellinger distance between the observed and the expected data distributions.

We evaluate our data augmentation and re-training approach on several public datasets, plus one collected from our aquatic drones developed in the EU H2020 project INTCATCH [41]. Results show that: (i) the adversarial generation algorithms can generate meaningful adversarial examples for HHAD, and employing them significantly improves the performance of HHAD; (ii) our data augmentation method yields higher performance than examples generated by state-of-the-art augmentation methods; (iii) adversarial examples generated considering the Hellinger distance yield higher improvement than examples generated considering the standard log-likelihood; (iv) the low computational complexity and high parallelizability of the proposed method allow for fast data augmentation and retraining of HHAD. Figure 6 shows the results on the INTCATCH dataset [41].
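The sliding-window scoring idea behind HHAD can be sketched with a discrete Hellinger distance between the empirical histogram of each window and a nominal histogram. This is a simplified univariate sketch without the HMM state machinery of [37]; the signal, bins and window size are invented for illustration:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (bounded in [0, 1])."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def window_scores(series, bins, nominal, w):
    """Score each sliding window of a 1-D signal against a nominal histogram."""
    scores = []
    for t in range(len(series) - w + 1):
        h, _ = np.histogram(series[t:t + w], bins=bins)
        scores.append(hellinger(h / h.sum(), nominal))
    return np.array(scores)

rng = np.random.default_rng(0)
bins = np.linspace(-4, 4, 21)
nominal_hist, _ = np.histogram(rng.normal(0, 1, 5000), bins=bins)
nominal = nominal_hist / nominal_hist.sum()
# nominal behavior for 200 steps, then a shifted (anomalous) regime
signal = np.concatenate([rng.normal(0, 1, 200), rng.normal(2.5, 1, 200)])
scores = window_scores(signal, bins, nominal, w=50)
# scores stay low on nominal windows and rise once windows cover the shift
```

Since the distance is bounded in [0, 1], a fixed threshold on the score is easy to interpret, which is one of the motivations given for the Hellinger distance in [37].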
Figure 6: Average F1-score for the original detector HHAD and augmented detectors [38]: H-AUG (ours, based on Hellinger distance), L-AUG (ours, based on log-likelihood), R-AUG (random-based baseline), D-AUG (drift-based baseline), G-AUG (Gaussian-based baseline), and S-AUG (SMOTE-based baseline) on different training set sizes in the INTCATCH dataset. Averages are computed over 30 datasets for each dataset size.

Finally, we have recently addressed the problem of explainable anomaly detection, in order to provide useful information about the source of the anomaly for easier repair. To this aim, in [42] we showed that causal discovery based on Conditional Mutual Information (CMI) between time series can achieve higher performance than standard deep learning anomaly detectors on a benchmark robotic dataset of the Pepper service robot (https://sites.google.com/diag.uniroma1.it/robsec-data). Our methodology evaluates the variation of CMI between time series, thus providing a useful hint to the root cause of the anomaly. Moreover, it builds a nominal model of the real physical relations between the variables of the system, thus resulting in higher robustness and more accurate anomaly detection compared to DNN methods (95% vs 90% F1-score, and 100% precision).

6. Conclusion and future works

Our methodologies aim at increasing transparency and safety at different development levels, from planning to execution and verification. Our current research direction includes the online integration of symbolic learning and formal verification approaches into RL, focusing on the current scalability issues.

References

[1] Z. Jan, F. Ahamed, W. Mayer, N. Patel, G. Grossmann, M. Stumptner, A. Kuusk, Artificial intelligence for industry 4.0: Systematic review of applications, challenges, and opportunities, Expert Systems with Applications 216 (2023) 119456.
[2] M. Zuccotto, A. Castellini, D. L. Torre, L. Mola, A. Farinelli, Reinforcement learning applications in environmental sustainability: a review, Artificial Intelligence Review 57 (2024) 88.
[3] B. Shneiderman, Responsible AI: Bridging from ethics to practice, Communications of the ACM 64 (2021) 32–35.
[4] P. Mikalef, K. Conboy, J. E. Lundström, A. Popovič, Thinking responsibly about responsible AI and 'the dark side' of AI, 2022.
[5] I. Gabriel, Artificial intelligence, values, and alignment, Minds and Machines 30 (2020) 411–437.
[6] P. Scholl, F. Dietrich, C. Otte, S. Udluft, Safe policy improvement approaches and their limitations, in: International Conference on Agents and Artificial Intelligence, Springer, 2022, pp. 74–98.
[7] M. Świechowski, K. Godlewski, B. Sawicki, J. Mańdziuk, Monte Carlo tree search: A review of recent modifications and applications, Artificial Intelligence Review 56 (2023) 2497–2562.
[8] A. Castellini, F. Bianchi, E. Zorzi, T. D. Simão, A. Farinelli, M. T. J. Spaan, Scalable safe policy improvement via Monte Carlo tree search, in: Proceedings of the 40th International Conference on Machine Learning (ICML 2023), PMLR, 2023, pp. 3732–3756.
[9] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
[10] R. Laroche, P. Trichelair, R. Tachet Des Combes, Safe policy improvement with baseline bootstrapping, in: Proceedings of the 36th International Conference on Machine Learning (ICML), PMLR, 2019, pp. 3652–3661.
[11] D. Silver, J. Veness, Monte-Carlo planning in large POMDPs, Advances in Neural Information Processing Systems 23 (2010).
[12] N. Ye, A. Somani, D. Hsu, W. S. Lee, DESPOT: Online POMDP planning with regularization, Journal of Artificial Intelligence Research 58 (2017) 231–266.
[13] G. Mazzi, A. Castellini, A. Farinelli, Risk-aware shielding of Partially Observable Monte Carlo Planning policies, Artificial Intelligence 324 (2023).
[14] C. Barrett, R. Sebastiani, S. A. Seshia, C. Tinelli, Satisfiability modulo theories, 2021.
[15] G. Mazzi, D. Meli, A. Castellini, A. Farinelli, Learning logic specifications for soft policy guidance in POMCP, in: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, AAMAS '23, IFAAMAS, 2023, pp. 373–381.
[16] D. Meli, A. Castellini, A. Farinelli, Learning logic specifications for policy guidance in POMDPs: an inductive logic programming approach, Journal of Artificial Intelligence Research 79 (2024) 725–776.
[17] S. Muggleton, L. De Raedt, Inductive logic programming: Theory and methods, The Journal of Logic Programming 19 (1994) 629–679.
[18] S. C. Tran, E. Pontelli, M. Balduccini, T. Schaub, Answer set planning: a survey, Theory and Practice of Logic Programming 23 (2023) 226–298.
[19] D. Meli, H. Nakawala, P. Fiorini, Logic programming for deliberative robotic task planning, Artificial Intelligence Review 56 (2023) 9011–9049.
[20] C. Veronese, D. Meli, F. Bistaffa, M. Rodríguez-Soto, A. Farinelli, J. A. Rodríguez-Aguilar, Inductive logic programming for transparent alignment with multiple moral values, in: CEUR Workshop Proceedings, 2023, pp. 84–88.
[21] F. Trotti, A. Farinelli, R. Muradore, An online path planner based on POMDP for UAVs, in: 2023 European Control Conference (ECC), IEEE, 2023.
[22] A. Isidori, Nonlinear Control Systems II, Springer, 2013.
[23] H. Khalil, Nonlinear Systems, Prentice Hall, 2002.
[24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv:1312.6199 (2013).
[25] C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, M. J. Kochenderfer, et al., Algorithms for verifying deep neural networks, Foundations and Trends in Optimization 4 (2021) 244–404.
[26] G. Katz, C. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer, Reluplex: An efficient SMT solver for verifying deep neural networks, in: International Conference on Computer Aided Verification, Springer, 2017, pp. 97–117.
[27] L. Marzari, D. Corsi, F. Cicalese, A. Farinelli, The #DNN-Verification problem: Counting unsafe inputs for deep neural networks, in: International Joint Conference on Artificial Intelligence (IJCAI), 2023, pp. 217–224.
[28] L. Marzari, D. Corsi, E. Marchesini, A. Farinelli, F. Cicalese, Enumerating safe regions in deep neural networks with provable probabilistic guarantees, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2024, pp. 21387–21394.
[29] S. S. Wilks, Statistical prediction with special reference to the problem of tolerance limits, The Annals of Mathematical Statistics 13 (1942) 400–409.
[30] G. Amir, D. Corsi, R. Yerushalmi, L. Marzari, D. Harel, A. Farinelli, G. Katz, Verifying learning-based robotic navigation systems, in: 29th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Springer, 2023, pp. 607–627.
[31] D. Corsi, L. Marzari, A. Pore, A. Farinelli, A. Casals, P. Fiorini, D. Dall'Alba, Constrained reinforcement learning and formal verification for safe colonoscopy navigation, in: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2023, pp. 10289–10294.
[32] E. Marchesini, L. Marzari, A. Farinelli, C. Amato, Safe deep reinforcement learning by verifying task-level properties, in: AAMAS '23, International Foundation for Autonomous Agents and Multiagent Systems, 2023, pp. 1466–1475.
[33] L. Marzari, E. Marchesini, A. Farinelli, Online safety property collection and refinement for safe deep reinforcement learning in mapless navigation, in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023, pp. 7133–7139.
[34] A. Castellini, M. Bicego, F. Masillo, M. Zuccotto, A. Farinelli, Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework, Engineering Applications of Artificial Intelligence 90 (2020).
[35] A. Castellini, M. Bicego, D. Bloisi, J. Blum, F. Masillo, S. Peignier, A. Farinelli, Subspace clustering for situation assessment in aquatic drones: a sensitivity analysis for state-model improvement, Cybernetics and Systems 50 (2019) 658–671.
[36] A. Castellini, F. Masillo, M. Bicego, D. Bloisi, J. Blum, A. Farinelli, Subspace clustering for situation assessment in aquatic drones, in: Proc. 33rd ACM/SIGAPP Symposium on Applied Computing, SAC, 2019, pp. 930–937.
[37] D. Azzalini, A. Castellini, M. Luperto, A. Farinelli, F. Amigoni, et al., HMMs for anomaly detection in autonomous robots, in: Proc. AAMAS, 2020, pp. 105–113.
[38] A. Castellini, F. Masillo, D. Azzalini, F. Amigoni, A. Farinelli, Adversarial data augmentation for HMM-based anomaly detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023) 14131–14143.
[39] E. Hellinger, Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen, Journal für die reine und angewandte Mathematik 136 (1909) 210–271.
[40] F. Karim, S. Majumdar, H. Darabi, Adversarial attacks on time series, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2020) 3309–3320.
[41] A. Castellini, D. Bloisi, J. Blum, F. Masillo, A. Farinelli, Multivariate sensor signals collected by aquatic drones involved in water monitoring: A complete dataset, Data in Brief 30 (2020) 105436.
[42] D. Meli, Explainable online unsupervised anomaly detection for cyber-physical systems via causal discovery from time series, arXiv:2404.09871 (2024).