Developing safe and explainable autonomous agents: from simulation to the real world Federico Bianchi1,† , Alberto Castellini1,† , Alessandro Farinelli1,† , Luca Marzari1,† , Daniele Meli1,*,† , Francesco Trotti1,† and Celeste Veronese1,† 1 University of Verona, strada Le Grazie 15, 37135, Verona, Italy Abstract Responsible artificial intelligence is the next challenge of research to foster the deployment of autonomous systems in the real world. In this paper, we focus on safe and explainable design and deployment of autonomous agents, e.g., robots. In particular, we present our recent contributions to: i) safe and explainable planning, leveraging on safe Reinforcement Learning (RL) and neurosymbolic planning; ii) effective deployment of RL policies via model-based control; iii) formal verification of the safety of deep RL policies; and iv) explainable anomaly detection of complex real systems. Keywords Safe Reinforcement Learning, Formal verification of neural networks, Neurosymbolic AI, Planning under uncertainty 1. Introduction from deep RL, via formal verification. Finally, we present solutions for efficient and explainable anomaly detection Artificial Intelligence (AI) and robotics are pervading in autonomous systems. everyday activities, from industrial automation [1] to environmental monitoring [2]. As more and more so- phisticated autonomous cognitive systems interact with 2. Safe and explainable planning humans in complex scenarios, the development of re- sponsible AI solutions [3] becomes a fundamental design We assume the autonomous agent and the environment requirement, as prescribed also by the latest international are represented as a Markov Decision Process (MDP) regulations 1 . Responsible AI involves several aspects, 𝑀 = ⟨𝑆, 𝐴, 𝑇, 𝑅⟩, defining respectively the state space, including safety, transparency and trustability [4]. Safety the action space, the transition map, and the reward map. regards providing guarantees about the behavior of AI The first approach is based on Safe Policy Improvement systems, e.g., autonomous robotic systems, in terms of (SPI) [6] and Monte Carlo Tree Search (MCTS) [7], which performance and potential harm to the surrounding en- performs simulations in a model of the real environment vironment or humans. Transparency and trustability are to estimate the optimal policy online. The second solution related to the perception of humans interacting with the combines MCTS with symbolic and logical reasoning, to AI system, e.g., the explainability and compliance of the guide the exploration of the RL agent towards better system’s behaviour to the expectation of humans from a pathways. moral or rational perspective [5]. In this paper, we summarize our main contributions 2.1. Safe Policy Improvement with MCTS in the field of responsible AI. We focus on autonomous agents, e.g., robots, and present our approach to responsi- Safe RL [9] investigates how to learn policies that max- ble autonomy at different developmental stages. We first imize the performance of the agent, while respecting describe our solutions for safe and explainable planning safety constraints during learning. One popular approach in autonomous agents, via safe Reinforcement Learning is Safe Policy Improvement with Baseline Bootstrapping (RL) and neurosymbolic approaches. We also analyze (SPIBB) [10]. SPIBB starts from a baseline policy 𝜋0 (e.g., the problem of safe and compliant transfer of a planned a sub-optimal expert-designed policy). The algorithm policy on a physical robotic system, combining RL with then collects a batch dataset of trajectories (i.e., state- model-based control. We then investigate how to provide action pairs), and uses the baseline policy on less fre- formal guarantees of safety for black-box policies, e.g., quent state-action pairs. However, it does not scale to large state and action spaces. Ital-IA 2024: 4th National Conference on Artificial Intelligence, orga- To improve scalability, we recently introduced Monte nized by CINI, May 29-30, 2024, Naples, Italy Carlo Tree Search Safe Policy Improvement with Baseline * Corresponding author. † Bootstrapping (MCTS-SPIBB) [8]. The algorithm exploits Authors are in alphabetical order. $ daniele.meli@univr.it (D. Meli) MCTS to estimate 𝜋𝐼 online, hence it can scale to large © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License domains, while keeping the asymptotic guarantees of con- Attribution 4.0 International (CC BY 4.0). 1 European AI Act vergence of SPIBB [8]. In [8] we compared MCTS-SPIBB CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings a) trajectories (belief-action pairs) of POMDP executions collected offline. Specifically, given a set of task-related concepts 𝐹 provided by the user to describe the belief space, offline trajectories are converted to a logical for- malism, where logical predicates encode concepts in 𝐹 . b) As an example, consider the paradigmatic POMDP rock- sample scenario depicted in Figure 2a, where a robotic agent must collect valuable rocks (green dots) avoiding worthless ones (red dots) in a grid world. The state of the POMDP includes information about the position of agents and rocks, and the probability (belief) of rocks to be valuable. The state can be translated to a logical representation in terms of the following concepts in 𝐹 : Figure 1: Safe Policy Improvement: a) Comparison of per- the Manhattan distance D between the agent and each formance among SPI algorithms; b) Scalability comparison rock R dist(R,D) and the probability P of a rock R to between MCTS-SPIBB and SPIBB [8]. be valuable guess(R,P). Defining semantic concepts about the domain is easier than defining directly policy specifications, since it simply involves a re-interpretation with several state-of-the-art SPI algorithms on bench- of the state formalization. mark domains (see Figure 1.a). Furthermore, we showed We preliminarily learn policy specifications from tra- that on very large state spaces, such as the standard jectories collected from a rocksample agent operating in a SysAdmin benchmark2 with up to 35 machines, MCTS- 12×12 grid with 4 rocks. We adopt the logical formalism SPIBB is the only SPI algorithm capable of computing of Answer Set Programming (ASP) [18], which represents improved policies (see Figure 1.b). the state of the art for planning in first-order logic [19]. Our approach requires relatively few training trajectories (less than 800 in rocksample) to learn interpretable trans- 2.2. Planning with logics in MCTS parent policy specifications. Moreover, learned heuristics MCTS may require a large number of online simulations allow POMCP to use significantly fewer online simula- when the state and action spaces are large. This be- tions per step of execution (Figure 2b, achieving compara- comes even more critical in Partially Observable MDPs ble performance with respect to expert-designed specifi- (POMDPs), where part of the state is uncertain, hence a cations (pref ). Finally, the heuristics generalize to unseen particle filter must be used to sample and estimate the problem instances, e.g., enhancing scalability to larger actual state of the system, starting from a probability grid sizes (Figure 2c) which require a longer planning distribution called the belief. Recent online solvers for horizon, typically challenging for MCTS-based solvers. POMDPs, e.g., Partially Observable Monte Carlo Plan- In [20], we also showed that this approach can be used ning (POMCP) [11] and Determinized Sparse Partially to derive policy explanations of black-box model-free RL Observable Trees (DESPOT) [12] require the definition of agents, in the context of autonomous driving. task-specific policy heuristics, in order to efficiently bias the exploration towards most fruitful policies. Moreover, it is essential to guarantee the exploration of only safe 3. Safe deployment in the real policies. world To this aim, in [13] we proposed an approach based on maximum satisfiability modulo theory [14] to probabilis- The policy computed by a RL-based planner, e.g., POMCP tically verify the adherence of the policy computed by for POMDPs, cannot always be effectively and safely de- POMCP to a set of user-defined specifications, expressed ployed on a real robotic system. Indeed, MCTS-based in a fragment of first-order logic. In this way, we can planners perform online simulations based on a model shield undesired actions in MCTS simulations, and in- of the environment, but the chosen policy must be crease the explainability of the generated policy thanks adapted to the inevitable unmodeled inaccuracies and to the logic formalism. However, defining the logical non-linearities of the physical plant. To overcome this policy specifications may be tedious and error-prone in problem, in [21] we implemented the two-layer archi- realistic complex domains. For this reason, in [15, 16] tecture depicted in Figure 3, combining a high-level con- we proposed an approach based on inductive logic pro- troller based on POMCP with a low-level model-based gramming [17] to learn logical policy heuristics from controller: The low-level controller is designed using the inverse dynamics technique [22, 23], that allows 2 SysAdmin: https://jair.org/index.php/jair/article/view/10341/24723 to linearize via feedback the system. In particular, let (a) (b) (c) Figure 2: a) Rocksample setup; b) Results of [16] with few simulations and c) on larger grids. High-level controller Low-level controller 𝑢𝑟𝑒𝑓 𝑢 POMDP Control Agent 𝑥 𝑜 Belief update Figure 3: Block diagram of the two-layer architecture Figure 4: Drone paths. The black and blue arrows are, respec- 𝑥̇ = 𝑓 (𝑥) + 𝑔(𝑥)𝑢 be a non-linear dynamical system de- tively, the desired yaw angle and drone initial yaw angle pending on state 𝑥 and command action 𝑢. The controller is obtained as 𝑢 = 𝑔(𝑥)1 (−𝑓 (𝑥) + 𝑣), where 𝑣 is an aux- iliary control signal. Therefore, the low-level controller exploits the auxiliary control signal 𝑣, which is mapped as reference values for the controller, to compute the command 𝑢. The high-level controller is formalized as a POMDP that exploits the linearized closed-loop model to (a) (b) select the best local action 𝑢 for the agents. In particular, Figure 5: Realistic robotic scenarios: a) robotic mapless navi- the POMCP provides the sub-optimal reference values for gation; b) autonomous colonoscopy navigation. the low-level controller optimizing user-defined objec- tives, encoded in the reward function. Note, the two-layer have different control loop sample rates; the low-level has to be fast since it has to provide the commands to 4. Formal verification of deep RL the agent, while the high-level can be slower since it Trained RL policies, especially model-free RL policies generates the reference values for the low-level. encoded in a Deep Neural Network (DNN), do not guar- The two-layer approach is tested in a scenario where antee to provably meet the safety standards required in an aerial drone has to reach a target area, avoiding some the real world. For instance, DNNs are vulnerable to the no-fly zones and minimizing fuel consumption or attitude so-called adversarial inputs, i.e., minimal input variations error. Therefore, the reward function is composed of four that fool the system to output an undesired value (or contributions: an attractive potential component to reach action) [24]. Consequently, in recent years, Formal Ver- the target, a repulsive component to avoid the no-fly zone, ification (FV) of DNNs (aka DNN-Verification) has been the fuel consumption and the heading error. The last developed to provide formal guarantees on the behavior two components are weighted to rank between different of these systems [25]. In particular, given a predefined objectives. Figure 4 shows the trajectory followed by safety property, the goal of DNN-Verification is to assert the drone optimizing only the fuel consumption (black whether at least one input configuration exists that vio- line), both fuel and attitude (red dashed line) and only the lates the property. However, given the non-convex and attitude error (green dotted line). The black line follows non-linear nature of DNNs, verifying safety properties the shortest path to minimize the fuel, the red line follows in the worst case has been shown to be an NP-complete the shortest path but near the target position the attitude problem [26]. Moreover, the standard binary response error component increases to align the drone with the of DNN-verification (safe vs. unsafe) does not provide desired attitude (black arrow). The green line follows the sufficient information to compare the safety of different optimal path to minimize only the attitude. DNNs. Table 1 Results of model selection. SAT indicates property violation. training loop allowed us to obtain a more robust approach Θ’s denote the safety property not to touch the colon wall in with respect to other existing Safe DRL methodologies in any cardinal direction. the context of autonomous navigation, promoting safer Safety Properties behaviors while maintaining similar or better returns. Θ↑ Θ↓ Θ← Θ→ FV selection Method SAT SAT SAT SAT Safe models PPO 300 246 80 167 0 5. Explainable and data-efficient L-PPO 221 198 53 161 3 anomaly detection Autonomous systems operating in the real world are re- To overcome these limitations, in [27], we proposed a quired to reliably work over long periods of time (Long novel quantitative formulation of the DNN-verification Term Autonomy, LTA) under changing and unpredictable problem, allowing to enumerate all unsafe regions for a environmental conditions. In this context, anomaly detec- given domain of interest and thus rank the models on the tion is crucial to promptly identify situations that diverge portion of unsafe regions they may have. However, we from the desired behaviour. Specifically, unsupervised showed that this problem turns out to be #P-hard. Hence, anomaly detection aims to idenfity anomalies related to in [28] we proposed 𝜖-ProVe. Exploiting a controllable the global behavior of the system [34, 35, 36], monitor- underestimation of the output reachable sets obtained ing multivariate time series generated from sensors and via statistical prediction of tolerance limits [29], the al- actuators and starting from the only knowledge of the gorithm provides a tight —with provable probabilistic nominal (i.e., anomaly-free) behavior. guarantees— lower estimate of the (un)safe areas. We recently proposed two contributions in this area. We validated DNN-Verification in realistic robotic namely, an online approach for detecting anomalous be- safety-critical scenarios. In particular, in [30], we showed haviors of robotic systems involved in complex LTA sce- that DNN-Verification can be used to rank different suc- narios (HHAD) [37], and an adversarial data augmen- cessful DNN models according to the level of safety, tation and retraining approach (HHAD-AUG) [38]. In verifying collision avoidance in robotic mapless navi- HHAD [37], we use Hidden Markov Models (HMMs) gation. We then applied a similar pipeline in a more to represent the nominal behavior of a robot. We then safety-critical domain, namely autonomous colonoscopy evaluate online the dissimilarity between the probability navigation for colorectal detection with deep RL [31] distribution of multivariate sensor time series in a sliding (Figure 5). In particular, we trained an agent to navigate window and the emission probability of the related HMM the endoscope in patient-specific colon models based on hidden states. We adopt the Hellinger distance [39] as a endoscopic images, using Constrained RL (CRL) to im- distance measure since it is bounded (thus it lends itself pose a safety cost for the agent to touch colon walls at to simpler interpretation and thresholding) and it is less the training stage. Nevertheless, due to the Lagrangian noisy, hence more informative and discriminative. relaxation implemented by CRL to perform constrained In HHAD-AUG [38], we address the usual lack (or optimization, safety may not be guaranteed. Hence, we paucity) of anomalous examples and the noise that char- adopted a model selection strategy that harnesses FV to acterizes time series of real systems. We propose a data evaluate the safety of a vast pool of trained policies to augmentation method based on perturbed (adversarial) select the one the meets all the behavioral preferences time series [40], having the advantage of not requiring specified. The results of our study are reported in Ta- any prior knowledge about the application domain and ble 1 over 300 trained models, finding 3 completely safe data conformation. We generate adversarial examples models that provably meet the safety requirements. only for nominal points, optimizing a loss function based Finally, to address the necessity of running the FV on the Hellinger distance between the observed and the process only after training due to its computational com- expected data distributions. plexity, in [32] we proposed an unconstrained DRL frame- We evaluate our data augmentation and re-training ap- work that leverages a novel sample-based method to ap- proach on several public datasets, plus one collected from proximate local violations of input-output conditions to our aquatic drones developed in the EU H2020 project foster the learning of safer behaviors inside the training INTCATCH [41]. Results show that (i) the adversarial loop. However, such conditions are typically hard-coded generation algorithms can generate meaningful adversar- and require task-level knowledge, making their applica- ial examples for HHAD, employing them to significantly tion intractable in challenging safety-critical tasks. To improve the performance of HHAD; (ii) our data augmen- this end, in [33], we introduced the Collection and Refine- tation method yields higher performance than examples ment of Online Properties (CROP) framework to collect generated by state-of-the-art augmentation methods; (iii) and refine safety properties during training. The combi- adversarial examples generated considering the Hellinger nation of CROP with approximate violation inside the distance yield higher improvement than examples gen- References [1] Z. Jan, F. Ahamed, W. Mayer, N. Patel, G. Gross- mann, M. Stumptner, A. Kuusk, Artificial intelli- gence for industry 4.0: Systematic review of ap- plications, challenges, and opportunities, Expert Systems with Applications 216 (2023) 119456. [2] M. Zuccotto, A. Castellini, D. L. Torre, L. Mola, A. Farinelli, Reinforcement learning applications in environmental sustainability: a review, Artificial Intelligence Review 57 (2024) 88. |X| [3] B. Shneiderman, Responsible ai: Bridging from Figure 6: Average F1-score for the original detector HHAD ethics to practice, Communications of the ACM 64 and augmented detectors [38]: H-AUG (ours, based on (2021) 32–35. Hellinger distance), L-AUG (ours, based on log-likelihood), R- [4] P. Mikalef, K. Conboy, J. E. Lundström, A. Popovič, AUG (random-based baseline), D-AUG (drift-based baseline), Thinking responsibly about responsible AI and ‘the G-AUG (gaussian-based baseline), and S-AUG (SMOTE-based dark side’of AI, 2022. baseline) on different training set sizes in the INTCATCH [5] I. Gabriel, Artificial intelligence, values, and align- dataset. Averages are computed over 30 datasets, for each ment, Minds and machines 30 (2020) 411–437. dataset size. [6] P. Scholl, F. Dietrich, C. Otte, S. Udluft, Safe policy improvement approaches and their limitations, in: International Conference on Agents and Artificial erated considering standard log-likelihood; (v) the low Intelligence, Springer, 2022, pp. 74–98. computational complexity and high parallelizability of [7] M. Świechowski, K. Godlewski, B. Sawicki, the proposed method allow for a fast data augmentation J. Mańdziuk, Monte carlo tree search: A review and retraining of HHAD. Figure 6 shows the results on of recent modifications and applications, Artificial the INTCATCH dataset [41]. Intelligence Review 56 (2023) 2497–2562. Finally, we have recently addressed the problem of [8] A. Castellini, F. Bianchi, E. Zorzi, T. D. Simão, explainable anomaly detection, in order to provide useful A. Farinelli, M. T. J. Spaan, Scalable safe policy information about the source of the anomaly for easier improvement via Monte Carlo tree search, in: Pro- repair. To this aim, in [42] we showed that causal dis- ceedings of the 40th International Conference on covery based on Conditional Mutual Information (CMI) Machine Learning (ICML 2023), PMLR, 2023, pp. between time series can achieve higher performance than 3732–3756. standard deep learning antomaly detectors, on a bench- [9] R. Sutton, A. Barto, Reinforcement learning, An mark robotic dataset of the Pepper service robot3 . Our introduction, 2nd ed., MIT Press, 2018. methodology evaluates the variation of CMI between [10] R. Laroche, P. Trichelair, R. Tachet Des Combes, time series, thus providing a useful hint to the root cause Safe policy improvement with baseline bootstrap- of the anomaly. Moreover, it builds a nominal model ping, in: Proceedings of the 36th International of the real physical relations between variables of the Conference on Machine Learning (ICML), PMLR, system, thus resulting in higher robustness and more 2019, pp. 3652–3661. accurate anomaly detection, compared to DNN methods [11] D. Silver, J. Veness, Monte-carlo planning in large (95% vs 90 % F1-score and 100% precision). pomdps, Advances in neural information process- ing systems 23 (2010). [12] N. Ye, A. Somani, D. Hsu, W. S. Lee, Despot: Online 6. Conclusion and future works pomdp planning with regularization, Journal of Artificial Intelligence Research 58 (2017) 231–266. Our methodologies aim at increasing transparency and [13] G. Mazzi, A. Castellini, A. Farinelli, Risk-aware safety at different development levels, from planning to shielding of Partially Observable Monte Carlo Plan- execution and verification. Our current research direc- ning policies, Artificial Intelligence 324 (2023). tion includes the online integration of symbolic learning [14] C. Barrett, R. Sebastiani, S. A. Seshia, C. Tinelli, and formal verification approaches into RL, focusing on Satisfiability modulo theories, 2021. the current scalability issues. [15] G. Mazzi, D. Meli, A. Castellini, A. Farinelli, Learn- ing logic specifications for soft policy guidance in POMCP, in: Proceedings of the 2023 International 3 https://sites.google.com/diag.uniroma1.it/robsec-data Conference on Autonomous Agents and Multiagent Systems, AAMAS ’23, IFAAMAS, 2023, pp. 373–381. [31] D. Corsi, L. Marzari, A. Pore, A. Farinelli, A. Casals, [16] D. Meli, A. Castellini, A. Farinelli, Learning logic P. Fiorini, D. Dall’Alba, Constrained reinforce- specifications for policy guidance in POMDPs: an ment learning and formal verification for safe inductive logic programming approach, Journal of colonoscopy navigation, in: 2023 IEEE/RSJ Inter- Artificial Intelligence Research 79 (2024) 725–776. national Conference on Intelligent Robots and Sys- [17] S. Muggleton, L. De Raedt, Inductive logic program- tems (IROS), IEEE, 2023, pp. 10289–10294. ming: Theory and methods, The Journal of Logic [32] E. Marchesini, L. Marzari, A. Farinelli, C. Amato, Programming 19 (1994) 629–679. Safe deep reinforcement learning by verifying task- [18] S. C. Tran, E. Pontelli, M. Balduccini, T. Schaub, level properties, AAMAS ’23, International Foun- Answer set planning: a survey, Theory and Practice dation for Autonomous Agents and Multiagent Sys- of Logic Programming 23 (2023) 226–298. tems, 2023, p. 1466–1475. [19] D. Meli, H. Nakawala, P. Fiorini, Logic program- [33] L. Marzari, E. Marchesini, A. Farinelli, Online safety ming for deliberative robotic task planning, Artifi- property collection and refinement for safe deep cial Intelligence Review 56 (2023) 9011–9049. reinforcement learning in mapless navigation, in: [20] C. Veronese, D. Meli, F. Bistaffa, M. Rodríguez-Soto, 2023 IEEE International Conference on Robotics A. Farinelli, J. A. Rodríguez-Aguilar, Inductive logic and Automation (ICRA), IEEE, 2023, pp. 7133–7139. programming for transparent alignment with mul- [34] A. Castellini, M. Bicego, F. Masillo, M. Zuccotto, tiple moral values, in: CEUR WORKSHOP PRO- A. Farinelli, Time series segmentation for state- CEEDINGS, 2023, pp. 84–88. model generation of autonomous aquatic drones: A [21] F. Trotti, A. Farinelli, R. Muradore, An online path systematic framework, Engineering Applications planner based on pomdp for uavs, in: 2023 Euro- of Artificial Intelligence 90 (2020). pean Control Conference (ECC), IEEE, 2023. [35] A. Castellini, M. Bicego, D. Bloisi, J. Blum, F. Masillo, [22] A. Isidori, Nonlinear control systems II, Springer, S. Peignier, A. Farinelli, Subspace clustering for 2013. situation assessment in aquatic drones: a sensitivity [23] H. Khalil, Nonlinear Systems, Prentice Hall, 2002. analysis for state-model improvement, Cybernetics [24] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Er- and Systems 50 (2019) 658–671. han, I. Goodfellow, R. Fergus, Intriguing properties [36] A. Castellini, F. Masillo, M. Bicego, D. Bloisi, J. Blum, of neural networks, arXiv:1312.6199 (2013). A. Farinelli, Subspace clustering for situation assess- [25] C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, ment in aquatic drones, in: Proc. 33th ACM/SIGAPP M. J. Kochenderfer, et al., Algorithms for verifying Symposium on Applied Computing, SAC, 2019, pp. deep neural networks, Foundations and Trends® 930–937. in Optimization 4 (2021) 244–404. [37] D. Azzalini, A. Castellini, M. Luperto, A. Farinelli, [26] G. Katz, C. Barrett, D. L. Dill, K. Julian, M. J. Kochen- F. Amigoni, et al., HMMs for anomaly detection derfer, Reluplex: An efficient smt solver for veri- in autonomous robots, in: Proc. AAMAS, 2020, p. fying deep neural networks, in: International con- 105–113. ference on computer aided verification, Springer, [38] A. Castellini, F. Masillo, D. Azzalini, F. Amigoni, 2017, pp. 97–117. A. Farinelli, Adversarial data augmentation for [27] L. Marzari, D. Corsi, F. Cicalese, A. Farinelli, The hmm-based anomaly detection, IEEE Transactions #DNN-Verification Problem: Counting Unsafe In- on Pattern Analysis and Machine Intelligence 45 puts for Deep Neural Networks, in: International (2023) 14131–14143. Joint Conference on Artificial Intelligence (IJCAI), [39] E. Hellinger, Neue begründung der theorie 2023, pp. 217–224. quadratischer formen von unendlichvielen verän- [28] L. Marzari, D. Corsi, E. Marchesini, A. Farinelli, derlichen, Journal für die reine und angewandte F. Cicalese, Enumerating safe regions in deep neural Mathematik 136 (1909) 210–271. networks with provable probabilistic guarantees, in: [40] F. Karim, S. Majumdar, H. Darabi, Adversarial at- Proceedings of the AAAI Conference on Artificial tacks on time series, IEEE Trans Pattern Anal Mach Intelligence, volume 38, 2024, pp. 21387–21394. Intell 43 (2020) 3309–3320. [29] S. S. Wilks, Statistical prediction with special refer- [41] A. Castellini, D. Bloisi, J. Blum, F. Masillo, ence to the problem of tolerance limits, The annals A. Farinelli, Multivariate sensor signals collected of mathematical statistics 13 (1942) 400–409. by aquatic drones involved in water monitoring: A [30] G. Amir, D. Corsi, R. Yerushalmi, L. Marzari, complete dataset, Data Brief 30 (2020) 105436. D. Harel, A. Farinelli, G. Katz, Verifying learning- [42] D. Meli, Explainable online unsupervised anomaly based robotic navigation systems, in: 29th Inter- detection for cyber-physical systems via causal dis- national Conference TACAS, Springer, 2023, pp. covery from time series, arXiv:2404.09871 (2024). 607–627.