Using Adaptive Stress Testing to Identify Paths to Ethical Dilemmas in Autonomous Systems

Ann-Katrin Reuel¹, Mark Koren², Anthony Corso², Mykel J. Kochenderfer²
¹ University of Pennsylvania, School of Engineering and Applied Sciences, Philadelphia, PA 19104
akreuel@seas.upenn.edu
² Stanford University, School of Engineering, Stanford, CA 94305
mark.c.koren21@gmail.com, acorso@stanford.edu, mykel@stanford.edu

Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

During operation, autonomous agents may find themselves making decisions that have ethical ramifications. In this position paper, we look at one aspect of these situations: ethical dilemmas. We first define them as situations in which an autonomous agent can only choose from actions that violate one or more previously given ethical principles. Subsequently, we suggest using adaptive stress testing, a framework based on reinforcement learning, as one way to uncover situations where an autonomous system gets into an ethical dilemma. Using an example from the autonomous driving domain, we propose a simulator setup, define a context-specific ethical dilemma, and suggest how adaptive stress testing can be applied to find the most likely path to an ethical dilemma.

Introduction

Safety-critical autonomous systems, such as autonomous vehicles, are increasingly operating within society. Just like human beings, autonomous agents might encounter situations where there is no clear ethical course of action. Rather, a decision between multiple unethical actions has to be made – this is what we call an ethical dilemma. Ethical decision making for autonomous agents is already complicated by questions such as whose values to consider and how to aggregate them in a way that can be used by the agent (Russell 2019). However, ethical dilemmas give rise to a further complication: How do we choose among unethical options? How should we prioritize the ethical principles specified in order to make an explicable decision among these options?

We contend, however, that there is no ethical way for an agent to choose among unethical options. After all, such dilemmas exist because even humans cannot agree on an unambiguously correct path of action. Instead, we propose that autonomous agents should explicitly reason in a way that prevents them from ending up in an ethical dilemma in the first place.

In this position paper, we first define ethical dilemmas as situations in which an autonomous agent can only choose from actions that violate one or more previously given ethical principles. Subsequently, we suggest the application of adaptive stress testing (AST) (Lee et al. 2020), a framework based on reinforcement learning (RL), to explicitly identify the most likely paths to ethical dilemmas. This could open new ways for agents to avoid such dilemmas in the first place. We further suggest a pedestrian simulator example to validate this idea.
Background

Moral programming and ethical decision making in particular have become major areas of interest in the field of AI safety (Wernaart 2021; Aliman and Kester 2019). For autonomous systems, this topic is still a relatively under-explored area of machine learning with many challenges. One such challenge is that how to make an ethical decision is a disputed subject. Different ethical theories may give contrasting answers to the question of which action is the morally correct one to take. For example, utilitarianism seeks to maximize human welfare (Bentham and Mill 2004). In this context, actions are judged based on their ability to maximize the expected overall utility of their immediate consequences; the cost of one human life would, in this school of thought, be outweighed by the cost of many lives. On the other hand, there are contractualist deontological ethics. Here, actions are preferred which individuals in a social construct could not reasonably reject (Scanlon 2003), i.e. actions which conform to moral norms (Davis 1993; Geisslinger et al. 2021). While such imperatives seem too unspecified to be adopted in an autonomous system, efforts have been made to translate these ideas in a way that machines can work with, e.g. the Three Laws of Robotics (Asimov 1950). While these rule-based ethics have the potential to be used in a machine context due to their structured approach (Powers 2006), some authors have argued that context-specific information is not taken into account sufficiently, potentially causing an autonomous agent to undertake risky behavior in order to adhere to a strict set of rules (Loh 2017; Goodall 2016). Another challenge with regard to autonomous agents making ethical decisions is the question of how ethically aligned behavior can be implemented in a machine. This becomes especially challenging in real-world, culture-dependent settings (Awad et al. 2018) due to their inherent complexity, involving correlations which are not sufficiently captured by simplified ethical theories.

Despite these challenges, work has been done to implement ethical decision making in autonomous systems. Conitzer et al. (2017) discuss moral decision making frameworks for autonomous agents at a high level. They argue that systems based on ad-hoc rules are insufficient and that a more general framework is needed. The authors compare game-theoretic formalisms to classical supervised machine learning methods that are based on a labeled ethical decision data set. Conitzer et al. (2017) find that, while the former can take multi-agent decisions into account, the basic representation schemes would need to be extended to work as an ethical decision framework. On the other hand, they argue that supervised learning could help in making human-like ethical decisions. The major issue here is that ethical decision situations tend to take place in fairly complex statistical contexts, often involving multiple human and non-human agents who do not always act rationally (Hadfield-Menell et al. 2016). Hence, ethical decision situations are rarely comparable, as changing even one parameter would often lead – from a human perspective – to a completely new evaluation of the situation.

Additional work to acquire and use human preferences in ethical decisions was conducted by Christiano et al. (2017). The authors used deep inverse RL (Ng, Russell et al. 2000), i.e. they involved humans in the agent's learning process by repeatedly giving the human short snippets of situations which she should order according to her preferences. The agent would use this information to refine its reward function, allowing it to iteratively adjust the function to the human's preferences. This approach could be used in ethical decision making, too, by showing humans two outcomes of an ethical decision which they should order with regard to their desirability, analogous to the Moral Machines approach (Awad et al. 2018). A similar idea was proposed by Abel, MacGlashan, and Littman (2016), who came to the conclusion that RL can be used to generalize moral values in a way that can be implemented in machines. However, there are multiple issues with these approaches: Firstly, one would need to select a balanced group of people who contribute to the ethical learning process of the agent to ensure that the moral judgement learned is representative of a larger population. Secondly, given the necessary constant involvement of humans in the learning process, this approach scales poorly. In addition to these shortcomings, none of the approaches discussed allows for the satisfactory resolution of ethical dilemmas, especially when human feedback is necessary, since such dilemmas are by definition not solvable by human beings. Hence, it is unlikely that they can teach an agent what to do in such situations.

Due to these issues, we argue that approaches to prevent ethical dilemmas need to be studied, instead of trying to resolve ethical decision situations when a clear moral action is not present. This position paper is the first to propose the use of such an approach: We suggest applying AST, an RL-based framework by Lee et al. (2020) for finding failures in autonomous systems, to identify the most likely path to an ethical dilemma (for an overview of alternative approaches to find failures in autonomous systems, please refer to Corso et al. (2020)). This information could subsequently be used to prevent the agent from arriving in an ethical dilemma in the first place.
Approach

Adaptive stress testing is a framework used in safety-critical systems like aircraft collision avoidance systems to find the most likely path to a failure event. Instead of defining failure events as critical system failures such as aircraft collisions, though, we define them in this position paper as reaching a state in which the agent is in an ethical dilemma. We want to highlight that we specifically do not define an unethical action taken by the agent as a failure, but rather situations in which the agent can only make unethical decisions. This way, the issue of deciding on a course of action in an ethical dilemma can be circumvented, because the mere necessity for such a decision would qualify as a failure in our approach.

We first define ethical failures. We subsequently suggest a setup for our approach using a variation of the trolley problem that is relevant in the context of autonomous vehicles. The trolley problem, first proposed by Thomson (1976), is a standard ethical dilemma considered in the literature, in which an autonomous agent has multiple options in a driving decision situation, all of which lead to fatal collisions.

Defining Ethical Failures

Based on the work by Dennis et al. (2016), we consider a set of abstract ethical principles Φ, with ϕ1, ϕ2, ..., ϕn corresponding to single abstract ethical principles such as "Don't harm humans.":

Φ = {ϕ1, ϕ2, ..., ϕn}

To transform these abstract principles into situation-specific ethical rules Γ = {γ1, γ2, ..., γn}, case-based reasoning is applied, as shown by Anderson and Anderson (2007), which allows for a context-specific instantiation of the respective rules. A context, in our case, "informs an agent of what counts as a violation of the laws and principles by which the context is governed" (Dennis et al. 2016).

An action is defined as unethical if it violates one or more of the ethical rules in Γ in a given context c. This establishment of ethical rules follows the deontological ethics approach (see Grossi, Meyer, and Dignum (2005) for more information). Given these prerequisites, we can define what an ethical dilemma is. To simplify, we assume that the defined ethical principles in the set Φ – and all ethical rules Γ derived from principles in Φ – are equally important. Now, in a given context c, we have a set of actions A available to the agent:

Ac = {a1, a2, ..., an}

If all of these actions violate one or more ethical rules in the set Γ, and hence principles in the set Φ, there is by definition no ethical option available to the agent. The agent finds itself in an ethical dilemma.
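To make this definition concrete, the check for an ethical dilemma can be expressed as a small predicate over the instantiated rule set Γ and the available actions Ac. The following Python sketch is illustrative only; representing each rule γ as a Boolean violates(action, context) predicate, and the helper names violates_any and is_ethical_dilemma, are our own assumptions rather than part of the formal definition above.

```python
from typing import Callable, Iterable

# A context-specific ethical rule gamma maps (action, context) -> True when the
# action violates that rule in the given context.
EthicalRule = Callable[[object, object], bool]


def violates_any(action, context, rules: Iterable[EthicalRule]) -> bool:
    """An action is unethical if it violates one or more rules in Gamma."""
    return any(rule(action, context) for rule in rules)


def is_ethical_dilemma(actions, context, rules: Iterable[EthicalRule]) -> bool:
    """An ethical dilemma: every action available in A_c is unethical."""
    actions = list(actions)
    rules = list(rules)  # allow re-iteration of the rules for each action
    return len(actions) > 0 and all(
        violates_any(a, context, rules) for a in actions
    )
```

In the AST setup described below, a predicate of this kind could serve as the event check that defines the event space E.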
Applying Adaptive Stress Testing

The evaluation of failure events has been extensively studied in safety-critical applications such as aircraft collision avoidance systems. One approach taken in this field is AST: Lee et al. (2020) were interested in finding the most likely path to failure events in "complex stochastic environments" (Lee et al. 2020) in order to understand how an agent arrives at a failure and hence prevent that failure path from being taken in the first place. Essentially, the authors followed a simulation-based approach in which knowledge of the system under test is not necessary. They formulated the problem as a sequential Markov decision process (MDP) in both fully and partially observable environments with stochastic disturbances. Subsequently, they let an agent try to maximize a reward function in this environment which rewards it for what is defined as a failure.

[Figure 1: Simplified adaptive stress testing framework showing its core components (Lee et al. 2020).]

In AST, there are four main components (see Figure 1): the simulator, the system under test, the environment, and the reinforcement learner. The reinforcement learner chooses a stochastic disturbance x to change the simulation in order to create failures. In return, it receives the simulator state s as well as the reward r. Using RL, the most likely path to a failure event can then be found by maximizing the reward. The framework operates in a black-box setting, and a multiple-step simulation of the situation which can lead to a failure is required. Furthermore, simulation control functions need to be provided to the solver to allow for stochastic disturbances of the environment. The sampling subsequently performed by the framework is based on Monte Carlo tree search (MCTS), allowing for a best-first exploration of the search space. This leads to the following formal problem (Koren, Corso, and Kochenderfer 2020):

maximize over a0, ..., at:   P(s0, a0, ..., st, at)
subject to:                  st ∈ E

with S being the simulator, E the event space, P(s0, a0, ..., st, at) the probability of a trajectory in simulator S, and st = f(at, st−1).
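To illustrate how this optimization could look in code, the sketch below stands in for the MCTS solver with a plain random search over disturbance sequences: among all sampled trajectories that reach the event set E, it keeps the one with the highest accumulated log-likelihood of its disturbances. The simulator interface (reset, step, is_event), the helper names, and the use of random search instead of MCTS are assumptions made for illustration; they are not the AST Toolbox API or the solver of Lee et al. (2020).

```python
import math
import random


def most_likely_failure_path(simulator, sample_disturbance, log_prob,
                             horizon, n_trials=1000, seed=0):
    """Simplified stand-in for the AST solver: search over disturbance
    sequences for the most likely trajectory that ends in the event set E.

    Assumed (illustrative) simulator interface:
      simulator.reset()            -> initial state
      simulator.step(disturbance)  -> next state
      simulator.is_event(state)    -> True if the state is an ethical dilemma (in E)
    """
    rng = random.Random(seed)
    best_path, best_log_p = None, -math.inf
    for _ in range(n_trials):
        state = simulator.reset()
        path, total_log_p = [], 0.0
        for _ in range(horizon):
            x = sample_disturbance(rng, state)   # stochastic disturbance chosen by the learner
            total_log_p += log_prob(x, state)    # accumulate log-likelihood of the disturbance
            state = simulator.step(x)
            path.append(x)
            if simulator.is_event(state):        # reached an ethical dilemma
                if total_log_p > best_log_p:
                    best_path, best_log_p = path[:], total_log_p
                break
    return best_path, best_log_p
```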
Simulation Design

As a first step toward showing that AST can be used to identify paths to ethical dilemmas, we propose a toy problem in an autonomous vehicle simulator. We use the following specifications to propose a scenario which includes a version of the trolley problem (overall structure and core components modelled on Koren et al. (2018)):

[Figure 2: Example initial setup for the simulator. The red circles depict pedestrians while the green boxes show immobile obstacles.]

1. Environment: We propose to use a simplified environment in which an autonomous vehicle drives on a one-lane street. On the sidewalk on each side of the street are both immobile obstacles and a variable number of pedestrians who are free to move in any direction, including past obstacles and across the street (see Figure 2). The pedestrians can be described by their velocity (v̂x(i), v̂y(i)) and position (x̂(i), ŷ(i)), both relative to the system under test (see below). The positions of the obstacles should be fixed, while the pedestrians' movement is controlled by AST. The simulation state ssim = [ssim(1), ssim(2), ..., ssim(n)] consists of the states of each pedestrian i, with ssim(i) = [v̂x(i), v̂y(i), x̂(i), ŷ(i)]. For more details on the simulation of pedestrian movement, please refer to Koren et al. (2018).

2. System under Test: We propose to use the Intelligent Driver Model (IDM) (Treiber, Hennecke, and Helbing 2000) as our system under test. The IDM is programmed to stay in its lane and drive in compliance with the rules of traffic. Its base speed is fixed at 35 mph, i.e. the standard speed on most city streets. At each step, the system under test receives a set of observations with the states of the pedestrians as well as the positions of the immobile obstacles. It then chooses an action based on this information, which is used to update the vehicle's state.

3. Solver: The exploration of the state space depends on the solver specification. For additional details on the MCTS solver we propose to use, please refer to Lee et al. (2020). The solver should be able to interact with the simulator by resetting the simulator to its initial state, by drawing the next state s′ after an action a was taken, and by evaluating whether a terminal state (an ethical dilemma or the end of the time horizon) has been reached.

4. Reward Function: Compared to the original reward function by Lee et al. (2015), we suggest using a modified version as implemented by Koren et al. (2018) (a minimal sketch of this reward follows the list):

   R(s) = 0                             if s ∈ E
   R(s) = −α − β · DIST(pv, pp)         if s ∉ E, t ≥ T
   R(s) = −log(1 + M(a, µa | s))        if s ∉ E, t < T

   where DIST(pv, pp) is the distance between the closest pedestrian and the system under test, while the Mahalanobis distance M could be used as a proxy for the probability of an action. See Koren et al. (2018) for more details. This reward function covers three cases: a) finding an ethical dilemma, which gives the highest reward; b) finding no dilemma and reaching the time horizon, which gives the lowest reward (by choosing high α and β values); and c) finding no dilemma while the agent still operates within the specified time horizon T.
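As referenced in item 4 above, the following Python sketch writes out this reward in the three cases just described. The function signature, the default magnitudes of α and β, and the way the distance and Mahalanobis terms are passed in are illustrative assumptions based on the description above and on Koren et al. (2018), not a definitive implementation.

```python
import math


def ast_reward(is_dilemma: bool, t: int, horizon: int,
               dist_to_closest_pedestrian: float,
               mahalanobis_action: float,
               alpha: float = 1e4, beta: float = 1e3) -> float:
    """Modified AST reward following the three cases described above.

    is_dilemma: whether the current state lies in the event set E
    dist_to_closest_pedestrian: DIST(p_v, p_p), distance between the vehicle
        and the closest pedestrian
    mahalanobis_action: M(a, mu_a | s), used as a proxy for the probability
        of the sampled disturbance
    alpha, beta: placeholder magnitudes; the text only requires them to be large
    """
    if is_dilemma:
        # Case a): an ethical dilemma was found -- the highest possible reward.
        return 0.0
    if t >= horizon:
        # Case b): no dilemma within the time horizon -- large penalty,
        # shaped by how close the vehicle came to a pedestrian.
        return -alpha - beta * dist_to_closest_pedestrian
    # Case c): still within the horizon -- per-step penalty that favors
    # likely (low-Mahalanobis-distance) disturbances.
    return -math.log(1.0 + mahalanobis_action)
```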
Ethical Dilemmas As Failure Events

The key idea is now to define our event of interest, i.e. the failure event, not as a collision (as in Koren et al. (2018)) but as a decision situation in which the agent finds itself in an ethical dilemma. One example of the subset of the state space we are interested in in our simulator are settings in which the path of the system under test is blocked on both the left- and right-hand side, either by a pedestrian or an obstacle, while a pedestrian appears in close proximity in front of the vehicle (see Figure 3). We assume that a crash with an obstacle would severely injure the passengers of the system under test, while a crash with a pedestrian would severely injure the pedestrian. We further assume that the agent is given the ethical principle

ϕh = do no harm

which can be translated into the context-specific ethical rules

γp = do not harm pedestrians
γo = do not harm occupants

Note that our system does not require any weighting to be given on harming an occupant vs. harming a pedestrian. It is sufficient to say that a violation of either is a violation of the directive to do no harm to a human.

[Figure 3: Example ethical dilemma. A pedestrian moves in front of the vehicle, leaving it with the option to crash into the pedestrian, a pedestrian on the left-hand side, or an obstacle on the right-hand side.]

Confronted with the situation described above, the autonomous agent identifies the following available actions (planning and identifying available actions is not part of this paper; please refer to Tulum, Durak, and Yder (2009) or Coles et al. (2010) for further information):

• Option ao: Crash into an obstacle, likely causing harm to the agent's occupants.
• Option ap: Crash into a pedestrian, likely causing harm to the pedestrian and potentially the agent's occupants.

The corresponding action space is A = {ao, ap}. No matter which action the agent chooses, it would violate either γp (by harming a pedestrian) or γo (by harming its occupants) and, as a consequence, also ϕh, i.e. the directive to cause no harm. Hence, neither option can be clearly identified as ethical, and the agent ends up in a dilemma. As in the original AST framework, instead of receiving a negative reward for a failure event, the agent receives a positive reward for these situations to encourage finding paths to ethical dilemmas.

The goal of the AST framework is then to maximize this reward by disturbing the pedestrian movement and creating failure states in which it receives the highest reward. This approach results in the most likely path to an ethical dilemma – information which could subsequently be used to prevent this path from being taken, decreasing the likelihood of ending up in such a dilemma in the first place.
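Continuing the illustrative sketch from the Defining Ethical Failures section, the scenario above could be encoded as follows; the outcome labels attached to each action and the context name are assumptions made purely for this example.

```python
# Hypothetical encoding of the trolley-style scenario described above,
# reusing is_ethical_dilemma from the earlier sketch.

# Each action is labeled with the outcome we assume it would cause.
ACTIONS = {
    "a_o": {"harms_occupants": True, "harms_pedestrian": False},   # crash into an obstacle
    "a_p": {"harms_occupants": False, "harms_pedestrian": True},   # crash into a pedestrian
}


# Context-specific rules gamma_p and gamma_o derived from phi_h ("do no harm").
def gamma_p(action, context):   # do not harm pedestrians
    return ACTIONS[action]["harms_pedestrian"]


def gamma_o(action, context):   # do not harm occupants
    return ACTIONS[action]["harms_occupants"]


context = "blocked_road_pedestrian_ahead"
print(is_ethical_dilemma(ACTIONS.keys(), context, [gamma_p, gamma_o]))  # -> True
```

Since both available actions violate at least one rule derived from ϕh, the check returns True, i.e. the state qualifies as a failure event for AST.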
Future Research Directions

Identifying ethical dilemmas using AST comes with challenges that need to be addressed in future work. Firstly, it depends on the availability of a simulator which sufficiently depicts an ethical decision situation. Secondly, the defined ethical principles need to be specific enough that the agent can evaluate its available actions against them. Furthermore, the ethical principles should be defined such that the majority of potentially affected people agrees with them, which has been an open issue in research (Gabriel 2020). Also, while AST can find the most likely path to a failure event, it might be the case that all possible paths result in an ethical dilemma, i.e. that it cannot be prevented. For these cases, other strategies to prevent or deal with ethical dilemmas need to be employed, which is still an unresolved question in the field. Another limitation of the AST framework that has to be considered is that the downstream effect of immediate actions taken by the agent is not part of the analysis. Despite these open questions, our next step will be to implement the proposed setup as an empirical proof of the approach. This could then be extended to show how the knowledge of a path to an ethical dilemma can be used to prevent that path from being taken in the first place. While not a one-size-fits-all framework for dealing with ethical dilemmas in autonomous systems, AST can be used as part of a larger strategy to deal with such decision situations.

Conclusions

In this position paper, we showed how ethical failures can be defined and subsequently used as failure events in the AST framework. This constitutes a novel approach to dealing with ethical dilemmas in autonomous decision systems: Instead of solving them, we suggest circumventing ethical dilemmas in the first place by identifying the most likely path to such a failure event. As a next step, we propose the implementation of the suggested simulator as a proof of concept. Long-term, this approach could be part of more comprehensive efforts to create ethical autonomous systems.

References

Abel, D.; MacGlashan, J.; and Littman, M. L. 2016. Reinforcement learning as a framework for ethical decision making. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.

Aliman, N.-M.; and Kester, L. 2019. Transformative AI governance and AI-empowered ethical enhancement through preemptive simulations. Delphi, 2: 23.

Anderson, M.; and Anderson, S. L. 2007. Machine ethics: Creating an ethical intelligent agent. AI Magazine, 28(4): 15–15.

Asimov, I. 1950. I, Robot. Fawcett Publications.

Awad, E.; Dsouza, S.; Kim, R.; Schulz, J.; Henrich, J.; Shariff, A.; Bonnefon, J.-F.; and Rahwan, I. 2018. The moral machine experiment. Nature, 563(7729): 59–64.

Bentham, J.; and Mill, J. S. 2004. Utilitarianism and Other Essays. Penguin UK.

Christiano, P.; Leike, J.; Brown, T. B.; Martic, M.; Legg, S.; and Amodei, D. 2017. Deep reinforcement learning from human preferences. arXiv preprint arXiv:1706.03741.

Coles, A.; Coles, A.; Fox, M.; and Long, D. 2010. Forward-chaining partial-order planning. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 20.

Conitzer, V.; Sinnott-Armstrong, W.; Borg, J. S.; Deng, Y.; and Kramer, M. 2017. Moral decision making frameworks for artificial intelligence. In Thirty-First AAAI Conference on Artificial Intelligence.

Corso, A.; Moss, R. J.; Koren, M.; Lee, R.; and Kochenderfer, M. J. 2020. A survey of algorithms for black-box safety validation. arXiv preprint arXiv:2005.02979.

Davis, N. 1993. Contemporary deontology. In Singer, P., ed., A Companion to Ethics. John Wiley & Sons.

Dennis, L.; Fisher, M.; Slavkovik, M.; and Webster, M. 2016. Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems, 77: 1–14.

Gabriel, I. 2020. Artificial intelligence, values, and alignment. Minds and Machines, 30(3): 411–437.

Geisslinger, M.; Poszler, F.; Betz, J.; Lütge, C.; and Lienkamp, M. 2021. Autonomous driving ethics: From trolley problem to ethics of risk. Philosophy & Technology, 1–23.

Goodall, N. J. 2016. Away from trolley problems and toward risk management. Applied Artificial Intelligence, 30(8): 810–821.

Grossi, D.; Meyer, J.-J. C.; and Dignum, F. 2005. Modal logic investigations in the semantics of counts-as. In Proceedings of the 10th International Conference on Artificial Intelligence and Law, 1–9.

Hadfield-Menell, D.; Russell, S. J.; Abbeel, P.; and Dragan, A. 2016. Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29: 3909–3917.

Koren, M.; Alsaif, S.; Lee, R.; and Kochenderfer, M. J. 2018. Adaptive stress testing for autonomous vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), 1–7. IEEE.

Koren, M.; Corso, A.; and Kochenderfer, M. J. 2020. The adaptive stress testing formulation. arXiv preprint arXiv:2004.04293.

Lee, R.; Kochenderfer, M. J.; Mengshoel, O. J.; Brat, G. P.; and Owen, M. P. 2015. Adaptive stress testing of airborne collision avoidance systems. In 2015 IEEE/AIAA 34th Digital Avionics Systems Conference (DASC), 6C2-1. IEEE.

Lee, R.; Mengshoel, O. J.; Saksena, A.; Gardner, R. W.; Genin, D.; Silbermann, J.; Owen, M.; and Kochenderfer, M. J. 2020. Adaptive stress testing: Finding likely failure events with reinforcement learning. Journal of Artificial Intelligence Research, 69: 1165–1201.

Loh, J. 2017. Roboterethik. Über eine noch junge Bereichsethik. Information Philosophie, 20–33.

Ng, A. Y.; Russell, S. J.; et al. 2000. Algorithms for inverse reinforcement learning. In ICML, volume 1, 2.

Powers, T. M. 2006. Prospects for a Kantian machine. IEEE Intelligent Systems, 21(4): 46–51.

Russell, S. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin.

Scanlon, T. M. 2003. The Difficulty of Tolerance: Essays in Political Philosophy. Cambridge University Press.

Thomson, J. J. 1976. Killing, letting die, and the trolley problem. The Monist, 59(2): 204–217.

Treiber, M.; Hennecke, A.; and Helbing, D. 2000. Congested traffic states in empirical observations and microscopic simulations. Physical Review E, 62(2): 1805.

Tulum, K.; Durak, U.; and Yder, S. K. 2009. Situation aware UAV mission route planning. In 2009 IEEE Aerospace Conference, 1–12. IEEE.

Wernaart, B. 2021. Developing a roadmap for the moral programming of smart technology. Technology in Society, 64: 101466.