Using Adaptive Stress Testing to Identify Paths to Ethical Dilemmas in Autonomous Systems

Ann-Katrin Reuel¹, Mark Koren², Anthony Corso², Mykel J. Kochenderfer²
¹ University of Pennsylvania, School of Engineering and Applied Sciences, Philadelphia, PA 19104
akreuel@seas.upenn.edu
² Stanford University, School of Engineering, Stanford, CA 94305
mark.c.koren21@gmail.com, acorso@stanford.edu, mykel@stanford.edu

Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

During operation, autonomous agents may find themselves making decisions that have ethical ramifications. In this position paper, we look at one aspect of these situations: ethical dilemmas. We first define them as situations in which an autonomous agent can only choose from actions that violate one or more previously given ethical principles. Subsequently, we suggest using adaptive stress testing, a framework based on reinforcement learning, as one way to uncover situations where an autonomous system gets into an ethical dilemma. Using an example from the autonomous driving domain, we propose a simulator setup, define a context-specific ethical dilemma, and suggest how adaptive stress testing can be applied to find the most likely path to an ethical dilemma.

Introduction

Safety-critical autonomous systems, such as autonomous vehicles, are increasingly operating within society. Just like human beings, autonomous agents might encounter situations where there is no clear ethical course of action. Rather, a decision between multiple unethical actions has to be made – this is what we call an ethical dilemma. Ethical decision making for autonomous agents is already complicated by questions such as whose values to consider and how to aggregate them in a way that can be used by the agent (Russell 2019). However, ethical dilemmas give rise to a further complication: How do we choose among unethical options? How should we prioritize the ethical principles specified in order to make an explicable decision among these options?

We contend, however, that there is no ethical way for an agent to choose among unethical options. After all, such dilemmas exist because even humans cannot agree on an unambiguously correct path of action. Instead, we propose that autonomous agents should explicitly reason in a way that prevents them from ending up in an ethical dilemma in the first place.

In this position paper, we first define ethical dilemmas as situations in which an autonomous agent can only choose from actions that violate one or more previously given ethical principles. Subsequently, we suggest the application of adaptive stress testing (AST) (Lee et al. 2020), a framework based on reinforcement learning (RL), to explicitly identify the most likely paths to ethical dilemmas. This could open new ways for agents to avoid such dilemmas in the first place. We further suggest a pedestrian simulator example to validate this idea.
Background

Moral programming and ethical decision making in particular have become major areas of interest in the field of AI safety (Wernaart 2021; Aliman and Kester 2019). For autonomous systems, this topic is still a relatively under-explored area of machine learning with many challenges. One such challenge is that how to make an ethical decision is a disputed subject. Different ethical theories may give contrasting answers to the question of which action is the morally correct one to take. For example, utilitarianism seeks to maximize human welfare (Bentham and Mill 2004). In this context, actions are judged based on their ability to maximize the expected overall utility of their immediate consequences; the cost of one human life would, in this school of thought, be outweighed by the cost of many lives. On the other hand, there are contractualist deontological ethics. Here, actions are preferred which individuals in a social construct could not reasonably reject (Scanlon 2003), i.e. actions which conform to moral norms (Davis 1993; Geisslinger et al. 2021). While such imperatives seem too unspecified to be adopted in an autonomous system, efforts have been made to translate these ideas in a way that machines can work with, e.g. the Three Laws of Robotics (Asimov 1950). While these rule-based ethics have the potential to be used in a machine context due to their structured approach (Powers 2006), some authors have argued that context-specific information is not taken into account sufficiently, potentially causing an autonomous agent to undertake risky behavior in order to adhere to a strict set of rules (Loh 2017; Goodall 2016). Another challenge with regard to autonomous agents making ethical decisions is the question of how ethically aligned behavior can be implemented in a machine. This becomes especially challenging in real-world, culture-dependent settings (Awad et al. 2018) due to their inherent complexity, involving correlations which are not sufficiently captured by simplified ethical theories.

Despite these challenges, work has been done to implement ethical decision making in autonomous systems. Conitzer et al. (2017) discuss moral decision making frameworks for autonomous agents at a high level. They argue that systems based on ad-hoc rules are insufficient and that a more general framework is needed. The authors compare game-theoretic formalisms to classical supervised machine learning methods that are based on a labeled ethical decision data set. Conitzer et al. (2017) find that, while the former can take multi-agent decisions into account, the basic representation schemes would need to be extended to work as an ethical decision framework. On the other hand, they argue that supervised learning could help in making human-like ethical decisions. The major issue here is that ethical decision situations tend to take place in fairly complex statistical contexts, often involving multiple human and non-human agents who do not always act rationally (Hadfield-Menell et al. 2016). Hence, ethical decision situations are rarely comparable, as changing even one parameter would often lead – from a human perspective – to a completely new evaluation of the situation.

Additional work to acquire and use human preferences in ethical decisions was conducted by Christiano et al. (2017). The authors used deep inverse RL (Ng, Russell et al. 2000), i.e. they involved humans in the agent's learning process by repeatedly giving the human short snippets of situations which she should order according to her preferences. The agent would use this information to refine its reward function, allowing it to iteratively adjust the function to the human's preferences. This approach could be used in ethical decision making, too, by showing humans two outcomes of an ethical decision which they should order with regard to their desirability, analogous to the Moral Machines approach (Awad et al. 2018). A similar idea was proposed by Abel, MacGlashan, and Littman (2016), who came to the conclusion that RL can be used to generalize moral values in a way that can be implemented in machines. However, there are multiple issues with these approaches: Firstly, one would need to select a balanced group of people who contribute to the ethical learning process of the agent to ensure that the moral judgement learned is representative of a larger population. Secondly, given the necessary constant involvement of humans in the learning process, this approach scales poorly. In addition to these shortcomings, none of the approaches discussed allows for the satisfactory resolution of ethical dilemmas, especially when human feedback is necessary, since such dilemmas are by definition not solvable by human beings. Hence, it is unlikely that they can teach an agent what to do in such situations.

Due to these issues, we argue that approaches to prevent ethical dilemmas need to be studied, instead of trying to resolve ethical decision situations when a clear moral action is not present. This position paper is the first to propose the use of such an approach: We suggest applying AST, an RL-based framework by Lee et al. (2020) for finding failures in autonomous systems, to identify the most likely path to an ethical dilemma (for an overview of alternative approaches to find failures in autonomous systems, please refer to Corso et al. (2020)). This information could subsequently be used to prevent the agent from arriving in an ethical dilemma in the first place.
Approach

Adaptive stress testing is a framework used in safety-critical systems like aircraft collision avoidance systems to find the most likely path to a failure event. Instead of defining failure events as critical system failures such as aircraft collisions, though, we define them in this position paper as reaching a state in which the agent is in an ethical dilemma. We want to highlight that we specifically do not define an unethical action taken by the agent as a failure, but rather situations in which the agent can only make unethical decisions. This way, the issue of deciding on a course of action in an ethical dilemma can be circumvented, because the mere necessity for such a decision would qualify as a failure in our approach.

We first define ethical failures. We subsequently suggest a setup for our approach using a variation of the trolley problem that is relevant in the context of autonomous vehicles. The trolley problem, first proposed by Thomson (1976), is a standard ethical dilemma considered in the literature, in which an autonomous agent has multiple options in a driving decision situation, all of which lead to fatal collisions.

Defining Ethical Failures

Based on the work by Dennis et al. (2016), we consider a set of abstract ethical principles Φ, with ϕ1, ϕ2, ..., ϕn corresponding to single abstract ethical principles such as "Don't harm humans.":

Φ = {ϕ1, ϕ2, ..., ϕn}

To transform these abstract principles into situation-specific ethical rules Γ = {γ1, γ2, ..., γn}, case-based reasoning is applied, as shown by Anderson and Anderson (2007), which allows for a context-specific instantiation of the respective rules. A context, in our case, "informs an agent of what counts as a violation of the laws and principles by which the context is governed" (Dennis et al. 2016).

An action is defined as unethical if it violates one or more of the ethical rules in Γ in a given context c. This establishment of ethical rules follows the deontological ethics approach (see Grossi, Meyer, and Dignum (2005) for more information). Given these prerequisites, we can define what an ethical dilemma is. To simplify, we assume that the defined ethical principles in the set Φ – and all ethical rules Γ derived from principles in Φ – are equally important. Now, in a given context c, we have a set of actions A available to the agent:

Ac = {a1, a2, ..., an}

If all of these actions violate one or more ethical rules in the set Γ, and hence principles in the set Φ, there is by definition no ethical option available to the agent. The agent finds itself in an ethical dilemma.
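To make this definition concrete, the check for an ethical dilemma can be expressed as a small predicate over the instantiated rule set Γ and the available actions Ac. The following Python sketch is illustrative only; representing each rule γ as a Boolean violates(action, context) predicate, and the helper names violates_any and is_ethical_dilemma, are our own assumptions rather than part of the formal definition above.

```python
from typing import Callable, Iterable

# A context-specific ethical rule gamma maps (action, context) -> True when the
# action violates that rule in the given context.
EthicalRule = Callable[[object, object], bool]


def violates_any(action, context, rules: Iterable[EthicalRule]) -> bool:
    """An action is unethical if it violates one or more rules in Gamma."""
    return any(rule(action, context) for rule in rules)


def is_ethical_dilemma(actions, context, rules: Iterable[EthicalRule]) -> bool:
    """An ethical dilemma: every action available in A_c is unethical."""
    actions = list(actions)
    rules = list(rules)  # allow re-iteration of the rules for each action
    return len(actions) > 0 and all(
        violates_any(a, context, rules) for a in actions
    )
```

In the AST setup described below, a predicate of this kind could serve as the event check that defines the event space E.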
Applying Adaptive Stress Testing

The evaluation of failure events has been extensively studied in safety-critical applications such as aircraft collision avoidance systems. One approach taken in this field is AST: Lee et al. (2020) were interested in finding the most likely path to failure events in "complex stochastic environments" (Lee et al. 2020) in order to understand how an agent arrives at a failure and hence prevent that failure path from being taken in the first place. Essentially, the authors followed a simulation-based approach in which knowledge of the system under test is not necessary. They formulated the problem as a sequential Markov decision process (MDP) in both fully and partially observable environments with stochastic disturbances. Subsequently, they let an agent try to maximize a reward function in this environment which rewards it for what is defined as a failure.

[Figure 1: Simplified adaptive stress testing framework showing its core components (Lee et al. 2020).]

In AST, there are four main components (see Figure 1): the simulator, the system under test, the environment, and the reinforcement learner. The reinforcement learner chooses a stochastic disturbance x to change the simulation in order to create failures. In return, it receives the simulator state s as well as the reward r. Using RL, the most likely path to a failure event can then be found by maximizing the reward. The framework operates in a black-box setting, and a multiple-step simulation of the situation which can lead to a failure is required. Furthermore, simulation control functions need to be provided to the solver to allow for stochastic disturbances of the environment. The sampling subsequently performed by the framework is based on Monte Carlo tree search (MCTS), allowing for a best-first exploration of the search space. This leads to the following formal problem (Koren, Corso, and Kochenderfer 2020):

maximize over a0, ..., at:   P(s0, a0, ..., st, at)
subject to:                  st ∈ E

with S being the simulator, E the event space, P(s0, a0, ..., st, at) the probability of a trajectory in simulator S, and st = f(at, st−1).
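To illustrate how this optimization could look in code, the sketch below stands in for the MCTS solver with a plain random search over disturbance sequences: among all sampled trajectories that reach the event set E, it keeps the one with the highest accumulated log-likelihood of its disturbances. The simulator interface (reset, step, is_event), the helper names, and the use of random search instead of MCTS are assumptions made for illustration; they are not the AST Toolbox API or the solver of Lee et al. (2020).

```python
import math
import random


def most_likely_failure_path(simulator, sample_disturbance, log_prob,
                             horizon, n_trials=1000, seed=0):
    """Simplified stand-in for the AST solver: search over disturbance
    sequences for the most likely trajectory that ends in the event set E.

    Assumed (illustrative) simulator interface:
      simulator.reset()            -> initial state
      simulator.step(disturbance)  -> next state
      simulator.is_event(state)    -> True if the state is an ethical dilemma (in E)
    """
    rng = random.Random(seed)
    best_path, best_log_p = None, -math.inf
    for _ in range(n_trials):
        state = simulator.reset()
        path, total_log_p = [], 0.0
        for _ in range(horizon):
            x = sample_disturbance(rng, state)   # stochastic disturbance chosen by the learner
            total_log_p += log_prob(x, state)    # accumulate log-likelihood of the disturbance
            state = simulator.step(x)
            path.append(x)
            if simulator.is_event(state):        # reached an ethical dilemma
                if total_log_p > best_log_p:
                    best_path, best_log_p = path[:], total_log_p
                break
    return best_path, best_log_p
```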
Simulation Design

As a first step toward showing that AST can be used to identify paths to ethical dilemmas, we propose a toy problem in an autonomous vehicle simulator. We use the following specifications to propose a scenario which includes a version of the trolley problem (overall structure and core components modelled on Koren et al. (2018)):

[Figure 2: Example initial setup for the simulator. The red circles depict pedestrians while the green boxes show immobile obstacles.]

1. Environment: We propose to use a simplified environment in which an autonomous vehicle drives on a one-lane street. On the sidewalk on each side of the street are both immobile obstacles and a variable number of pedestrians who are free to move in any direction, including past obstacles and across the street (see Figure 2). The pedestrians can be described by their velocity (v̂x(i), v̂y(i)) and position (x̂(i), ŷ(i)), both relative to the system under test (see below). The positions of the obstacles should be fixed, while the pedestrians' movement is controlled by AST. The simulation state ssim = [ssim(1), ssim(2), ..., ssim(n)] consists of the states of each pedestrian i, with ssim(i) = [v̂x(i), v̂y(i), x̂(i), ŷ(i)]. For more details on the simulation of pedestrian movement, please refer to Koren et al. (2018).

2. System under Test: We propose to use the Intelligent Driver Model (IDM) (Treiber, Hennecke, and Helbing 2000) as our system under test. The IDM is programmed to stay in its lane and drive in compliance with the rules of traffic. Its base speed is fixed at 35 mph, i.e. the standard speed on most city streets. At each step, the system under test receives a set of observations with the states of the pedestrians as well as the positions of the immobile obstacles. It then chooses an action based on this information, which is used to update the vehicle's state.

3. Solver: The exploration of the state space depends on the solver specification. For additional details on the MCTS solver we propose to use, please refer to Lee et al. (2020). The solver should be able to interact with the simulator by resetting the simulator to its initial state, by drawing the next state s′ after an action a was taken, and by evaluating whether a terminal state (an ethical dilemma or the end of the time horizon) has been reached.

4. Reward Function: Compared to the original reward function by Lee et al. (2015), we suggest using a modified version as implemented by Koren et al. (2018) (a minimal sketch of this reward follows the list):

   R(s) = 0                             if s ∈ E
   R(s) = −α − β · DIST(pv, pp)         if s ∉ E, t ≥ T
   R(s) = −log(1 + M(a, µa | s))        if s ∉ E, t < T

   where DIST(pv, pp) is the distance between the closest pedestrian and the system under test, while the Mahalanobis distance M could be used as a proxy for the probability of an action. See Koren et al. (2018) for more details. This reward function covers three cases: a) finding an ethical dilemma, which gives the highest reward; b) finding no dilemma and reaching the time horizon, which gives the lowest reward (by choosing high α and β values); and c) finding no dilemma while the agent still operates within the specified time horizon T.
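As referenced in item 4 above, the following Python sketch writes out this reward in the three cases just described. The function signature, the default magnitudes of α and β, and the way the distance and Mahalanobis terms are passed in are illustrative assumptions based on the description above and on Koren et al. (2018), not a definitive implementation.

```python
import math


def ast_reward(is_dilemma: bool, t: int, horizon: int,
               dist_to_closest_pedestrian: float,
               mahalanobis_action: float,
               alpha: float = 1e4, beta: float = 1e3) -> float:
    """Modified AST reward following the three cases described above.

    is_dilemma: whether the current state lies in the event set E
    dist_to_closest_pedestrian: DIST(p_v, p_p), distance between the vehicle
        and the closest pedestrian
    mahalanobis_action: M(a, mu_a | s), used as a proxy for the probability
        of the sampled disturbance
    alpha, beta: placeholder magnitudes; the text only requires them to be large
    """
    if is_dilemma:
        # Case a): an ethical dilemma was found -- the highest possible reward.
        return 0.0
    if t >= horizon:
        # Case b): no dilemma within the time horizon -- large penalty,
        # shaped by how close the vehicle came to a pedestrian.
        return -alpha - beta * dist_to_closest_pedestrian
    # Case c): still within the horizon -- per-step penalty that favors
    # likely (low-Mahalanobis-distance) disturbances.
    return -math.log(1.0 + mahalanobis_action)
```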
Ethical Dilemmas As Failure Events

The key idea is now to define our event of interest, i.e. the failure event, not as a collision (as in Koren et al. (2018)) but as a decision situation in which the agent finds itself in an ethical dilemma. One example of the subset of the state space we are interested in in our simulator are settings in which the path of the system under test is blocked on both the left- and right-hand side, either by a pedestrian or an obstacle, while a pedestrian appears in close proximity in front of the vehicle (see Figure 3). We assume that a crash with an obstacle would severely injure the passengers of the system under test, while a crash with a pedestrian would severely injure the pedestrian. We further assume that the agent is given the ethical principle

ϕh = do no harm

which can be translated into the context-specific ethical rules

γp = do not harm pedestrians
γo = do not harm occupants

Note that our system does not require any weighting to be given on harming an occupant vs. harming a pedestrian. It is sufficient to say that a violation of either is a violation of the directive to do no harm to a human.

[Figure 3: Example ethical dilemma. A pedestrian moves in front of the vehicle, leaving it with the option to crash into the pedestrian, a pedestrian on the left-hand side, or an obstacle on the right-hand side.]

Confronted with the situation described above, the autonomous agent identifies the following available actions (planning and identifying available actions is not part of this paper; please refer to Tulum, Durak, and Yder (2009) or Coles et al. (2010) for further information):

• Option ao: Crash into an obstacle, likely causing harm to the agent's occupants.
• Option ap: Crash into a pedestrian, likely causing harm to the pedestrian and potentially the agent's occupants.

The corresponding action space is A = {ao, ap}. No matter which action the agent chooses, it would violate either γp (by harming a pedestrian) or γo (by harming its occupants) and, as a consequence, also ϕh, i.e. the directive to cause no harm. Hence, neither option can be clearly identified as ethical, and the agent ends up in a dilemma. As in the original AST framework, instead of receiving a negative reward for a failure event, the agent receives a positive reward for these situations to encourage finding paths to ethical dilemmas.

The goal of the AST framework is then to maximize this reward by disturbing the pedestrian movement and creating failure states in which it receives the highest reward. This approach results in the most likely path to an ethical dilemma – information which could subsequently be used to prevent this path from being taken, decreasing the likelihood of ending up in such a dilemma in the first place.
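Continuing the illustrative sketch from the Defining Ethical Failures section, the scenario above could be encoded as follows; the outcome labels attached to each action and the context name are assumptions made purely for this example.

```python
# Hypothetical encoding of the trolley-style scenario described above,
# reusing is_ethical_dilemma from the earlier sketch.

# Each action is labeled with the outcome we assume it would cause.
ACTIONS = {
    "a_o": {"harms_occupants": True, "harms_pedestrian": False},   # crash into an obstacle
    "a_p": {"harms_occupants": False, "harms_pedestrian": True},   # crash into a pedestrian
}


# Context-specific rules gamma_p and gamma_o derived from phi_h ("do no harm").
def gamma_p(action, context):   # do not harm pedestrians
    return ACTIONS[action]["harms_pedestrian"]


def gamma_o(action, context):   # do not harm occupants
    return ACTIONS[action]["harms_occupants"]


context = "blocked_road_pedestrian_ahead"
print(is_ethical_dilemma(ACTIONS.keys(), context, [gamma_p, gamma_o]))  # -> True
```

Since both available actions violate at least one rule derived from ϕh, the check returns True, i.e. the state qualifies as a failure event for AST.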
Future Research Directions

Identifying ethical dilemmas using AST comes with challenges that need to be addressed in future work. Firstly, it depends on the availability of a simulator which sufficiently depicts an ethical decision situation. Secondly, the defined ethical principles need to be specific enough that the agent can evaluate its available actions against them. Furthermore, the ethical principles should be defined such that the majority of potentially affected people agrees with them, which has been an open issue in research (Gabriel 2020). Also, while AST can find the most likely path to a failure event, it might be the case that all possible paths result in an ethical dilemma, i.e. that it cannot be prevented. For these cases, other strategies to prevent or deal with ethical dilemmas need to be employed, which is still an unresolved question in the field. Another limitation of the AST framework that has to be considered is that the downstream effect of immediate actions taken by the agent is not part of the analysis. Despite these open questions, our next step will be to implement the proposed setup as an empirical proof of the approach. This could then be extended to show how the knowledge of a path to an ethical dilemma can be used to prevent that path from being taken in the first place. While not a one-size-fits-all framework for dealing with ethical dilemmas in autonomous systems, AST can be used as part of a larger strategy to deal with such decision situations.

Conclusions

In this position paper, we showed how ethical failures can be defined and subsequently used as failure events in the AST framework. This constitutes a novel approach to dealing with ethical dilemmas in autonomous decision systems: Instead of solving them, we suggest circumventing ethical dilemmas in the first place by identifying the most likely path to such a failure event. As a next step, we propose the implementation of the suggested simulator as a proof of concept. Long-term, this approach could be part of more comprehensive efforts to create ethical autonomous systems.

References

Abel, D.; MacGlashan, J.; and Littman, M. L. 2016. Reinforcement learning as a framework for ethical decision making. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.

Aliman, N.-M.; and Kester, L. 2019. Transformative AI governance and AI-empowered ethical enhancement through preemptive simulations. Delphi, 2: 23.

Anderson, M.; and Anderson, S. L. 2007. Machine ethics: Creating an ethical intelligent agent. AI Magazine, 28(4): 15–15.

Asimov, I. 1950. I, Robot. Fawcett Publications.

Awad, E.; Dsouza, S.; Kim, R.; Schulz, J.; Henrich, J.; Shariff, A.; Bonnefon, J.-F.; and Rahwan, I. 2018. The moral machine experiment. Nature, 563(7729): 59–64.

Bentham, J.; and Mill, J. S. 2004. Utilitarianism and Other Essays. Penguin UK.

Christiano, P.; Leike, J.; Brown, T. B.; Martic, M.; Legg, S.; and Amodei, D. 2017. Deep reinforcement learning from human preferences. arXiv preprint arXiv:1706.03741.

Coles, A.; Coles, A.; Fox, M.; and Long, D. 2010. Forward-chaining partial-order planning. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 20.

Conitzer, V.; Sinnott-Armstrong, W.; Borg, J. S.; Deng, Y.; and Kramer, M. 2017. Moral decision making frameworks for artificial intelligence. In Thirty-First AAAI Conference on Artificial Intelligence.

Corso, A.; Moss, R. J.; Koren, M.; Lee, R.; and Kochenderfer, M. J. 2020. A survey of algorithms for black-box safety validation. arXiv preprint arXiv:2005.02979.

Davis, N. 1993. Contemporary deontology. In Singer, P., ed., A Companion to Ethics. John Wiley & Sons.

Dennis, L.; Fisher, M.; Slavkovik, M.; and Webster, M. 2016. Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems, 77: 1–14.

Gabriel, I. 2020. Artificial intelligence, values, and alignment. Minds and Machines, 30(3): 411–437.

Geisslinger, M.; Poszler, F.; Betz, J.; Lütge, C.; and Lienkamp, M. 2021. Autonomous driving ethics: From trolley problem to ethics of risk. Philosophy & Technology, 1–23.

Goodall, N. J. 2016. Away from trolley problems and toward risk management. Applied Artificial Intelligence, 30(8): 810–821.

Grossi, D.; Meyer, J.-J. C.; and Dignum, F. 2005. Modal logic investigations in the semantics of counts-as. In Proceedings of the 10th International Conference on Artificial Intelligence and Law, 1–9.

Hadfield-Menell, D.; Russell, S. J.; Abbeel, P.; and Dragan, A. 2016. Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29: 3909–3917.

Koren, M.; Alsaif, S.; Lee, R.; and Kochenderfer, M. J. 2018. Adaptive stress testing for autonomous vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), 1–7. IEEE.

Koren, M.; Corso, A.; and Kochenderfer, M. J. 2020. The adaptive stress testing formulation. arXiv preprint arXiv:2004.04293.

Lee, R.; Kochenderfer, M. J.; Mengshoel, O. J.; Brat, G. P.; and Owen, M. P. 2015. Adaptive stress testing of airborne collision avoidance systems. In 2015 IEEE/AIAA 34th Digital Avionics Systems Conference (DASC), 6C2-1. IEEE.

Lee, R.; Mengshoel, O. J.; Saksena, A.; Gardner, R. W.; Genin, D.; Silbermann, J.; Owen, M.; and Kochenderfer, M. J. 2020. Adaptive stress testing: Finding likely failure events with reinforcement learning. Journal of Artificial Intelligence Research, 69: 1165–1201.

Loh, J. 2017. Roboterethik. Über eine noch junge Bereichsethik. Information Philosophie, 20–33.

Ng, A. Y.; Russell, S. J.; et al. 2000. Algorithms for inverse reinforcement learning. In ICML, volume 1, 2.

Powers, T. M. 2006. Prospects for a Kantian machine. IEEE Intelligent Systems, 21(4): 46–51.

Russell, S. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin.

Scanlon, T. M. 2003. The Difficulty of Tolerance: Essays in Political Philosophy. Cambridge University Press.

Thomson, J. J. 1976. Killing, letting die, and the trolley problem. The Monist, 59(2): 204–217.

Treiber, M.; Hennecke, A.; and Helbing, D. 2000. Congested traffic states in empirical observations and microscopic simulations. Physical Review E, 62(2): 1805.

Tulum, K.; Durak, U.; and Yder, S. K. 2009. Situation aware UAV mission route planning. In 2009 IEEE Aerospace Conference, 1–12. IEEE.

Wernaart, B. 2021. Developing a roadmap for the moral programming of smart technology. Technology in Society, 64: 101466.