          Using Adaptive Stress Testing to Identify Paths to Ethical Dilemmas in
                                  Autonomous Systems
                                            Ann-Katrin Reuel1 , Mark Koren2 ,
                                         Anthony Corso2 , Mykel J. Kochenderfer2
1 University of Pennsylvania, School of Engineering and Applied Sciences, Philadelphia, PA 19104
akreuel@seas.upenn.edu
2 Stanford University, School of Engineering, Stanford, CA 94305
                              mark.c.koren21@gmail.com, acorso@stanford.edu, mykel@stanford.edu
Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

During operation, autonomous agents may find themselves making decisions which have ethical ramifications. In this position paper, we look at one aspect of these situations: ethical dilemmas. We first define them as situations in which an autonomous agent can only choose from actions that violate one or more previously given ethical principles. Subsequently, we suggest using adaptive stress testing, a framework based on reinforcement learning, as one way to uncover situations in which an autonomous system gets into an ethical dilemma. Using an example from the autonomous driving domain, we propose a simulator setup, define a context-specific ethical dilemma, and suggest how adaptive stress testing can be applied to find the most likely path to an ethical dilemma.

Introduction

Safety-critical autonomous systems, such as autonomous vehicles, are increasingly operating within society. Just like human beings, autonomous agents might encounter situations where there is no clear ethical course of action. Rather, a decision between multiple unethical actions has to be made; this is what we call an ethical dilemma. Ethical decision making for autonomous agents is already complicated by questions such as whose values to consider and how to aggregate them in a way that can be used by the agent (Russell 2019). However, ethical dilemmas give rise to a further complication: How do we choose among unethical options? How should we prioritize the specified ethical principles to make an explicable decision among these options? We contend that there is no ethical way for an agent to choose among unethical options. After all, such dilemmas exist because even humans cannot agree on an unambiguously correct course of action. Instead, we propose that autonomous agents should explicitly reason in a way that prevents them from ending up in an ethical dilemma in the first place.

In this position paper, we first define ethical dilemmas as situations in which an autonomous agent can only choose from actions that violate one or more previously given ethical principles. Subsequently, we suggest the application of adaptive stress testing (AST) (Lee et al. 2020), a framework based on reinforcement learning (RL), to explicitly identify the most likely paths to ethical dilemmas. This could open new ways for agents to avoid such dilemmas in the first place. We further suggest a pedestrian simulator example to validate this idea.

Background

Moral programming, and ethical decision making in particular, have become major areas of interest in the field of AI safety (Wernaart 2021; Aliman and Kester 2019). For autonomous systems, this topic is still a relatively under-explored area of machine learning with many challenges. One such challenge is that how to make an ethical decision is itself a disputed subject. Different ethical theories can lead to contrasting answers to the question of which action is the morally correct one to take. For example, utilitarianism seeks to maximize human welfare (Bentham and Mill 2004). In this school of thought, actions are judged by their ability to maximize the expected overall utility of their immediate consequences; the cost of one human life, for example, would be outweighed by the cost of many lives. On the other hand, there are contractualist deontological ethics, in which actions are preferred that individuals in a social construct could not reasonably reject (Scanlon 2003), i.e. actions that conform to moral norms (Davis 1993; Geisslinger et al. 2021). While such imperatives seem too underspecified to be adopted in an autonomous system, efforts have been made to translate these ideas into a form that machines can work with, e.g. the Three Laws of Robotics (Asimov 1950). While such rule-based ethics have the potential to be used in a machine context due to their structured approach (Powers 2006), some authors have argued that context-specific information is not taken into account sufficiently, potentially causing an autonomous agent to undertake risky behavior in order to adhere to a strict set of rules (Loh 2017; Goodall 2016). Another challenge with regard to autonomous agents making ethical decisions is the question of how ethically aligned behavior can be implemented in a machine at all. This becomes especially challenging in real-world, culture-dependent settings (Awad et al. 2018) due to their inherent complexity, which involves correlations that are not sufficiently captured by simplified ethical theories.
Despite these challenges, work has been done to implement ethical decision making in autonomous systems. Conitzer et al. (2017) discuss moral decision making frameworks for autonomous agents at a high level. They argue that systems based on ad-hoc rules are insufficient and that a more general framework is needed. The authors compare game-theoretic formalisms to classical supervised machine learning methods based on a labeled ethical decision data set. Conitzer et al. (2017) find that, while the former can take multi-agent decisions into account, the basic representation schemes would need to be extended to work as an ethical decision framework. On the other hand, they argue that supervised learning could help in making human-like ethical decisions. The major issue here is that ethical decision situations tend to take place in fairly complex statistical contexts, often involving multiple human and non-human agents who do not always act rationally (Hadfield-Menell et al. 2016). Hence, ethical decision situations are rarely comparable: changing even one parameter often leads, from a human perspective, to a completely new evaluation of the situation.

Additional work on acquiring and using human preferences in ethical decisions was conducted by Christiano et al. (2017). The authors used deep inverse RL (Ng, Russell, et al. 2000), i.e. they involved humans in the agent's learning process by repeatedly showing a human short snippets of the agent's behavior, which the human orders according to their preferences. The agent uses this information to refine its reward function, allowing it to iteratively adjust the function to the human's preferences. This approach could be used in ethical decision making, too, by showing humans two outcomes of an ethical decision which they order with regard to their desirability, analogous to the Moral Machines approach (Awad et al. 2018). A similar idea was proposed by Abel, MacGlashan, and Littman (2016), who came to the conclusion that RL can be used to generalize moral values in a way that can be implemented in machines. However, there are multiple issues with these approaches. Firstly, one would need to select a balanced group of people to contribute to the ethical learning process of the agent in order to ensure that the moral judgement learned is representative of a larger population. Secondly, given the necessary constant involvement of humans in the learning process, this approach scales poorly. In addition to these shortcomings, none of the approaches discussed allows for the satisfactory resolution of ethical dilemmas, especially when human feedback is necessary, since such dilemmas are by definition not solvable by human beings. Hence, it is unlikely that they can teach an agent what to do in such situations.
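For concreteness, the core of such preference-based reward learning is a pairwise-comparison objective: the learned reward should rank the preferred snippet above the rejected one. The following minimal sketch illustrates that objective; the function and example values are ours for illustration and are not taken from Christiano et al. (2017).

```python
import math

def preference_loss(reward_a: float, reward_b: float, human_prefers_a: bool) -> float:
    """Pairwise-comparison (Bradley-Terry style) loss for preference-based reward
    learning: the model's probability that snippet A is preferred over snippet B is
    sigmoid(R(A) - R(B)), and disagreement with the human label is penalized."""
    p_a = 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))
    target = 1.0 if human_prefers_a else 0.0
    eps = 1e-12  # avoid log(0)
    return -(target * math.log(p_a + eps) + (1.0 - target) * math.log(1.0 - p_a + eps))

# Example: the current reward model scores snippet A higher than snippet B,
# and the human also preferred A, so the loss is small.
print(preference_loss(reward_a=3.2, reward_b=1.7, human_prefers_a=True))
```

Minimizing such a loss over many human comparisons is what iteratively adjusts the learned reward function to the human's preferences.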
Due to these issues, we argue that approaches to prevent ethical dilemmas need to be studied, instead of trying to resolve ethical decision situations when a clear moral action is not present. This position paper is the first to propose the use of such an approach: we suggest applying AST, an RL-based framework by Lee et al. (2020) for finding failures in autonomous systems, to identify the most likely path to an ethical dilemma (for an overview of alternative approaches to finding failures in autonomous systems, please refer to Corso et al. (2020)). This information could subsequently be used to prevent the agent from arriving in an ethical dilemma in the first place.

Figure 1: Simplified adaptive stress testing framework showing its core components (Lee et al. 2020).

Approach

Adaptive stress testing is a framework used in safety-critical systems, such as aircraft collision avoidance systems, to find the most likely path to a failure event. Instead of defining failure events as critical system failures such as aircraft collisions, though, we define them in this position paper as reaching a state in which the agent is in an ethical dilemma. We want to highlight that we specifically do not define the failure as an unethical action taken by the agent, but rather as a situation in which the agent can only make unethical decisions. This way, the issue of deciding on a course of action in an ethical dilemma can be circumvented, because the mere necessity for such a decision already qualifies as a failure in our approach.

We first define ethical failures. We subsequently suggest a setup for our approach using a variation of the trolley problem that is relevant in the context of autonomous vehicles. The trolley problem (Thomson 1976) is a standard ethical dilemma considered in the literature, in which an autonomous agent has multiple options in a driving decision situation that all lead to fatal collisions.

Defining Ethical Failures

Based on the work by Dennis et al. (2016), we consider a set of abstract ethical principles Φ, with ϕ1, ϕ2, ..., ϕn corresponding to single abstract ethical principles such as "Don't harm humans.":

Φ = {ϕ1, ϕ2, ..., ϕn}

To transform these abstract principles into situation-specific ethical rules Γ = {γ1, γ2, ..., γn}, case-based reasoning is applied, as shown by Anderson and Anderson (2007), which allows for a context-specific instantiation of the respective rules. A context, in our case, "informs an agent of what counts as a violation of the laws and principles by which the context is governed" (Dennis et al. 2016). An action is defined as unethical if it violates one or more of the ethical rules in Γ in a given context c. This establishment of ethical rules follows the deontological ethics approach (see Grossi, Meyer, and Dignum (2005) for more information). Given these prerequisites, we can define what an ethical dilemma is. To simplify, we assume that the defined ethical principles in the set Φ (and all ethical rules in Γ derived from principles in Φ) are equally important. Now, in a given context c, we have a set of actions Ac available to the agent:

Ac = {a1, a2, ..., an}

If all of these actions violate one or more ethical rules in the set Γ, and hence one or more principles in the set Φ, there is by definition no ethical option available to the agent. The agent finds itself in an ethical dilemma.
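This definition translates directly into a simple check. The following minimal sketch assumes that each rule in Γ is available as a Boolean predicate over an action and a context; the helper names are illustrative and are not part of the formalism of Dennis et al. (2016).

```python
from typing import Callable, Dict, List

# A context-specific ethical rule gamma: returns True if taking `action`
# in `context` violates the rule (illustrative representation).
Rule = Callable[[str, Dict], bool]

def is_unethical(action: str, context: Dict, rules: List[Rule]) -> bool:
    """An action is unethical in context c if it violates at least one rule in Gamma."""
    return any(rule(action, context) for rule in rules)

def is_ethical_dilemma(available_actions: List[str], context: Dict, rules: List[Rule]) -> bool:
    """The agent is in an ethical dilemma if every action in A_c is unethical."""
    return all(is_unethical(a, context, rules) for a in available_actions)
```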
Applying Adaptive Stress Testing

The evaluation of failure events has been extensively studied in safety-critical applications such as aircraft collision avoidance systems. One approach taken in this field is AST: Lee et al. (2020) were interested in finding the most likely path to failure events in "complex stochastic environments" (Lee et al. 2020) in order to understand how an agent arrives at a failure and hence prevent that failure path from being taken in the first place. Essentially, the authors followed a simulation-based approach in which knowledge of the system under test is not necessary. They formulated the problem as a sequential Markov decision process (MDP) in both fully and partially observable environments with stochastic disturbances. Subsequently, they let an agent try to maximize a reward function in this environment which rewards it for what is defined as a failure.

In AST, there are four main components (see Figure 1): the simulator, the system under test, the environment, and the reinforcement learner. The reinforcement learner chooses a stochastic disturbance x to change the simulation in order to create failures. In return, it receives the simulator state s as well as the reward r. Using RL, the most likely path to a failure event can then be found by maximizing the reward. The framework operates in a black-box setting, and a multiple-step simulation of the situation which can lead to a failure is required. Furthermore, simulation control functions need to be provided to the solver to allow for stochastic disturbances of the environment. The sampling subsequently performed by the framework is based on Monte Carlo tree search (MCTS), allowing for a best-first exploration of the search space. This leads to the following formal problem (Koren, Corso, and Kochenderfer 2020):

maximize over a0, ..., at:   P(s0, a0, ..., st, at)
subject to:                  st ∈ E

with S being the simulator, E the event space, P(s0, a0, ..., st, at) the probability of a trajectory in simulator S, and st = f(at, st−1).
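Because AST treats the simulator as a black box, the solver only needs a small control interface: reset the simulation, apply a disturbance and advance one step, and query whether an event or the end of the time horizon has been reached. The sketch below shows one way such an interface could look; it is an assumption for illustration, not the exact API of any existing AST implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Tuple

class BlackBoxSimulator(ABC):
    """Simulation control interface assumed by the AST solver (illustrative sketch)."""

    @abstractmethod
    def reset(self) -> Any:
        """Return the simulation to its initial state s0."""

    @abstractmethod
    def step(self, disturbance: Any) -> Tuple[Any, float]:
        """Apply a stochastic disturbance x, advance one step, and return the next
        simulator state s' together with the log-likelihood of the disturbance,
        which is what keeps the discovered failure path a likely one."""

    @abstractmethod
    def is_event(self) -> bool:
        """True if the current state lies in the event set E (here: an ethical dilemma)."""

    @abstractmethod
    def is_terminal(self) -> bool:
        """True if an event occurred or the time horizon T has been reached."""
```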
Figure 2: Example initial setup for the simulator. The red circles depict pedestrians while the green boxes show immobile obstacles.

Simulation Design

As a first step to show that AST can be used to identify paths to ethical dilemmas, we propose a toy problem in an autonomous vehicle simulator. We use the following specifications to propose a scenario which includes a version of the trolley problem (the overall structure and core components are modelled on Koren et al. (2018)):

1. Environment: We propose to use a simplified environment in which an autonomous vehicle drives on a one-lane street. On the sidewalk on each side of the street are both immobile obstacles and a variable number of pedestrians, who are free to move in any direction, including past obstacles and across the street (see Figure 2). They can be described by their velocity (v̂x(i), v̂y(i)) and position (x̂(i), ŷ(i)), both relative to the system under test (see below). The positions of the obstacles are fixed, while the pedestrians' movement is controlled by AST. The simulation state s_sim = [s_sim(1), s_sim(2), ..., s_sim(n)] consists of the states of each pedestrian i, with s_sim(i) = [v̂x(i), v̂y(i), x̂(i), ŷ(i)]. For more details on the simulation of pedestrian movement, please refer to Koren et al. (2018).

2. System under Test: We propose to use the Intelligent Driver Model (IDM) (Treiber, Hennecke, and Helbing 2000) as our system under test. The IDM is programmed to stay in its lane and drive in compliance with the rules of traffic. Its base speed is fixed at 35 mph, i.e. the standard speed on most city streets. At each step, the system under test receives a set of observations with the states of the pedestrians as well as the positions of the immobile obstacles. It then chooses an action based on this information, which is used to update the vehicle's state.

3. Solver: The exploration of the state space depends on the solver specification. For additional details on the MCTS solver we propose to use, please refer to Lee et al. (2020). The solver should be able to interact with the simulator by resetting the simulator to its initial state, by drawing the next state s′ after an action a was taken, and by evaluating whether a terminal state (an ethical dilemma or the end of the time horizon) has been reached.

4. Reward Function: Compared to the original reward function by Lee et al. (2015), we suggest using a modified version as implemented by Koren et al. (2018), sketched in code after this list:

   R(s) = 0                           if s ∈ E
        = −α − β · DIST(pv, pp)       if s ∉ E, t ≥ T
        = −log(1 + M(a, µa | s))      if s ∉ E, t < T

   where DIST(pv, pp) is the distance between the closest pedestrian and the system under test, and the Mahalanobis distance M(a, µa | s) is used as a proxy for the probability of an action. See Koren et al. (2018) for more details. This reward function covers three cases: a) finding an ethical dilemma, which gives the highest reward; b) finding no dilemma and reaching the time horizon, which gives the lowest reward (by choosing high α and β values); and c) finding no dilemma while the agent still operates within the specified time horizon T.
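The following sketch spells out the three reward cases above. The function and argument names, and the example values of α and β, are placeholders chosen for illustration; they are not taken from Lee et al. (2015) or Koren et al. (2018).

```python
import math

def ast_reward(is_dilemma: bool, t: int, horizon: int, dist_to_pedestrian: float,
               mahalanobis_dist: float, alpha: float = 1.0e4, beta: float = 1.0e3) -> float:
    """Modified AST reward with the three cases described above."""
    if is_dilemma:
        # Case a): the state is in E (an ethical dilemma); highest possible reward.
        return 0.0
    if t >= horizon:
        # Case b): horizon reached without a dilemma; large penalty, scaled by how
        # far the closest pedestrian remained from the system under test.
        return -alpha - beta * dist_to_pedestrian
    # Case c): still within the horizon; small penalty discouraging unlikely
    # disturbances (the Mahalanobis distance proxies the action's improbability).
    return -math.log(1.0 + mahalanobis_dist)
```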
Ethical Dilemmas As Failure Events

The key idea is now to define our event of interest, i.e. the failure event, not as a collision (as in Koren et al. (2018)) but as a decision situation in which the agent finds itself in an ethical dilemma.

Figure 3: Example ethical dilemma. A pedestrian moves in front of the vehicle, leaving it with the option to crash into the pedestrian, a pedestrian on the left-hand side, or an obstacle on the right-hand side.

In our simulator, one example of the subset of the state space we are interested in is a setting in which the path of the system under test is blocked on both the left- and right-hand side, either by a pedestrian or an obstacle, while a pedestrian appears in close proximity in front of the vehicle (see Figure 3). We assume that a crash with an obstacle would severely injure the passengers of the system under test, while a crash with a pedestrian would severely injure the pedestrian. We further assume that the agent is given the ethical principle

ϕh = do no harm

which could be translated into the context-specific ethical rules

γp = do not harm pedestrians
γo = do not harm occupants

Note that our system does not require any weighting between harming an occupant and harming a pedestrian. It is sufficient to say that a violation of either rule is a violation of the directive to do no harm to a human. Confronted with the situation described above, the autonomous agent identifies the following available actions (planning and identifying available actions is not part of this paper; please refer to Tulum, Durak, and Yder (2009) or Coles et al. (2010) for further information):

• Option ao: Crash into an obstacle, likely causing harm to the agent's occupants.
• Option ap: Crash into a pedestrian, likely causing harm to the pedestrian and potentially the agent's occupants.

The corresponding action space is

A = {ao, ap}

No matter which action the agent chooses, it violates either γp (by harming a pedestrian) or γo (by harming its occupants), and as a consequence also ϕh, i.e. the directive to cause no harm. Hence, neither option can be clearly identified as ethical, and the agent ends up in a dilemma.
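Continuing the sketch from the Defining Ethical Failures section, this scenario can be instantiated concretely. The encodings below are illustrative assumptions for exposition, not part of the proposed simulator.

```python
# Illustrative encoding of the scenario: two available actions and two
# context-specific rules, each returning True if the action violates it.
rules = {
    "gamma_p (do not harm pedestrians)": lambda action: action == "a_p",
    "gamma_o (do not harm occupants)":   lambda action: action == "a_o",
}
available_actions = ["a_o", "a_p"]  # crash into an obstacle vs. crash into a pedestrian

# Every available action violates at least one rule, so the state is an
# ethical dilemma and therefore a failure event for AST.
is_dilemma = all(any(violates(a) for violates in rules.values()) for a in available_actions)
print(is_dilemma)  # True
```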
As per the original AST framework, the agent is not penalized for reaching a failure event; instead, such states yield the highest reward, to encourage finding paths to ethical dilemmas.

The goal of the AST framework is then to maximize this reward by disturbing the pedestrian movement and thereby creating failure states in which it receives the highest reward. This approach yields the most likely path to an ethical dilemma, information that could subsequently be used to prevent this path from being taken, decreasing the likelihood of ending up in such a dilemma in the first place.
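To make the search concrete, the following simplified sketch replaces the MCTS solver with plain random search over disturbance sequences; it assumes the black-box simulator interface sketched earlier and is meant only to illustrate the objective of finding the most likely dilemma-inducing path, not to reproduce the solver of Lee et al. (2020).

```python
from typing import Callable, List, Tuple

def search_for_dilemma_path(sim, sample_disturbance: Callable[[], object],
                            n_rollouts: int = 1000) -> Tuple[float, List[object]]:
    """Simplified stand-in for the MCTS solver: sample disturbance sequences,
    score each rollout by its accumulated log-likelihood, and keep the most
    likely rollout that ends in an ethical dilemma."""
    best_log_likelihood, best_path = float("-inf"), []
    for _ in range(n_rollouts):
        sim.reset()
        log_likelihood, path = 0.0, []
        while not sim.is_terminal():
            disturbance = sample_disturbance()   # e.g. noise on pedestrian acceleration
            _, step_log_likelihood = sim.step(disturbance)
            log_likelihood += step_log_likelihood
            path.append(disturbance)
        if sim.is_event() and log_likelihood > best_log_likelihood:
            best_log_likelihood, best_path = log_likelihood, path
    return best_log_likelihood, best_path
```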
Future Research Directions

Identifying ethical dilemmas using AST comes with challenges that need to be addressed in future work. Firstly, it depends on the availability of a simulator which sufficiently depicts an ethical decision situation. Secondly, the defined ethical principles need to be specific enough that the agent can evaluate its available actions with regard to these principles. Furthermore, the ethical principles should be defined such that the majority of potentially affected people agrees with them, which has been an open issue in research (Gabriel 2020). Also, while AST can find the most likely path to a failure event, it might be the case that all possible paths result in an ethical dilemma, i.e. that it cannot be prevented. For these cases, other strategies to prevent or deal with ethical dilemmas need to be employed, which is still an unresolved question in the field. Another limitation of the AST framework that has to be considered is that the downstream effect of immediate actions taken by the agent is not part of the analysis. Despite these open questions, our next step will be to implement the proposed setup as an empirical proof of the approach. This could then be extended to show how the information about a path to an ethical dilemma can be used to prevent that path from being taken in the first place. While not a one-size-fits-all framework for dealing with ethical dilemmas in autonomous systems, AST can be used as part of a larger strategy for dealing with such decision situations.

Conclusions

In this position paper, we showed how ethical failures can be defined and subsequently used as failure events in the AST framework. This constitutes a novel approach to dealing with ethical dilemmas in autonomous decision systems: instead of solving them, we suggest circumventing ethical dilemmas in the first place by identifying the most likely path to such a failure event. As a next step, we propose the implementation of the suggested simulator as a proof of concept. In the long term, this approach could be part of more comprehensive efforts to create ethical autonomous systems.
References

Abel, D.; MacGlashan, J.; and Littman, M. L. 2016. Reinforcement learning as a framework for ethical decision making. In Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.
Aliman, N.-M.; and Kester, L. 2019. Transformative AI governance and AI-empowered ethical enhancement through preemptive simulations. Delphi, 2: 23.
Anderson, M.; and Anderson, S. L. 2007. Machine ethics: Creating an ethical intelligent agent. AI Magazine, 28(4): 15–15.
Asimov, I. 1950. I, Robot. Fawcett Publications.
Awad, E.; Dsouza, S.; Kim, R.; Schulz, J.; Henrich, J.; Shariff, A.; Bonnefon, J.-F.; and Rahwan, I. 2018. The moral machine experiment. Nature, 563(7729): 59–64.
Bentham, J.; and Mill, J. S. 2004. Utilitarianism and Other Essays. Penguin UK.
Christiano, P.; Leike, J.; Brown, T. B.; Martic, M.; Legg, S.; and Amodei, D. 2017. Deep reinforcement learning from human preferences. arXiv preprint arXiv:1706.03741.
Coles, A.; Coles, A.; Fox, M.; and Long, D. 2010. Forward-chaining partial-order planning. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 20.
Conitzer, V.; Sinnott-Armstrong, W.; Borg, J. S.; Deng, Y.; and Kramer, M. 2017. Moral decision making frameworks for artificial intelligence. In Thirty-First AAAI Conference on Artificial Intelligence.
Corso, A.; Moss, R. J.; Koren, M.; Lee, R.; and Kochenderfer, M. J. 2020. A survey of algorithms for black-box safety validation. arXiv preprint arXiv:2005.02979.
Davis, N. 1993. Contemporary Deontology. In Singer, P., ed., A Companion to Ethics. John Wiley & Sons.
Dennis, L.; Fisher, M.; Slavkovik, M.; and Webster, M. 2016. Formal verification of ethical choices in autonomous systems. Robotics and Autonomous Systems, 77: 1–14.
Gabriel, I. 2020. Artificial intelligence, values, and alignment. Minds and Machines, 30(3): 411–437.
Geisslinger, M.; Poszler, F.; Betz, J.; Lütge, C.; and Lienkamp, M. 2021. Autonomous driving ethics: From trolley problem to ethics of risk. Philosophy & Technology, 1–23.
Goodall, N. J. 2016. Away from trolley problems and toward risk management. Applied Artificial Intelligence, 30(8): 810–821.
Grossi, D.; Meyer, J.-J. C.; and Dignum, F. 2005. Modal logic investigations in the semantics of counts-as. In Proceedings of the 10th International Conference on Artificial Intelligence and Law, 1–9.
Hadfield-Menell, D.; Russell, S. J.; Abbeel, P.; and Dragan, A. 2016. Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29: 3909–3917.
Koren, M.; Alsaif, S.; Lee, R.; and Kochenderfer, M. J. 2018. Adaptive stress testing for autonomous vehicles. In 2018 IEEE Intelligent Vehicles Symposium (IV), 1–7. IEEE.
Koren, M.; Corso, A.; and Kochenderfer, M. J. 2020. The adaptive stress testing formulation. arXiv preprint arXiv:2004.04293.
Lee, R.; Kochenderfer, M. J.; Mengshoel, O. J.; Brat, G. P.; and Owen, M. P. 2015. Adaptive stress testing of airborne collision avoidance systems. In 2015 IEEE/AIAA 34th Digital Avionics Systems Conference (DASC), 6C2–1. IEEE.
Lee, R.; Mengshoel, O. J.; Saksena, A.; Gardner, R. W.; Genin, D.; Silbermann, J.; Owen, M.; and Kochenderfer, M. J. 2020. Adaptive stress testing: Finding likely failure events with reinforcement learning. Journal of Artificial Intelligence Research, 69: 1165–1201.
Loh, J. 2017. Roboterethik. Über eine noch junge Bereichsethik. Information Philosophie, 20–33.
Ng, A. Y.; Russell, S. J.; et al. 2000. Algorithms for inverse reinforcement learning. In ICML, volume 1, 2.
Powers, T. M. 2006. Prospects for a Kantian machine. IEEE Intelligent Systems, 21(4): 46–51.
Russell, S. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin.
Scanlon, T. M. 2003. The Difficulty of Tolerance: Essays in Political Philosophy. Cambridge University Press.
Thomson, J. J. 1976. Killing, letting die, and the trolley problem. The Monist, 59(2): 204–217.
Treiber, M.; Hennecke, A.; and Helbing, D. 2000. Congested traffic states in empirical observations and microscopic simulations. Physical Review E, 62(2): 1805.
Tulum, K.; Durak, U.; and Yder, S. K. 2009. Situation aware UAV mission route planning. In 2009 IEEE Aerospace Conference, 1–12. IEEE.
Wernaart, B. 2021. Developing a roadmap for the moral programming of smart technology. Technology in Society, 64: 101466.