=Paper= {{Paper |id=Vol-2640/paper_21 |storemode=property |title=Extracting Money from Causal Decision Theorists |pdfUrl=https://ceur-ws.org/Vol-2640/paper_21.pdf |volume=Vol-2640 |authors=Caspar Oesterheld,Vincent Conitzer |dblpUrl=https://dblp.org/rec/conf/ijcai/OesterheldC20 }}
                           Extracting Money from Causal Decision Theorists

                                      Caspar Oesterheld*, Vincent Conitzer
                                   Duke University, Department of Computer Science
                                          {ocaspar, conitzer}@cs.duke.edu

                                               * Contact Author


                           Abstract

        Newcomb’s problem has spawned a debate about which variant of
        expected utility maximization (if any) should guide rational
        choice. In this paper, we provide a new argument against what
        is probably the most popular variant: causal decision theory
        (CDT). In particular, we provide two scenarios in which CDT
        voluntarily loses money. In the first, an agent faces a single
        choice and following CDT’s recommendation yields a loss of
        money in expectation. The second scenario extends the first to
        a diachronic Dutch book against CDT.


1   Introduction

In Newcomb’s problem [Nozick, 1969; Ahmed, 2014], a “being” offers two
boxes, A and B. Box A is transparent and contains $1,000. Box B is
opaque and may contain either $1,000,000 or nothing. An agent is asked
to choose between receiving the contents of both boxes, or of box B
only. However, the being has put $1,000,000 in box B if and only if
the being predicted that the agent would choose box B only. The
being’s predictions are uncannily accurate. What should the agent do?
   Causal decision theory (CDT) recommends that the agent reason as
follows: I cannot causally affect the content of the boxes – whatever
is in the boxes is already there. Thus, if I choose both boxes,
regardless of what is in box B, I will end up with $1,000 more than if
I choose one box. Hence, I should choose both boxes.
   Evidential decision theory (EDT), on the other hand, recommends
that the agent reason as follows: if I choose one box, then in all
likelihood the being predicted that I would choose one box, so I can
expect to walk away with $1,000,000. (Even if the being is wrong some
small percentage of the time, the expected value will remain at least
close to $1,000,000.) If I choose both, then I can expect to walk away
with (close to) $1,000. Hence, I should choose one box.
   While Newcomb’s problem itself is far-fetched, it has been argued
that the difference between CDT and EDT matters in various
game-theoretic settings. First, it has been pointed out that
Newcomb’s problem closely resembles playing a Prisoner’s Dilemma
against a similar opponent [Brams, 1975; Lewis, 1979]. Whereas CDT
recommends defecting even in a Prisoner’s Dilemma against an exact
copy, EDT recommends cooperating when facing sufficiently similar
opponents. What degree of similarity is required and whether EDT and
CDT ever come apart in this way in practice has been the subject of
much discussion [Ahmed, 2014, Section 4.6]. A common view is that CDT
and EDT come apart only under fairly specific circumstances that do
not include most human interactions. Still, Hofstadter [1983], for
example, argues that “superrational” humans should cooperate with
each other in a real-world one-shot Prisoner’s Dilemma, reasoning in a
way that resembles the reasoning done under EDT. Economists have
usually been somewhat dismissive of such ideas, sometimes referring to
them as “(quasi-)magical thinking” when trying to explain observed
human behavior [Shafir and Tversky, 1992; Masel, 2007; Daley and
Sadowski, 2017]. Indeed, standard game-theoretic solution concepts are
closely related to ratificationism [Joyce and Gibbard, 1998, Section
5], a variant of CDT which we will revisit later in this paper (see
Part 3 of Section 4).
   Second, even if the players of a game are dissimilar, it is in the
nature of strategic interactions that each player’s behavior is
predicted by the other players. Newcomb’s problem itself (as well as
the ADVERSARIAL OFFER discussed in this paper) differs from strategic
interactions as they are usually considered in game theory in its
asymmetry. However, one might still expect that CDT and EDT offer
different perspectives on how rational agents should deal with the
mutual prediction inherent in strategic interactions [Gauthier, 1989,
Section XI].
   Third, CDT and EDT differ in their treatment of situations with
imperfect recall. Seminal discussions of such games are due to
Piccione and Rubinstein [1997] and Aumann et al. [1997] [cf. Bostrom,
2010, for an overview from a philosopher’s perspective]. While these
problems originally were not associated with Newcomb’s problem, the
relevance of different decision theories in this context has been
pointed out by Briggs [2010] [cf. Armstrong, 2011; Schwarz, 2015;
Conitzer, 2015].
   The importance of the differences in decision theory is amplified
if, instead of humans, we consider artificial agents. After all, it is
common that multiple copies of a software sys-


Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
tem are deployed and other parties are often able to obtain the source
code of a system to analyze or predict its behavior. As software
systems come to choose more autonomously, we might expect their
behavior to be (approximately) describable by CDT or EDT (or yet some
other theory). If either of these theories has serious flaws, we might
worry that a system implementing the wrong theory will make
unexpected, suboptimal choices in some scenarios. Such scenarios might
arise naturally, e.g., as many copies of a system are deployed. We
might also worry about adversarial problems like the one in this
paper.
   One argument against CDT is that causal decision theorists (tend
to) walk away with less money than evidential decision theorists, but
this argument has not proved decisive in the debate. For instance, one
influential response has been that CDT makes the best out of the
situation – fixing whether the money is in box B – which EDT does not
[Joyce, 1999, Section 5.1]. It would be more convincing if there were
Newcomb-like scenarios in which a causal decision theorist volunteers
to lose money (in expectation or with certainty).[1] Constructing such
a scenario from Newcomb’s problem is non-trivial. For example, in
Newcomb’s problem, a causal decision theorist may realize that box B
will be empty. Hence, he would be unwilling to pay more than $1,000
for the opportunity to play the game.

   [1] Walking away with the maximum possible (expected) payoff under
   any circumstances is not a realistic desideratum for a decision
   theory: any decision theory X has a lower expected payoff than some
   other decision theory Y in a decision problem that rewards agents
   simply for using decision theory Y [cf. Skalse, 2018, for a
   harder-to-defuse version of this point]. However, such a setup does
   not allow one to devise a generic scenario in which an agent
   voluntarily loses money, i.e., loses money in spite of having the
   option to walk away losing nothing.
      Furthermore, scenarios with voluntary loss appear significantly
   more problematic for pragmatic reasons. Regardless of what you
   think is the right option in Newcomb’s problem, you might not view
   Newcomb’s problem as relevant ground for decision-theoretical
   argument because it is so unlikely that one would ever face
   Newcomb’s problem in the real world. For instance, even if you
   thought that one-boxing is rational (and two-boxing is not), you
   might stick with CDT nonetheless because your real-world expected
   opportunity costs from two-boxing in Newcomb’s problem are
   negligible. [For some discussion of this deflationary argument,
   see, e.g., Gauthier, 1989, Section XI; Ahmed, 2014, Section 7.1.iv;
   Oesterheld, 2019, Section 1, and references therein.] However, if
   there is a Newcomb-like problem in which the causal decision
   theorist voluntarily loses money to some other agent, this
   generates a significant incentive to place him in such a situation.

   In this paper, we provide Newcomb-like decision problems in which
the causal decision theorist voluntarily loses money to another agent.
We first give a single-decision scenario in which this is true only in
expectation (Section 2). We then extend the scenario to create a
diachronic Dutch book against CDT – a two-step scenario in which the
causal decision theorist is sure to lose money (Section 3). Finally,
we discuss the implications of the existence of such scenarios
(Section 4).


2   Extracting a profit in expectation from causal decision theorists

Consider the following scenario:

      ADVERSARIAL OFFER: Two boxes, B1 and B2, are on offer. A
      (risk-neutral) buyer may purchase one or none of the boxes but
      not both. Each of the two boxes costs $1. Yesterday, the seller
      put $3 in each box that she predicted the buyer not to acquire.
      Both the seller and the buyer believe the seller’s prediction to
      be accurate with probability 0.75. No randomization device is
      available to the buyer (or at least no randomization device that
      is not predictable to the seller).[2]

   [2] This decision problem resembles the widely discussed Death in
   Damascus scenario [introduced to the decision theory literature by
   Gibbard and Harper, 1981, Section 11] and even more closely the
   Frustrater case proposed by Spencer and Wells [2017], though these
   are not set up to result in an expected financial loss.

   If the buyer takes either box Bi, then the expected money gained by
the seller is

      $1 − P($3 in Bi | buyer chooses Bi) · $3 = $1 − 0.25 · $3 = $0.25.

Hence, the buyer suffers an expected loss of $0.25 (if he buys a box).
The best action for the buyer therefore appears to be to not purchase
either box. Indeed, this is the course of action prescribed by EDT as
well as other decision theories that recommend one-boxing in
Newcomb’s problem [e.g., those proposed by Spohn, 2012; Poellinger,
2013; Soares and Levinstein, 2017].
   In contrast, CDT prescribes that the buyer buy one of the two
boxes. Because the agent cannot causally affect yesterday’s
prediction, CDT prescribes calculating the expected utility of buying
box Bi as

      P($3 in box Bi) · $3 − $1,                                  (1)

where P($3 in box Bi) is the buyer’s subjective probability that the
seller has put money in box Bi, prior to updating this belief based on
his own decision. For i = 1, 2, let pi be the probability that the
buyer assigns to the seller having predicted him to buy Bi. Similarly,
let p0 be the probability the buyer assigns to the seller having
predicted him to buy nothing. These beliefs should satisfy
p0 + p1 + p2 = 1. Because p0 ≥ 0, we have that
(p0 + p1) + (p0 + p2) = 2p0 + p1 + p2 ≥ 1. Hence, it must be the case
that p0 + p1 ≥ 1/2 or p0 + p2 ≥ 1/2 (or both). Because
P($3 in box Bi) = p0 + p3−i for i = 1, 2, we have
P($3 in box Bi) ≥ 1/2 for at least one i ∈ {1, 2}. Thus, the expected
utility in eq. (1) of at least one of the two possible purchases is at
least 1/2 · $3 − $1 = $0.50, which is positive.
   Any seller capable of predicting the causal decision theorist
sufficiently well will thus have an incentive to use this scheme to
exploit CDT agents. (It does not matter whether the seller subscribes
to CDT or EDT.) It should be noted that even if the buyer uses CDT,
his view of the deal matches the seller’s as soon as the dollar is
paid. That is, after observing his action, he will realize that the
box he bought is empty


with probability 0.75 and thus worth less than a dollar. CDT knows
that it will regret its choice [see Joyce, 2012; Weirich, 1985, for
discussions of the phenomenon of anticipated regret, a.k.a. decision
instability, in CDT].


3   A diachronic Dutch book against causal decision theory

ADVERSARIAL OFFER results in a loss in expectation for the causal
decision theorist. It is natural to ask whether it is possible to set
up the scenario so that the causal decision theorist ends up with a
sure loss; effectively, a Dutch book. Arguably, Dutch books are more
convincing than scenarios with expected losses, since the very meaning
of “expectations” is the subject of the debate about EDT and CDT. Of
course, if the seller could perfectly predict the buyer in
ADVERSARIAL OFFER (instead of being right only 75% of the time), then
ADVERSARIAL OFFER would become a Dutch book. But can we construct a
Dutch book without perfect prediction?
   We have already observed that in ADVERSARIAL OFFER the causal
decision theorist always regrets his decision after observing its
execution. This suggests the following simple approach to
constructing a Dutch book. After the box is sold, the seller allows
the buyer to reverse his decision for a small fee (ending up without
any box and having lost only the fee). However, a CDT buyer may then
anticipate eventually undoing his choice and therefore not buy a box
in the first place [Ahmed, 2014, Section 3.2; though cf. Skyrms, 1993;
Rabinowicz, 2000].[3] To get our Dutch book to work, we add another
choice before ADVERSARIAL OFFER.

   [3] This, of course, requires that the reversal offer does not come
   as a surprise. Throughout, we insist that the buyer knows all the
   rules of the game.

      ADVERSARIAL OFFER WITH OPT-OUT: It is Monday. The buyer is
      scheduled to face the ADVERSARIAL OFFER on Tuesday. He also
      knows that the seller’s prediction was already made on Sunday.
         As a courtesy to her customer, the seller approaches the
      buyer on Monday. She offers to not offer the boxes on Tuesday if
      the buyer pays her $0.20.

   Note that the seller does not attempt to predict whether the buyer
will pay to opt out. Also, we assume that the buyer cannot, on Monday,
commit himself to a course of action to follow on Tuesday.
   It seems that a rational agent should never feel compelled to
accept the Monday offer. After all, doing so loses him money with
certainty, whereas simply refusing both offers (on Monday and on
Tuesday) guarantees that he loses no money.
   CDT, however, recommends opting out on Monday, for the following
reasons. A CDT buyer knows on Monday that if he does not opt out, he
will buy a box on Tuesday (though he may not yet know which one).
Further, he believes that whatever box he will take on Tuesday will
contain $3 with only 25% probability, thus implying an overall
expected payoff of 0.25 · $3 − $1 = −$0.25. This is because, on
Monday, CDT treats the decision on Tuesday in the same way as it
treats any other random variable in the environment. So the causal
expected utility of not opting out is just what an outside observer
would expect the payoff of a CDT agent facing ADVERSARIAL OFFER to
be. Because this expected payoff of −$0.25 is less than the certain
payoff of −$0.20 that can be obtained by opting out, CDT recommends
opting out.
   In fact, for the argument in the previous paragraph to succeed, it
is only necessary that CDT is used on Tuesday; other decision theories
would also recommend accepting the Monday offer, if they anticipate
that the agent will use CDT on Tuesday. For instance, if the agent
followed EDT on Monday and CDT on Tuesday (and is aware on Monday
that he will use CDT on Tuesday), then he would still accept the
Monday offer. Similarly, if the seller believes that the buyer will
pick one of the boxes on Tuesday, then she will hope that he rejects
the Monday offer. Thus, it seems that what creates the opportunity for
a Dutch book is the prospect of buying a box on Tuesday (as CDT
recommends), not the use of CDT on Monday.


4   Discussion

We differentiate four types of responses to these scenarios available
to supporters of causal decision theory:

   1. They could claim that these scenarios are irrelevant for
      evaluating decision theories, in the sense that they are
      impossible to set up or otherwise out of scope, and therefore
      unpersuasive.
   2. They could concede that these scenarios are relevant for
      evaluating decision theories, but claim that CDT’s
      recommendations in them are acceptable.
   3. They could concede that our analysis obliges them to give up on
      certain specific formulations of CDT, but try to modify CDT to
      get these scenarios right while maintaining some of its essence,
      in particular two-boxing and the causal dominance principle.
   4. They could concede that these scenarios show that the very core
      of CDT (two-boxing and the causal dominance principle) is
      implausible.

We will discuss these options in turn.

   [4] For a general discussion of such unpredictability claims in
   defense of CDT, see Ahmed [2014, Chapter 8].

   1   Surely, if one could show that a CDT agent will or can never
face these scenarios – despite the seller having an obvious incentive
to set them up – that would be the most convincing defense of CDT. In
particular, a causal decision theorist might claim that sufficiently
accurate prediction of a CDT agent is simply impossible.[4] However,
not much accuracy is required, for the following reasons. The CDT
agent will take one of the two boxes. Even if the seller picks the box
to fill with money uniformly at random, she would therefore be right
half of the time. If she can do any better than that, predicting
correctly with probability 1/2 + ε, then she can extract money from
the CDT agent by putting (instead of $3) some amount between $2 and
$2/(1 − 2ε) in the box predicted not


to be taken. Thus, the CDT agent needs to be completely un-            determined by a computer program) facing a wide range of
predictable in order to avoid being taken advantage of in these        scenarios including the ones given in this paper.
examples.
                                                                       2 If our scenarios are within the scope of causal decision
   Most human beings are, generally speaking, at least some-
                                                                       theory, then the supporter of causal decision theory has to
what predictable in their actions even when such predictabil-
                                                                       contend with the fact that one can extract expected money
ity can be used against them. For example, in rock-paper-
                                                                       from, and even Dutch-book, CDT agents in them. But he
scissors – which structurally resembles the A DVERSARIAL
                                                                       might question the significance of Dutch-book arguments and
O FFER – most people follow exploitable patterns in what
                                                                       other money extraction schemes, either in general or in this
moves they select [see, e.g., Farber, 2015, and references
                                                                       particular context. For some general discussion of whether
therein].5 Consider such a somewhat predictable person
                                                                       (diachronic) Dutch books are conclusive decision-theoretic
who aims to be a causal decision theorist. It seems that he
                                                                       arguments, see, e.g., Vineberg [2016] or Hájek [2009]. Note,
would indeed be vulnerable to the examples discussed ear-
                                                                       though, that some of the most influential arguments in favor
lier. The only defense for the supporter of causal decision
                                                                       of expected utility maximization (EUM) – of which CDT is
theory would seem to then be that if so, the person in ques-
                                                                       a refinement – are Dutch books. Of course, one may use dif-
tion is not truly acting in the way that CDT describes. That
                                                                       ferent arguments to justify EUM. But it would seem odd to
is, acting according to CDT also requires being unpredictable
                                                                       follow Dutch-book arguments to EUM but no further.
to the seller, either by succeeding at out-thinking the seller
sufficiently often, or by acting sufficiently randomly.                    Instead of rehashing some of the more generic reasons for
                                                                       and against the persuasiveness of Dutch books and loss of
   Is it reasonable to consider this a requirement of acting ac-
                                                                       money in expectation, we here discuss a response that is spe-
cording to CDT? CDT does not suggest any strict preference
                                                                       cific to CDT and A DVERSARIAL O FFER WITH O PT-O UT.6
for choosing randomly across options, as opposed to just de-
                                                                       A causal decision theorist may argue that it is not gener-
terministically choosing one of the options that is best accord-
                                                                       ally fair to expect any kind of coherence from CDT’s recom-
ing to the buyer’s beliefs. Hence, the unpredictability would
                                                                       mendations when multiple decisions are to be made across
have to emerge from the buyer attempting to out-think the
                                                                       time, due to the different perspectives that the decision maker
seller. But it does not seem that this is always an attainable
                                                                       adopts (and, arguably, has to adopt) at different points in time.
goal. For example, imagine that the buyer is a deterministic
                                                                       Consider Newcomb’s problem. Let t0 be the time at which
computer program whose source code is known to the seller.
                                                                       the predictor observes the agent (perhaps using fMRI or the
Then regardless of how exactly the agent works, the seller
                                                                       like) in order to make a prediction. Then, before t0 , CDT rec-
can predict the buyer’s behavior perfectly [cf. Soares and Fal-
                                                                       ommends committing – and if needed paying money to com-
lenstein, 2014, Section 2; Cavalcanti, 2010, Section 5]. We
                                                                       mit – to one-boxing [cf. Barnes, 1997; Joyce, 1999, pp. 153f.;
would thus be forced to conclude that such a program cannot
                                                                       Meacham, 2010]. After t0 , CDT recommends two-boxing.
possibly follow CDT, which to us is an unsatisfactory con-
                                                                       However, most decision theorists do not consider this to be a
clusion. Plausibly any other physically realized agent that
                                                                       compelling argument against CDT. The causal decision theo-
chooses deterministically can at least in principle (if not with
                                                                       rist can easily justify the difference in the decision made by
current technology) be predicted by creating or emulating an
                                                                       the fact that, before t0 , the commitment decision has a causal
atom-by-atom copy of that agent [cf. Yudkowsky, 2010, pp.
                                                                       effect on what is in the boxes, and after t0 , it does not.
85ff.].
                                                                           It would be hypocritical for an evidential decision theorist
   Even if the supporter of CDT acknowledges that these sce-
                                                                       to disagree, since EDT is dynamically inconsistent in analo-
narios are possible, he might nevertheless argue that they are
                                                                       gous ways. For instance, consider a version of Newcomb’s
irrelevant, in the sense that the decision theory is not intended
                                                                       problem in which both boxes are transparent [Gibbard and
to be used for such scenarios and hence nothing that one could
                                                                       Harper, 1981, Section 10; also discussed by Gauthier, 1989;
show about its performance in such a scenario is of significance for evaluating the theory. "It is as if one evaluated a car by testing how it performs underwater." There is little we can say about this response. Still, we expect it to be unattractive to most decision theorists. After all, our scenarios (in particular the ADVERSARIAL OFFER) resemble Newcomb's problem – the problem that has led to the development of CDT in the first place. Further, if our scenarios were out of CDT's scope, then we (and presumably most other decision theorists) would still be interested in identifying a decision theory that does make good recommendations for predictable agents (such as artificial intelligent agents whose behavior is

Drescher, 2006, Section 6.2; Arntzenius, 2008, Section 7; Meacham, 2010, Section 3.2.2]. Let t″ be the time at which the EDT agent sees the content of both boxes. Then before t″, EDT recommends committing – and if needed paying money to commit – to one-boxing. After t″, EDT recommends two-boxing.7 The evidential decision theorist can easily justify this along similar lines: before t″, her commitment is evidence about what is in the boxes, and after t″ it no longer is.

5 There are multiple rock-paper-scissors bots available online which attempt to predict their opponent's future moves based on past moves (using data from other players). As of July 2019, the bot at http://www.essentially.net/rsp/ has reportedly played about 2 million rounds and won 57% more often than it lost.

6 For a discussion of similar arguments about other diachronic Dutch books, see, e.g., Rabinowicz [2008].

7 Parfit's (1984) hitchhiker [Barnes, 1997], XOR Blackmail [Soares and Levinstein, 2017, Section 2] and Yankees vs. Red Sox [Arntzenius, 2008, pp. 22-23; Ahmed and Price, 2012] similarly expose dynamic inconsistencies in EDT. Conitzer [2015] gives a somewhat different type of scenario – based on the Sleeping Beauty problem – in which EDT is dynamically inconsistent.
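EDT's reversal at t″ can be reproduced with a small numeric sketch. All numbers below (the $1,000,000 and $1,000 payoffs and the 99% predictor accuracy) are illustrative assumptions, not values from the paper:

```python
# Sketch of EDT's flip at t'' in Newcomb's problem with transparent boxes.
# Assumed setup: box B contains $1,000,000 iff one-boxing was predicted;
# the other box always contains $1,000; the predictor is 99% accurate.

ACC = 0.99                  # assumed predictor accuracy
M, K = 1_000_000, 1_000     # assumed payoffs

# Before t'': the agent's (predictable) choice is evidence about box B.
ev_onebox_before = ACC * M            # one-boxing => B is likely full
ev_twobox_before = (1 - ACC) * M + K  # two-boxing => B is likely empty

# EDT prefers to commit to one-boxing, and would pay up to this much to do so:
max_commitment_cost = ev_onebox_before - ev_twobox_before

# After t'': both boxes are observed, so the choice is no longer evidence.
# Whatever amount b is seen in box B, two-boxing adds the extra $1,000.
def ev_after(b, two_box):
    return b + (K if two_box else 0)

flip = all(ev_after(b, True) > ev_after(b, False) for b in (0, M))
print(max_commitment_cost > 0, flip)  # prints: True True
```

Under these assumed numbers, the agent would pay up to $979,000 before t″ to commit to one-boxing, yet after t″ strictly prefers two-boxing whatever it sees – precisely the diachronic inconsistency described in the text.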


Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
   Thus, at least some types of dynamic inconsistency do not constitute strong arguments against a decision theory. However, in our opinion, the dynamic inconsistency displayed by CDT in the ADVERSARIAL OFFER WITH OPT-OUT is much more problematic. For one, it leads to a Dutch book. Often, the main argument that is given for why a particular inconsistency is problematic is precisely that it allows for a Dutch book. Conversely, defenses of dynamic inconsistencies [Ahmed, 2014, Section 3.2, for an example in a Newcomb-like scenario] often focus on arguing that they do not allow for Dutch books.
   Further, it seems that some of the reasons for (or defenses of) dynamic inconsistency in the above decision problems do not apply to CDT's dynamic inconsistency in ADVERSARIAL OFFER WITH OPT-OUT. For CDT in Newcomb's problem, there is a particular event at time t′ that splits the decision perspectives: the loss of causal control at t′ over the content of box B. Similarly, for EDT in Newcomb's problem with transparent boxes, that event is the loss of evidential control [cf. Almond, 2010, Section 4.5] at t″ over the content of box B. It is thus easy for defenders of the respective theories to argue that the perspectives from before and after t′ or t″ should diverge [Ahmed and Price, 2012, pp. 22-23, Section 4]. In sharp contrast, the ADVERSARIAL OFFER WITH OPT-OUT lacks any such event between the decision points. The difference in perspectives for CDT appears to be purely a result of CDT viewing its current choice differently than it views past and future decisions.
   All that being said, we agree that caution should be taken when evaluating a decision theory based on scenarios with multiple decisions across time. In general, more research on what conclusions can be drawn from such scenarios is needed [Steele and Stefánsson, 2016, Section 6]. Nevertheless, we do not see any clear path by which such research would justify CDT's recommendations in the ADVERSARIAL OFFER WITH OPT-OUT. In any case, even if one is at this point unwilling to consider scenarios with multiple decision points at all for the purpose of evaluating decision theories, one would still have to contend with the simpler ADVERSARIAL OFFER scenario, in which there is only one decision point.

   3 If a straightforward interpretation of CDT cannot be defended against our scenarios, one may look to modify it to avoid expected or sure loss while preserving some of CDT's core tenets. In particular, in response to other alleged counterexamples, some authors have tried to modify CDT while maintaining the causal dominance [Joyce, 1999, Section 5.1] a.k.a. sure thing [Gibbard and Harper, 1981, Section 7] principle [though see Ahmed, 2012, for an argument against the motivation behind some of these approaches]. For example, one may turn to the concept of ratifiability. In Newcomb-like scenarios such as those under discussion here, for any choice a, we can consider the beliefs about what is in the boxes that would result from knowing that one will choose a. Then, a choice a is ratifiable if it is an optimal choice – as judged by CDT – under those beliefs. For example, in Newcomb's problem only two-boxing is ratifiable, precisely because it is causally dominant. For an overview of ratification and its relation to CDT, see Weirich [2016, Section 3.6]. Unfortunately, this concept is of no help in the ADVERSARIAL OFFER, because none of the three options (buying B1, buying B2 or declining) is ratifiable. For instance, under the beliefs that would result from knowing that you will take box Bi, it would be better to buy the other box B3−i.
   The ratificationist may respond by claiming that unpredictable randomization should always be possible. If that were true, then the only ratifiable option would be to take each box with probability 50%, thus gaining money in expectation. But again, we would like to have a decision theory that works in a broad variety of scenarios, including ones where the agent expects to be somewhat predictable. Furthermore, even if a true random number generator (TRNG) (e.g., one based on nuclear decay) is in fact available, this does not settle the issue. For example, consider a variant of the ADVERSARIAL OFFER in which the seller refrains from putting money in any box if she predicts the buyer to make different choices depending on the output of the TRNG. In this ANTI-RANDOMIZATION ADVERSARIAL OFFER, again no option is ratifiable: under the beliefs that would result from knowing that you will make different choices depending on the TRNG's output (and therefore choose a box with some positive probability), you would rather not pick any box. To circumvent this example, the ratificationist could argue that the decision maker should be able to randomize in such a way that whether he is randomizing is unpredictable. However, at this point, one might just as well assert the impossibility (or irrelevance) of Newcomb-type scenarios altogether, which we have addressed under 1.
   A different strategy for modifying CDT to avoid the Dutch book in the ADVERSARIAL OFFER WITH OPT-OUT is the following. The Dutch book arises from a disagreement between CDT on Monday and CDT on Tuesday (cf. the discussion under 2). A tempting possibility is to modify CDT so that it considers all decisions to be made at once. That is, such a version of CDT – let us refer to it as policy-CDT – prescribes that one decide on one's general policy all at once.8 In the ADVERSARIAL OFFER WITH OPT-OUT, there are four possible policies: opt out, buy B1, buy B2, and buy nothing (where the last three possibilities include declining the opt-out offer). When considering these policies, buy nothing dominates opt out. Hence, policy-CDT will decline the opt-out offer and thereby avoid the Dutch book. (Note, however, that such a modification of CDT will make no difference to the choices it prescribes in ADVERSARIAL OFFER, which has only one decision point. Hence, it will still lose money in expectation.)
   While this appears to be a promising approach, it is nontrivial to flesh out, because in other examples it is less clear what policy-CDT should prescribe. For illustration, consider the following interpretation of policy-CDT: follow the policy to which CDT would like to commit ex ante, where "ex ante" refers to some point in time before the first decision of the scenario. Now, let us consider a version of Newcomb's problem which is supplemented by another trivial and unrelated decision – say, whether to eat a peppermint – that takes place when the agent still has a causal influence over the prediction. Then the ex-ante-commitment interpretation of policy-CDT would recommend one-boxing. To the causal decision theorist, this may be unacceptable, especially given that adding the peppermint decision is such a minor modification of Newcomb's problem. Perhaps there is a way to define policy-CDT that avoids such dependence on irrelevant decisions while also prescribing two-boxing, but it is not immediately obvious how to do so.
   Many other ways of modifying CDT are worth considering. For instance, in the ADVERSARIAL OFFER, it may be unrealistic for the buyer to form a single probability distribution over box contents. Instead, he may consider multiple different probability distributions, including one under which box B1 is probably empty and one under which box B2 is probably empty. He could then evaluate each option pessimistically, i.e., w.r.t. the probability distribution that is worst under that option. Such a version of CDT would prescribe declining to buy a box. At the same time, it would recommend two-boxing in Newcomb's problem and more generally obey the causal dominance principle. For a discussion of this maxmin criterion for choice under multiple probability distributions, see, e.g., Gilboa and Schmeidler [1989] and in particular game-theoretic interpretations such as that of Grünwald and Halpern [2011]. A more general discussion of using sets of probability distributions (potentially with decision rules other than the maxmin criterion) is offered by Bradley [2012]. In our setting, B1 and B2 are, roughly, complementary bets in the causalist's beliefs. In all worlds in which Bi is empty, B3−i is full. As discussed by Bradley, it has been argued that a rational agent should accept one of a pair of complementary bets. Indeed, expected utility maximization for a single probability distribution satisfies this complementarity criterion – to the causalist's detriment in the ADVERSARIAL OFFER. Bradley [2012] argues that in general, an agent with imprecise probabilities should not satisfy the complementarity criterion and that this allows him to avoid Dutch books – though, of course, he considers Dutch books of a very different type.

   4 Finally, one may view at least one of the scenarios in this paper as supporting a persuasive argument against the very core of CDT. EDT is the obvious alternative. However, depending on how problematic we find EDT's prescriptions in other cases – such as the Smoking lesion [Ahmed, 2014, Section 4.1–4.3] or cases of dynamic inconsistency like Newcomb's problem with transparent boxes (and the problems listed in footnote 7) – we may also look to various other decision theories that have been proposed [Gauthier, 1989; Spohn, 2012; Poellinger, 2013; Soares and Levinstein, 2017].

   8 Policy-CDT resembles Fisher's [nd] disposition-based decision theory. Compare Meacham [2010] for a discussion of explicit precommitment. Similarly, Gauthier [1989] has argued for evaluating "plans" not decisions in Newcomb-like problems (without basing this argument on any particular theory like CDT or EDT). A few authors have also proposed policy versions of other, more EDT-like decision theories [Drescher, 2006, Section 6.2; Yudkowsky and Soares, 2018, Section 4].

Acknowledgements
We thank Johannes Treutlein and Jesse Clifton for comments and discussions.

References
[Ahmed and Price, 2012] Arif Ahmed and Huw Price. Arntzenius on 'why ain'cha rich?'. Erkenntnis, 77(1):15–30, 2012.
[Ahmed, 2012] Arif Ahmed. Push the button. Philosophy of Science, 79(3):386–395, 2012.
[Ahmed, 2014] Arif Ahmed. Evidence, Decision and Causality. Cambridge University Press, 2014.
[Almond, 2010] Paul Almond. On causation and correlation part 1: Evidential decision theory is correct, 2010.
[Armstrong, 2011] Stuart Armstrong. Anthropic decision theory, 2011.
[Arntzenius, 2008] Frank Arntzenius. No regrets, or: Edith Piaf revamps decision theory. Erkenntnis, 68(2):277–297, 2008.
[Aumann et al., 1997] Robert J. Aumann, Sergiu Hart, and Motty Perry. The absent-minded driver. Games and Economic Behavior, 20:102–116, 1997.
[Barnes, 1997] R. Eric Barnes. Rationality, dispositions, and the Newcomb paradox. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 88(1):1–28, 1997.
[Bostrom, 2010] Nick Bostrom. Anthropic Bias: Observation Selection Effects in Science and Philosophy. Studies in Philosophy. Routledge, 2010.
[Bradley, 2012] Seamus Bradley. Dutch book arguments and imprecise probabilities. In D. Dieks, W. Gonzalez, S. Hartmann, M. Stöltzner, and M. Weber, editors, Probabilities, Laws, and Structures, volume 3 of The Philosophy of Science in a European Perspective, pages 3–17. Springer, Dordrecht, 2012.
[Brams, 1975] Steven J. Brams. Newcomb's problem and prisoners' dilemma. The Journal of Conflict Resolution, 19(4):596–612, 1975.
[Briggs, 2010] Rachael Briggs. Putting a value on beauty. Volume 3 of Oxford Studies in Epistemology, pages 3–34. Oxford University Press, 2010.
[Cavalcanti, 2010] Eric G. Cavalcanti. Causation, decision theory, and Bell's theorem: A quantum analogue of the Newcomb problem. The British Journal for the Philosophy of Science, 61(3):569–597, 2010.
[Conitzer, 2015] Vincent Conitzer. A Dutch book against sleeping beauties who are evidential decision theorists. Synthese, 192(9):2887–2899, 2015.
[Daley and Sadowski, 2017] Brendan Daley and Philipp Sadowski. Magical thinking: A representation result. Theoretical Economics, 12:909–956, 2017.
[Drescher, 2006] Gary L. Drescher. Good and Real – Demystifying Paradoxes from Physics to Ethics. MIT Press, 2006.
[Farber, 2015] Neil Farber. The surprising psychology of rock-paper-scissors, 2015.
[Fisher, nd] Justin C. Fisher. Disposition-based decision theory. n.d.
[Gauthier, 1989] David Gauthier. In the neighbourhood of the Newcomb-predictor (reflections on rationality). In Proceedings of the Aristotelian Society, New Series, 1988–1989, volume 89, pages 179–194. 1989.
[Gibbard and Harper, 1981] Allan Gibbard and William L. Harper. Counterfactuals and two kinds of expected utility. In William L. Harper, Robert Stalnaker, and Glenn Pearce, editors, Ifs. Conditionals, Belief, Decision, Chance and Time, volume 15 of The University of Western Ontario Series in Philosophy of Science, A Series of Books in Philosophy of Science, Methodology, Epistemology, Logic, History of Science, and Related Fields, pages 153–190. Springer, 1981.
[Gilboa and Schmeidler, 1989] Itzhak Gilboa and David Schmeidler. Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18:141–153, 1989.
[Grünwald and Halpern, 2011] Peter D. Grünwald and Joseph Y. Halpern. Making decisions using sets of probabilities: Updating, time consistency, and calibration. Journal of Artificial Intelligence Research, 42, 2011.
[Hofstadter, 1983] Douglas Hofstadter. Dilemmas for superrational thinkers, leading up to a luring lottery. Scientific American, 248(6), 1983.
[Hájek, 2009] Alan Hájek. Dutch book arguments. In The Handbook of Rational and Social Choice, chapter 7. Oxford University Press, 2009.
[Joyce and Gibbard, 1998] James M. Joyce and Allan Gibbard. Causal decision theory. In Handbook of Utility Theory, Volume 1: Principles, chapter 13, pages 627–666. Kluwer, 1998.
[Joyce, 1999] James M. Joyce. The Foundations of Causal Decision Theory. Cambridge Studies in Probability, Induction, and Decision Theory. Cambridge University Press, 1999.
[Joyce, 2012] James M. Joyce. Regret and instability in causal decision theory. Synthese, 187:123–145, 2012.
[Lewis, 1979] David Lewis. Prisoners' dilemma is a Newcomb problem. Philosophy & Public Affairs, 8(3):235–240, 1979.
[Masel, 2007] Joanna Masel. A Bayesian model of quasi-magical thinking can explain observed cooperation in the public good game. Journal of Economic Behavior & Organization, 64(2):216–231, 2007.
[Meacham, 2010] Christopher J. G. Meacham. Binding and its consequences. Philosophical Studies, 149(1):49–71, 2010.
[Nozick, 1969] Robert Nozick. Newcomb's problem and two principles of choice. In Nicholas Rescher et al., editors, Essays in Honor of Carl G. Hempel, pages 114–146. Springer, 1969.
[Oesterheld, 2019] Caspar Oesterheld. Approval-directed agency and the decision theory of Newcomb-like problems. Synthese, 2019.
[Parfit, 1984] Derek Parfit. Reasons and Persons. Oxford University Press, 1984.
[Piccione and Rubinstein, 1997] Michele Piccione and Ariel Rubinstein. On the interpretation of decision problems with imperfect recall. Games and Economic Behavior, 20:3–24, 1997.
[Poellinger, 2013] Roland Poellinger. Unboxing the concepts in Newcomb's paradox: Causation, prediction, decision. 2013.
[Rabinowicz, 2000] Wlodek Rabinowicz. Money pump with foresight. In Michael J. Almeida, editor, Imperceptible Harms and Benefits, pages 123–154. Springer, 2000.
[Rabinowicz, 2008] Wlodek Rabinowicz. Pragmatic Arguments for Rationality Constraints, pages 139–163. Reasoning, Rationality and Probability. CSLI Publications, 2008.
[Schwarz, 2015] Wolfgang Schwarz. Lost memories and useless coins: revisiting the absentminded driver. Synthese, 192:3011–3036, 2015.
[Shafir and Tversky, 1992] Eldar Shafir and Amos Tversky. Thinking through uncertainty: Nonconsequential reasoning and choice. Cognitive Psychology, 24(4):449–474, 1992.
[Skalse, 2018] Joar Skalse. A counterexample to perfect decision theories and a possible response. 2018.
[Skyrms, 1993] Brian Skyrms. A mistake in dynamic coherence arguments? Philosophy of Science, 60:320–328, 1993.
[Soares and Fallenstein, 2014] Nate Soares and Benja Fallenstein. Toward idealized decision theory. Technical Report 2014-7, Machine Intelligence Research Institute, 2014.
[Soares and Levinstein, 2017] Nate Soares and Benjamin A. Levinstein. Cheating death in Damascus. In Formal Epistemology Workshop (FEW) 2017, University of Washington, Seattle, USA, 2017.
[Spencer and Wells, 2017] Jack Spencer and Ian Wells. Why take both boxes? Philosophy and Phenomenological Research, 2017.
[Spohn, 2012] Wolfgang Spohn. Reversing 30 years of discussion: why causal decision theorists should one-box. Synthese, 187(1):95–122, 2012.
[Steele and Stefánsson, 2016] Katie Steele and H. Orri Stefánsson. Decision theory. 2016.
[Vineberg, 2016] Susan Vineberg. Dutch book arguments. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, spring 2016 edition, 2016.
[Weirich, 1985] Paul Weirich. Decision instability. Australasian Journal of Philosophy, 63(4):465–472, 1985.
[Weirich, 2016] Paul Weirich. Causal decision theory. In The
  Stanford Encyclopedia of Philosophy. Spring 2016 edition,
  2016.
[Yudkowsky and Soares, 2018] Eliezer Yudkowsky and Nate Soares. Functional decision theory: A new theory of instrumental rationality, 2018.
[Yudkowsky, 2010] Eliezer Yudkowsky. Timeless decision
  theory, 2010.



