<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Formalization of Gambler's Ruin Problem in Isabelle/HOL</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ecole Polytechnique, Rte de Saclay</institution>
          ,
          <addr-line>91120 Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institut Polytechnique de Paris</institution>
          ,
          <addr-line>5 Av. Le Chatelier 2ème étage, 91764 Palaiseau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The gambler's ruin problem is an exciting application of probability theory which is generally used to analyze various practical scenarios like the security of bitcoin protocol. In this article, we provide a complete analysis of the formalization of the gambler's ruin problem in Isabelle/HOL. First, we present a comprehensive background and pen-and-paper calculation. Second, we summarise how to quantify the gambler's ruin problem and prepare all necessary intermediate conclusions. Third, we explain the dificulties we faced during the final formalization and analyze the strategies to overcome these barriers. Our final result: The recursive probability equation aims to establish the complete quantitative analysis of random walks and assist others in developing advanced probability analysis based on what we endeavour here.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Formal Verification</kwd>
        <kwd>Gambler's Ruin Problem</kwd>
        <kwd>Probability Theory</kwd>
        <kwd>Random Walk</kwd>
        <kwd>Theorem Proving</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Gambler’s Ruin Problem</title>
      <sec id="sec-2-1">
        <title>2.1. Problem introduction</title>
        <p>Karl Sigman describes the gambler’s ruin problem as follows [2]:</p>
        <p>Let N ≥ 2 be an integer and let 1 ≤  ≤  − 1. Consider a gambler who starts
with an initial fortune of $i and then on each successive gamble either wins $1 or
loses $1 independent of the past with probabilities  and  = 1 −  respectively.
Let  denote the total fortune after the ℎ gamble. The gambler’s objective is to
reach a total fortune of $ , without first getting ruined (running out of money). If
the gambler succeeds, then the gambler is said to win the game. In any case, the
gambler stops playing after winning or getting ruined, whichever happens first.</p>
        <p>Tom Leighton and Ronitt Rubinfeld further generalize, in their lecture notes of random walk
[3] , the problem into one-dimensional random walk mode as below:</p>
        <p>
          This problem is a classic example of a problem that involves a one-dimensional
random walk. In such a random walk, there is some value, say the number of
dollars we have, that can go up or down or stay the same at each step with some
probabilities. In this example, we have a random walk in which the value can go
up or down by 1 at each step [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Basic calculation of Gambler’s Ruin Problem</title>
        <p>According to the description above, we could formalize the gambler’s ruin problem as a
onedimensional model below.</p>
        <p>Assume we start with  &gt; 0 and randomly walk of +1 with the possibility of p or − 1 with
the possibility of 1 − , then we end this game once we get  &gt;  or 0 . The possibility of
reaching  from initially  without any stop is what we want to calculate and formalize in the
end.</p>
        <p>Let  be the possibility of successfully reaching m with initial n. Then it is fairly easy to
derive the equations:
 = +1 + (1 − )− 1 if 0 &lt;  &lt; 
0 = 0
 = 1
Proof: 0 = 0 since we have already ruined with initial 0. = 1 since we’ve already won
with initial . Let E be the events that first step is plus 1. Let initial denotes the initial number
and target denotes the target ending number, then:
 = Pr( target =  | init = )
= Pr( target =  | init =  ⋀︀ ) Pr( | init = ) + Pr( target =  | init =  ⋀︀ ¯) Pr( ¯ |
init = )
=  Pr( target =  | init =  + 1) + (1 − ) Pr( target =  | init =  − 1)
= +1 + (1 − )− 1</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Formalization of Gambler’s Ruin Problem</title>
      <p>In Gambler_Ruin_Problem.thy, we will construct the formalization of a specific random walk
model coordinated with gambler’s ruin problem.</p>
      <sec id="sec-3-1">
        <title>3.1. Theory Infinite_Coin_Toss_Space</title>
        <p>In order to construct the formal model in gambler’s ruin problem, we start with the existing
formalization in the Theory Infinite_Coin_Toss_Space[ 4] which constructed the probability
space on infinite sequences of independent coin tosses.</p>
        <p>locale infinite_coin_toss_space =
ifxes  : real and  :: "bool stream measure"
assumes p_gt_0: "0 ≤ ′′
and p_lt_1: "p ≤ 1"
and bernoulli: "M = bernoulli_stream  "</p>
        <p>The only concept that needs to be elaborated here is bernoulli assumption. ’a stream is
the type of infinite sequence with all elements of type ’ . The bernoulli stream is a stream
measure with space Ω composed of all the elements of type bool stream, power set A filled
with all the measurable subsets under this specific measure and measure function  produced
by infinite Cartesian product. All the probability of occurrence of elements in a specific set
 can be described as the measure value of  if and only if  is the measurable set of this
bernoulli stream M. In fact, it is set up by producting countable measures of boolean space with
measuring {True} to p and { False } to 1 − . Despite the complicated and tedious construction of
the bernoulli stream, we make a specific diagram [2] here to clarify the structure and mechanism
of the bernoulli stream M which we will mention later frequently.
3.2. Gambler model
fun (in infinite_coin_toss_space) gambler_rand_walk_pre: "int
(nat ⇒ bool stream ⇒ int)" where
base: "gambler_rand_walk_pre u d v 0 w = v"
step1: "gambler_rand_walk_pre u d v (Suc )  = (( 
)(ℎ  ))+ gambler_rand_walk_pre u d v n w "
⇒ int ⇒ int ⇒
⇒  |   ⇒
fun (in infinite_coin_toss_space) gambler_rand_walk:: "int ⇒ int ⇒ int
⇒ (enat ⇒ bool stream ⇒ int)" where
"gambler_rand_walk u d vnw = (case  of enat  ⇒ (gambler_rand_walk_pre u
vw ) /∞ ⇒ − 1 )"</p>
        <p>The function gambler_ran_walk extends the fourth parameter by adding ∞ as new input.
We define it because we found it very tough to describe the position where the specific random
walk stops, for the first time, by reaching the threshold if the natural number is the only allowed
input as what gambler_ran_walk_pre defines. Since some infinite random walks will never stop,
we must allocate ∞ as the input coordinated with those nonstop cases and extend the type of
steps from nat to enat. Nevertheless, if someone wants to base their further analysis on our
endeavour here, please be cautious of or even avoid discussing the case that the initial number
and target number is negative since we map ∞ to − 1. The lemma exist demonstrates that
non-stop random walks will never succeed in reaching the target, which is the best explanation
why we assign − 1 as the output of ∞ in gambler_ran_walk.</p>
        <p>locale gambler_model = infinite_coin_toss_space +
ifxes geom_proc::"int ⇒ bool stream ⇒ enat ⇒ int"
assumes geometric_process:"geom_proc init x step = gambler_rand_walk 1 (-1)
init step x"</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Basic functions</title>
        <p>Here we define all the basic functions which will play an important role in the further probability
analysis.</p>
        <p>• Fun reach_steps describes all the steps where the input random walk reaches the threshold
0, target
• Fun stop_at describes the first step in the reach_steps sets, which means exactly the
stopping point in gambler’s ruin problem. Note that, here the type of output has been
extended to enat, which means stopping point will be ∞ (equivalent to non-existence)
• Fun success describes the random walk reaching the target number rather than ruining
at stopping point</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Important intermediate conclusions</title>
        <p>In this section, we will specify many necessary intermediate lemmas to lay a solid foundation for
further analysis. Regardless of plenty of complex lemmas formalized, there are two significant
things we pursue:
• Trying to clarify that our definition for gambler_rand_walk is reasonable by proving the
abnormal situation doesn’t exist:
• lemma exist demonstrate that the weird situation where random walk succeeds at ∞ will
disappear once we set target to be positive, which is the reason why we set − 1 as the
output of ∞ over function geom_proc
• Demonstrating the ways we count are independent with the numeral results we calculate:
• lemma additional1 states that the reaching number doesn’t change if we want to calculate
from the second step
• lemma conditional 2 states that whether a random walk succeeds or not doesn’t change
if we calculate from second step
• lemma fst_true_plus_one states that we add 1 to initial number if first step is true.
• lemma conditional_set_equation states that the set where all random walk in it succeeds
and their first step are True doesn’t change if we calculate from second step.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.5. Probability equation</title>
        <p>In this section, we start to analyse the probability of successful random walk. There are two
lemmas which pose enormous dificulty and deserve more explanation from diferent aspects:
Lemma success_measurable and Lemma semi_goal_true</p>
        <p>probability_of_win is the function describing the possibility of successful random walks with
initial number and target number as inputs
fun probability_of_win:: "int ⇒ int ⇒ ennreal " where
"probability_of_win init target = emeasure  { ∈ space  . success init  target}"</p>
        <sec id="sec-3-4-1">
          <title>3.5.1. Successful random walk set is measurable</title>
          <p>First, we introduce this most challenging lemma we have met during this model formalization.
Lemma success_measurable asserts that successful random walks set under the assumption "0
≤ initialnumber ≤ targetnumber" is a measurable set for measure  .</p>
          <p>On the one hand, the lack of discrete analysis library might have been the most significant
factor causing the trouble here. Since the probability theory has been set up based on the
measure theory, every specific set must be proved to be measurable concerning fixed measure
before we calculate the probability of the set, which severely hinders most scholars and experts
from formalizing the security analysis related to the probability since it is extremely dificult to
prove why your set is measurable. That is precisely why our endeavour matters to provide an
example to overcome the dificulty.</p>
          <p>On the other hand, the chaos given from the gambler’s ruin model itself leads to a large part of
dificulties here. The topological space where we base our discussion here is the infinite Cartesian
product which makes our formalization deteriorate since it tends to be more challenging to
clarify the topology on the infinite Cartesian product. The function success we rely on is not
the easy one to construct(see our discussion in section 3.3. Its complex definition increases the
dificulties when we handle the measurability.</p>
          <p>we will briefly explain here the way we prove this lemma since it is nontrivial even for
pen-and-paper proof.</p>
          <p>• Lemma finite_stake_measurable states that for the function (  w. stake nw) listing the
ifrst n steps of random walk, the preimage of a finite set is measurable for measure M.
• Lemma finite_image states that sets filled with all bool lists of fixed length n is finite.
Lemma success_measurable 2 is the most important intermediate lemma prepared for
lemma success_measurable. It clarifies that any list in the image of successful random
walk over function stake will never contain another shorter list corresponding to another
successful random walk, which sets up the bijection between successful random walks
stopping at fixed-step and preimage of successful bool list with identical length.
• Lemma success_measurable1 demonstrates that a set of successful random walks is a
countable union of sets of successful random walks stopping at some step.</p>
          <p>Combining theses 4 lemmas together proves the set of successful random walks is measurable.</p>
          <p>These lemmas are surprisingly hard to prove. Honestly,the successful formalization of this
part cannot be accomplished without the current stochastic process theory library established
just in 2021 by Mnacho Echenim, the author of theory infinite_coin_toss_space.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.5.2. Probability of successful random walk with its first step True</title>
          <p>The lemma semi_goal_true is the second dificulty we have overcome during the model
formalization. It asserts that the probability of sets of successful random walk with first step True is
equal to probability of sets of random walk times probability of sets of successful random walk
with initial number plus 1 .</p>
          <p>According to the definition of measure value of subset in the infinite Cartesian product,
we need to calculate a sequence of measure value of simple measurable sets constructed by
multiplying its infinite projections and carefully drives to the limit of that sequence as the
measure value we desire. As everyone knows, this approach is not feasible since the set we
measure is too irregular to nfid the sequence of simple measurable sets to cover it. So how to
calculate such an irregular measurable set?</p>
          <p>Here we briefly explain the deduction. The following two lemmas are the most important
intermediate conclusion we prepare:
• Lemma semi_goal1 states the equation between the measure value of set of successful
random walks calculated from either first step or second step, which facilities our prepare
for the function with first step as input and measure value as output.
• Lemma semi_goal2_final declares the function we set up in lemma semi goal1 can be
calculated in integral prefectly.</p>
          <p>Thanks to the lemma emeasure_stream_space provided by Mnacho Echenim, the author of
infinite_coin_toss_space, we can finally use the integral rather than tediously break down
the countable product to calculate the probability. The things we cooperate to formalize
here are providing the function for the integral (lemma semi_goal1) and proving its relevant
properties(lemma semi_goal2_final). Although our formalization went well after establishing
this two lemmas, it still took us more than two weeks to accomplish the lemma semi_goal_true</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>3.5.3. Final goal: establish the recursive probability equation</title>
          <p>The final probability equation we want to formalize:</p>
          <p>= +1 + (1 − )− 1
lemma Recursive_probability_equation:
ifxes init target
assumes "0&lt; init" "init &lt; target"
shows "probability_of_win init target = * (probability_of_win (init +1) target)
+(1 − )* (probability_of_win (init - 1) target)"
⟨ proof ⟩</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>It seems that the dificulties of formalizing the gambler’s ruin problem are beyond our initial
imagination. By describing and explaining what we endeavour to construct through all the
chapters, we have experienced this arduous but exciting formalization journey again. Fortunately,
we succeed in formalizing all the quantitative probability analysis of this model and resolving
the dificulties such as securing the measurability of sets calculated later and calculating the
measure value of the sets by integral equation. Our success will not come true without many
excellent existing entries like theory In finite_Coin_Toss_Space and our persistence.</p>
      <p>The dificulties we faced arise from multiple aspects. However, the two most critical aspects
are the inconvenience originating from the problem itself and the lack of appropriate entries
and libraries preparing enough discrete probability analysis. Honestly, we have no desire to
establish such a perfect library of discrete probability analysis ultimately. However, we hope
that our contribution here could ofer an example for other excellent scholars when they face a
similar challenge like us in the future.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Future work</title>
      <p>The short-term goal shortly is formalizing the complete security analysis of the bitcoin
mechanism. In the bitcoin whitepaper, Satoshi proposed the link between Gambler’s ruin problem
and an attacker model of bitcoin network [5]:</p>
      <p>The probability of an attacker catching up from a given deficit is analogous to a Gambler’s
Ruin problem. Suppose a gambler with unlimited credit starts at a deficit and plays potentially
an infinite number of trials to try to reach breakeven. We can calculate the probability he ever
reaches breakeven or that an attacker ever catches up with the honest chain.</p>
      <p>
        So the next step for us is trying to apply our well-prepared gambler’s ruin model to the
security analysis of the bitcoin network by formalizing the analysis of the attacker model there.
Since right now all the fundamental models and mathematical tools have been formalized in
AFP libraries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and our work here, We have the confidence to accomplish this goal before the
end of internship at around 27ℎ August.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <p>The author was supported by the ERC Advanced Grant ALEXANDRIA (Project GA 742178)
funded by the European Research Council and led by Professor Lawrence Paulson. The author
was also supervised by Prof. Lawrence Paulson and Dr. Anthony Bordg at the University of
Cambridge, UK.
[2] K. Sigman, Gambler’s ruin problem, 2016. URL: http://www.columbia.edu/~ks20/FE-Notes/
4700-07-Notes-GR.pdf .
[3] T. Leighton, R. Rubinfeld, Random walk lecture notes, 2016. URL: http://web.mit.edu/neboat/</p>
      <p>Public/6.042/randomwalks.pdf .
[4] M. Echenim, Pricing in discrete financial models, Arch. Formal Proofs 2018 (2018).
[5] S. Nakamoto, A. Bitcoin, A peer-to-peer electronic cash system, Bitcoin.–URL:
https://bitcoin. org/bitcoin. pdf 4 (2008).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Nipkow</surname>
          </string-name>
          , M. Wenzel, L. C. Paulson, Isabelle/HOL:
          <article-title>a proof assistant for higher-order logic</article-title>
          , Springer,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>