More about left recursion in PEG

Roman R. Redziejowski
roman@redz.se

Abstract. Parsing Expression Grammar (PEG) is extended to handle left recursion, and under specified conditions becomes a correct parser for left-recursive grammars in Backus-Naur Form (BNF).

1 Introduction

The technique of parsing by recursive descent assigns to each grammatical construct a procedure that calls other procedures to process the components of that construct. As grammars are usually defined in a recursive way, these calls are recursive. This method encounters two problems: (1) procedures corresponding to certain types of constructs must choose which procedure to call next; (2) if the grammar contains left recursion, a procedure may call itself indefinitely.

Problem (1) has traditionally been solved by looking at the next input symbol(s), which works if the grammar satisfies a condition known as LL(n). Another way is to try the alternatives one by one, backtracking in the input, until one succeeds (or all fail). A full search can require exponential time, so a possible option is limited backtracking: never return after a partial success. This method has been used in actual parsers [4,10] and is described in the literature [1–3,7]. It was eventually formalized by Ford [5] under the name of Parsing Expression Grammar (PEG).

Problem (2) is serious because of a strong tendency to present grammars in left-recursive form. Converting the grammar to right-recursive form is possible, but is tedious, error-prone, and obscures the spirit of the grammar. In recent years the problem has been solved by what can be called "recursive ascent". Each recursion must end up in a part of the syntax tree that does not involve further recursion, referred to as the "seed". After identifying the seed, one reconstructs the syntax tree upwards, in the process of "growing the seed". Extensions to PEG using this method have been described in [6,8,12,15–17].

The paper tries to find out under which conditions this process will work correctly. The idea of "working correctly" needs an explanation. One of the most common ways to define the syntax of a formal language is the Backus-Naur Form (BNF) or its extended version EBNF. We treat here PEG as a parser for BNF that implements recursive descent with limited backtracking. Because of limited backtracking, PEG may miss some strings that belong to the language defined by BNF. The author has previously tried to answer the question under which conditions PEG accepts exactly the language defined by BNF (see [13,14]). This paper tries to answer the same question for PEG equipped with the recursive-ascent technique to handle left recursion. We look here at a process inspired by Hill [6].

Section 2 introduces a subset of BNF grammar with natural semantics that is a slight modification of that due to Medeiros [9,11]. In Section 3 we develop some concepts needed to discuss left recursion. In Section 4 we describe the parsing process, check that it terminates, and state the conditions under which it reproduces the BNF syntax tree. The last section contains some comments. Proofs of the Propositions are found in the Appendix.

2 The BNF grammar

We consider an extremely simplified form of BNF grammar over an alphabet Σ. It is a set of rules of the form A = e where A belongs to a set N of symbols distinct from the letters of Σ and e is an expression. Each expression is one of these:

– ε ("empty"),
– e1 e2 ("sequence"),
– a ∈ Σ ("letter"),
– e1|e2 ("choice"),
– A ∈ N ("nonterminal"),
where each of e1, e2 is an expression and ε denotes the empty word. The set of all expressions is in the following denoted by E. There is exactly one rule A = e for each A ∈ N. The expression e appearing in this rule is denoted by e(A).

Each expression e ∈ E has its language L(e) ⊆ Σ* defined formally by the natural semantics shown in Figure 1. A string x belongs to L(e) if and only if [e]x →BNF x can be proved using the inference rules from Figure 1. Note that if a proof of [e]x →BNF x exists, one can prove [e]xy →BNF x for every y ∈ Σ*.

(empty)    [ε]w →BNF ε
(letter)   [a]aw →BNF a
(rule)     [A]w →BNF x  from  [e(A)]w →BNF x
(seq)      [e1 e2]xw →BNF xy  from  [e1]xw →BNF x  and  [e2]w →BNF y
(choice1)  [e1|e2]w →BNF x  from  [e1]w →BNF x
(choice2)  [e1|e2]w →BNF x  from  [e2]w →BNF x

Fig. 1. Formal semantics of BNF

One can read [e]w →BNF x as saying that expression e matches the prefix x of string w. Thus, x ∈ L(e) means that e matches the string x. We say that expression e is nullable to mean that ε ∈ L(e).

Figure 2 is an example of a formal proof using the rules from Figure 1. It verifies that the string baac belongs to L(S) as defined by this grammar:

S = Ac
A = Aa|B
B = b

The judgments of the proof, in the order in which they are derived, are:

[b]baac →BNF b,  [B]baac →BNF b,  [Aa|B]baac →BNF b,  [A]baac →BNF b,
[a]aac →BNF a,  [Aa]baac →BNF ba,  [Aa|B]baac →BNF ba,  [A]baac →BNF ba,
[a]ac →BNF a,  [Aa]baac →BNF baa,  [Aa|B]baac →BNF baa,  [A]baac →BNF baa,
[c]c →BNF c,  [Ac]baac →BNF baac,  [S]baac →BNF baac.

Fig. 2. Example of BNF proof

The proof can be represented in the inverted form shown in Figure 3. This seems more intuitive when speaking about "recursive descent". The diagram on the right is simplified to show only the expressions. It is the syntax tree of baac.

Fig. 3. BNF tree and syntax tree

3 Left recursion

3.1 Recursion classes

For e ∈ E define first(e) as follows:

first(ε) = first(a) = ∅,
first(A) = {e(A)},
first(e1|e2) = {e1, e2},
first(e1 e2) = {e1} if ε ∉ L(e1), and {e1, e2} if ε ∈ L(e1).

Define further First to be the transitive closure of the relation first. In the following, we write e →first e′ to mean that e′ ∈ first(e), and e →First e′ to mean that e′ ∈ First(e).

An expression e is left-recursive if e →First e. Let R ⊆ E be the set of all left-recursive expressions. Define the relation Rec ⊆ R × R so that (e1, e2) ∈ Rec means e1 →First e2 →First e1. It is an equivalence relation that defines a partition of R into equivalence classes; we refer to them as recursion classes. The recursion class containing expression e is denoted by C(e).

Let e be an expression belonging to a recursion class C. If e = A ∈ N or e = e1 e2 with non-nullable e1, the expression e′ ∈ first(e) must also belong to C. This is so because e′ is the only element of first(e), and we must have e →first e′ →First e to achieve e →First e. In e = e1|e2 or e = e1 e2 with nullable e1, one of the expressions e1, e2 may be outside C. It is a seed of C, and e is an exit of C. For e = e1 e2 in C, e2 is a leaf of C. The set of all seeds of C is denoted by Seed(C) and the set of all its leaves by Leaf(C).

As an example, the grammar used in Figure 2 has R = {A, Aa|B, Aa}. All these expressions belong to the same recursion class, with exit Aa|B, seed B, and leaf a.
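These definitions translate almost directly into code. The following Python sketch (mine, not part of the paper) computes first, First, the set R, and the recursion classes for the grammar of Figure 2. The tuple encoding of expressions and all helper names are ad hoc assumptions, and nullability is computed by a one-pass approximation that suffices for this example.

def seq(e1, e2): return ('seq', e1, e2)   # e1 e2
def alt(e1, e2): return ('alt', e1, e2)   # e1 | e2
def sym(a):      return ('sym', a)        # a in Σ
def nt(name):    return ('nt', name)      # A in N

rules = {                                  # e(A) for each A in N
    'S': seq(nt('A'), sym('c')),                 # S = Ac
    'A': alt(seq(nt('A'), sym('a')), nt('B')),   # A = Aa|B
    'B': sym('b'),                               # B = b
}

def nullable(e, pending=frozenset()):
    # Is ε in L(e)?  A nonterminal already under evaluation contributes False;
    # this one-pass approximation is enough for the example grammar.
    kind = e[0]
    if kind == 'eps': return True
    if kind == 'sym': return False
    if kind == 'nt':
        return False if e[1] in pending else nullable(rules[e[1]], pending | {e[1]})
    if kind == 'seq': return nullable(e[1], pending) and nullable(e[2], pending)
    if kind == 'alt': return nullable(e[1], pending) or nullable(e[2], pending)

def first(e):                              # the relation 'first' of Section 3.1
    kind = e[0]
    if kind in ('eps', 'sym'): return set()
    if kind == 'nt':  return {rules[e[1]]}
    if kind == 'alt': return {e[1], e[2]}
    if kind == 'seq': return {e[1], e[2]} if nullable(e[1]) else {e[1]}

def First(e):                              # transitive closure of 'first'
    seen, todo = set(), list(first(e))
    while todo:
        x = todo.pop()
        if x not in seen:
            seen.add(x)
            todo.extend(first(x))
    return seen

exprs = set()                              # all subexpressions of the grammar
def collect(e):
    if e not in exprs:
        exprs.add(e)
        if e[0] in ('seq', 'alt'):
            collect(e[1])
            collect(e[2])
for name, body in rules.items():
    collect(nt(name))
    collect(body)

R = {e for e in exprs if e in First(e)}    # left-recursive expressions
classes = []                               # recursion classes (partition of R)
for e in R:
    cls = {x for x in R if x in First(e) and e in First(x)} | {e}
    if cls not in classes:
        classes.append(cls)

print(R)         # A, Aa|B and Aa (as tuples), matching the example above
print(classes)   # a single class containing all three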
To simplify the discussion, we assume in the following that all exits have the form e1|e2; that is, e1 in e1 e2 ∈ R is not nullable. As the ordering of a BNF choice expression does not influence its language, we assume that e2 in e1|e2 is always the seed.

3.2 Recursion sequence

Consider a node in the syntax tree and the leftmost branch emanating from it. It is a chain of nodes connected by →first. If the branch contains any left-recursive expressions, the expressions belonging to the same recursion class C must form an uninterrupted sequence. Indeed, if e1 and e2 appearing in the branch belong to C, we have e1 →First e →First e2 →First e1 for any e in between. Such a sequence is a recursion sequence of class C. The same argument shows that the branch can contain at most one recursion sequence of a given class, and that sequences of different classes cannot be nested. The last expression in the sequence must be an exit of C. One can easily see that each recursion class must have at least one exit. We assume in the following that this is the case.

Let e →first e′ be two consecutive expressions in a recursion sequence, and let [e]w →BNF x and [e′]w →BNF x′. Expression e can be one of these:

– A = e′. Then x = x′ according to (rule).
– e′|e2. Then x = x′ according to (choice1).
– e1|e′. Then x = x′ according to (choice2).
– e′ e2. Then x = x′y where y ∈ L(e2) according to (seq).

Define adv(e, e′) = {ε} in each of the first three cases and adv(e, e′) = L(e2) in the last. For a recursion sequence s = e[n], e[n−1], . . . , e[2], e[1] define adv(s) = adv(e[n], e[n−1]) . . . adv(e[2], e[1]). For e ∈ R define Adv(e) to be the union of adv(s) for all recursion sequences s starting with e, ending with e, and not containing e in between. The following is easy to see:

Lemma 1. If [e]w →BNF x and [e]w →BNF x′ are two consecutive results for the same e in a recursion sequence, we have x ∈ x′Adv(e).

4 Parsing

Given a BNF grammar, we define for each e ∈ E a parsing procedure named [e]. The procedure may return "success" after possibly "consuming" some input, or "failure" without consuming anything. In the following, the result of calling [e] for input w is denoted by [e]w; it is either fail or the consumed string x (possibly ε).

The action of parsing procedure [e] for e ∉ R is the same as in PEG and is shown in Figure 4.

The procedure [e] for e ∈ R is a "grower" for the recursion class C(e) and input string w. The grower has a "plant" [e′, w] for each expression e′ ∈ C(e). The plant is a procedure with memory. The procedure emulates the action of parsing procedure [e′], but may use results from other plants instead of calls to parsing procedures. It computes a result as shown in Figure 5. The memory holds the result of the procedure applied to w; it is denoted by ⟨e′, w⟩.

The grower initializes all its plants with ⟨e′, w⟩ = fail, and then repeatedly calls all plants in some order. If the result is better than the one already held by the plant, it replaces the latter. (A string is better than fail, and a longer string is better than a shorter one.) The grower stops when it cannot improve any result. The result in [e, w] is then the result of parsing procedure [e].
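The following Python sketch (again mine, not the author's implementation) renders the procedures just described, with the details given in Figures 4 and 5 below: ordinary PEG procedures for e ∉ R and a grower with memoizing plants for e ∈ R. It reuses rules, nt, seq, alt, sym, First and R from the previous sketch; FAIL, plant, grow and better are ad hoc names, and input restoration is implicit because the remaining input is passed as an argument.

FAIL = None                                      # result 'fail'

def rec_class(e):                                # C(e), computed on the fly
    return {x for x in R if x in First(e) and e in First(x)} | {e}

def parse(e, w):
    """Return the consumed prefix of w, or FAIL (cf. Figure 4)."""
    if e in R:
        return grow(e, w)
    kind = e[0]
    if kind == 'eps': return ''
    if kind == 'sym': return e[1] if w.startswith(e[1]) else FAIL
    if kind == 'nt':  return parse(rules[e[1]], w)
    if kind == 'seq':
        x = parse(e[1], w)
        if x is FAIL: return FAIL
        y = parse(e[2], w[len(x):])
        return FAIL if y is FAIL else x + y
    if kind == 'alt':
        x = parse(e[1], w)
        return x if x is not FAIL else parse(e[2], w)

def better(new, old):                            # longer beats shorter beats FAIL
    if new is FAIL: return False
    return old is FAIL or len(new) > len(old)

def plant(e, w, memo):
    """One evaluation of plant [e, w] against the memory of the class (cf. Figure 5)."""
    kind = e[0]
    if kind == 'nt':
        return memo.get(rules[e[1]], FAIL)
    if kind == 'seq':                            # e1 assumed to be in the class
        x = memo.get(e[1], FAIL)
        if x is FAIL: return FAIL
        y = parse(e[2], w[len(x):])              # leaf parsed by an ordinary call
        return FAIL if y is FAIL else x + y
    if kind == 'alt':
        x = memo.get(e[1], FAIL)
        if x is not FAIL: return x
        if e[2] in memo: return memo[e[2]]       # e2 inside the class
        return parse(e[2], w)                    # e2 is a seed: ordinary call

def grow(e, w):
    """Grower for recursion class C(e) and input w."""
    cls = rec_class(e)
    memo = {x: FAIL for x in cls}                # all plants start with fail
    changed = True
    while changed:                               # stop when nothing improves
        changed = False
        for x in cls:
            r = plant(x, w, memo)
            if better(r, memo[x]):
                memo[x] = r
                changed = True
    return memo[e]

print(parse(nt('S'), 'baac'))    # expect 'baac', as in the parse recorded in Figure 7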
In order to create a formal record of the parsing process, we represent the actions of parsing procedures and plants by the inference rules shown in Figure 6. A rule with conclusion [e]w = X represents a call to [e] returning X; one with conclusion ⟨e, w⟩ = x represents setting a new result x in ⟨e, w⟩. A premise [e′]w = X represents a call to a sub-procedure [e′] returning X; a premise ⟨e′, w⟩ = X represents a result obtained from [e′, w]. With these conventions, the parsing process is represented as a formal proof. An example of such a proof is shown in Figure 7. It represents the parsing process for the string and grammar from the example in Figure 3. Note that part of that tree was constructed by going down and part by going up.

[ε]        Indicate success without consuming any input.
[a]        If the text ahead starts with a, consume a and return success. Otherwise return failure.
[A = e1]   Call [e1] and return its result.
[e1 e2]    Call [e1]. If it succeeded, call [e2] and return success if [e2] succeeded. If [e1] or [e2] failed, backtrack: reset the input as it was before the invocation of [e1] and return failure.
[e1 | e2]  Call [e1]. Return success if it succeeded. Otherwise call [e2] and return success if [e2] succeeded or failure if it failed.

Fig. 4. Actions of parsing procedures

[A, w]          Return ⟨e(A), w⟩.
[(e1 e2), w]    If ⟨e1, w⟩ = fail, return fail. If ⟨e1, w⟩ = x, call [e2] on z where w = xz and restore the input to w. If [e2]z = fail, return fail. Otherwise return x [e2]z.
[(e1 | e2), w]  If ⟨e1, w⟩ ≠ fail, return ⟨e1, w⟩. If ⟨e1, w⟩ = fail and there exists a plant [e2, w], return ⟨e2, w⟩. Otherwise call [e2], restore the input to w, and return [e2]w.

Fig. 5. Actions of plants

We say that "parsing procedure [e] handles string w" to mean that the procedure applied to w terminates and returns either x ∈ Σ* or fail.

Proposition 1. Each parsing procedure [e] for e ∈ E handles all w ∈ Σ*.

Proof is found in the Appendix.

Proposition 2. For each result [e]w = x or ⟨e, w⟩ = x in a parse tree there exists a proof of [e]w →BNF x.

Proof is by induction on the height of the subtree.

The conditions under which each BNF tree has a corresponding parse tree depend on the context in which a certain expression may be used. We assume that the grammar has a start expression S and use as context the BNF proof of [S]u →BNF u for u ∈ Σ*.

For e ∈ E, we define Tail(e) as the set of all strings that may follow a string matched by e in the proof of [S]u →BNF u. More precisely, it is the set of strings z in all results [e]xz →BNF x that appear in that proof. A possible method for estimating Tail(e) can be found in [14]. For e ∈ R, TailR(e) excludes occurrences of e in a recursion path except the first one.

The ordering of a choice expression is essential for the rules (choice2.p), (choice2.g), and (choice3.g). But the ordering does not affect the language defined by BNF rules. Therefore, we can always rearrange the choice as is best for the parsing.

Proposition 3. If the grammar satisfies the following conditions (1)–(3), then for each proof of [S]u →BNF u there exists a parse tree with root [S]u = u. Moreover, for each subproof [e]w →BNF x of that proof there exists a parse tree with root [e]w = x or ⟨e, w⟩ = x.

(1) For each e = e1|e2 ∈ R, e2 ∉ C(e);
(2) For each e = e1|e2, L(e1)Σ* ∩ L(e2)Tail(e) = ∅;
(3) For each e ∈ R, Adv(e)Σ* ∩ TailR(e) = ∅.

Proof is found in the Appendix.

5 Comments

We presented sufficient conditions under which the extended PEG is a correct parser for BNF grammars. Because we treat PEG as a parser for BNF, it does not include the syntactic predicates of classical PEG. We chose here the scheme for handling left recursion inspired by [6] because it seems easy to analyze.
The scheme where the grower scans all plants even if nothing changes is also easy to analyze; in a practical implementation the plant would be called only if its argument(s) change.

Checking (2) in the presence of left recursion is not simple. Without left recursion, one can use an approximation by prefixes, with LL(1) as the extreme case. The languages defined by left recursion often have identical prefixes and differ only at the far end.

The chosen scheme required a rather severe restriction (1) on the grammar. It seems possible to replace it by a requirement that the languages of different seeds of the same recursion class are disjoint, as well as the languages of expressions in first⁻¹(e) for e ∈ R. This is the subject of further research, as well as attempts to analyze other schemes.

(empty.p)    [ε]w = ε
(letter1.p)  [a]aw = a
(letter2.p)  [b]aw = fail, where b ≠ a
(letter3.p)  [a]ε = fail
(rule.p)     [A]w = X  from  [e(A)]w = X
(grow.p)     [e]w = X  from  ⟨e, w⟩ = X
(seq1.p)     [e1 e2]xz = xy  from  [e1]xz = x  and  [e2]z = y
(seq2.p)     [e1 e2]w = fail  from  [e1]w = fail
(seq3.p)     [e1 e2]xz = fail  from  [e1]xz = x  and  [e2]z = fail
(choice1.p)  [e1 | e2]w = x  from  [e1]w = x
(choice2.p)  [e1 | e2]w = X  from  [e1]w = fail  and  [e2]w = X
(rule.g)     ⟨A, w⟩ = X  from  ⟨e(A), w⟩ = X
(seq1.g)     ⟨(e1 e2), xz⟩ = xy  from  ⟨e1, xz⟩ = x  and  [e2]z = y
(seq2.g)     ⟨(e1 e2), xz⟩ = fail  from  ⟨e1, xz⟩ = x  and  [e2]z = fail
(seq3.g)     ⟨(e1 e2), w⟩ = fail  from  ⟨e1, w⟩ = fail
(choice1.g)  ⟨(e1 | e2), w⟩ = x  from  ⟨e1, w⟩ = x
(choice2.g)  ⟨(e1 | e2), w⟩ = X  from  ⟨e1, w⟩ = fail  and  ⟨e2, w⟩ = X
(choice3.g)  ⟨(e1 | e2), w⟩ = X  from  ⟨e1, w⟩ = fail  and  [e2]w = X
(init.g)     ⟨e, w⟩ = fail  for  e ∈ R

where X denotes x or fail.

Fig. 6. Formal semantics of parser

[b]baac = b,  ⟨Aa, baac⟩ = fail,  [B]baac = b,  ⟨Aa|B, baac⟩ = b,  ⟨A, baac⟩ = b,
[a]aac = a,  ⟨Aa, baac⟩ = ba,  ⟨Aa|B, baac⟩ = ba,  ⟨A, baac⟩ = ba,
[a]ac = a,  ⟨Aa, baac⟩ = baa,  ⟨Aa|B, baac⟩ = baa,  ⟨A, baac⟩ = baa,
[A]baac = baa,  [c]c = c,  [Ac]baac = baac,  [S]baac = baac.

Fig. 7. Example of parse tree

A Appendix

A.1 Proof of Proposition 1

The proof is by induction on the length of w, with Lemma 3 as the induction base and Lemma 2 as the induction step. ⊓⊔

Lemma 2. If each parsing procedure handles all words of length n or less, each parsing procedure handles all words of length n + 1.

Proof. Let the rank of expression e, denoted ρ(e), be defined as follows:

– ρ(ε) = ρ(a) = 0, and otherwise:
– for e ∉ R, the highest ρ(e′) for e′ ∈ First(e), plus 1;
– for e ∈ R, the highest ρ(e′) for e′ ∈ Seed(C(e)) ∪ Leaf(C(e)), plus 1.

One can easily see that this definition is not circular.

Assume that each procedure handles all words of length n or less, and consider a word w of length n + 1. We use induction on the rank of e to show that each [e] handles w.

(Induction base:) Obviously, each procedure of rank 0 handles w.

(Induction step:) Assume that each procedure of rank m or less handles all words of length n + 1 or less. Take any procedure [e] of rank m + 1. If e ∉ R, e can be one of these:

– A ∈ N. We have ρ(e(A)) = m; thus e(A) handles w, and so does A.
– e1 e2 with nullable e1. We have ρ(e1) ≤ m and ρ(e2) ≤ m. Each of them handles words of length n + 1 or less, so e1 e2 handles w.
– e1 e2 with non-nullable e1. We have ρ(e1) = m, so e1 handles w. If it fails, so does e1 e2. Otherwise it consumes x ≠ ε and e2 is applied to the rest w′ of w, with length n or less. Thus, e2 handles w′, so e1 e2 handles w.
– e1|e2. We have ρ(e1) ≤ m and ρ(e2) ≤ m. Each of them handles words of length n + 1 or less, so e1|e2 handles w.

If e ∈ R, the result of [e] is obtained by the grower for the recursion class C(e) and input w.
The grower stops when it cannot improve any result. Each improvement means consuming more of w. Since w is finite, the grower must eventually stop. Thus, [e] handles w. ⊓⊔

Lemma 3. Each parsing procedure handles the word of length 0.

Proof. The proof is essentially the same as that of Lemma 2, with w replaced by ε, and a simplified case of e1 e2 with non-nullable e1. ⊓⊔

A.2 Proof of Proposition 3

Suppose we are given a BNF proof of [S]u →BNF u. We are going to show that each partial proof of that proof (and the final proof) has a corresponding parse tree. In the following, we say "subtree [e]w →BNF x" to mean the partial proof with result [e]w →BNF x. Define the level of subtree [e]w →BNF x as follows:

– For e = ε and e = a the level is 1.
– For e ∉ R, other than the above, the level is 1 plus the highest level of the subtrees for the components of e.
– Each result with e ∈ R belongs to some recursion sequence. All subtrees in that sequence have the same level, equal to 1 plus the highest level of the subtrees for the seed and leaves of that recursion sequence.

The proof is by induction on the level.

(Induction base:) The parse trees for subtrees on level 1 are [ε]w = ε respectively [a]az = a. They represent calls to parsing procedures [ε] and [a].

(Induction step:) Assume that there exists a parse tree for each subtree on level n or less. We show that there exists a parse tree for each subtree on level n + 1.

For a subtree [e]w →BNF x on level n + 1 where e ∉ R we construct the parse tree from the parse trees of its subtrees. This is done in Lemma 4 and represents calls from [e] to its subprocedures.

The root of each subtree [e]w →BNF x on level n + 1 where e ∈ R belongs to some recursion sequence (perhaps degenerated to length 1). We construct parse trees for all results in the sequence from the parse trees for the leaves and the seed. This is done in Lemma 5 and represents the work done by the grower. ⊓⊔

Lemma 4. Assume there exists a parse tree for each subtree of [e]w →BNF x. Then there exists a parse tree for [e]w →BNF x.

Proof. The parse tree for [e]w →BNF x is constructed in a way that depends on e:

– A = e′. [A]w →BNF x is derived from [e′]w →BNF x according to (rule). As assumed, there exists a parse tree [e′]w = x for [e′]w →BNF x. The parse tree [A]w = x is constructed from it using (rule.p).
– e1 e2. [e1 e2]xz →BNF xy is derived from [e1]xz →BNF x and [e2]z →BNF y according to (seq). As assumed, there exist parse trees [e1]xz = x and [e2]z = y for [e1]xz →BNF x and [e2]z →BNF y. The parse tree [e1 e2]xz = xy is built from them using (seq1.p).
– e1|e2 with [e1|e2]w →BNF x derived from [e1]w →BNF x according to (choice1). As assumed, there exists a parse tree [e1]w = x for [e1]w →BNF x. The parse tree [e1|e2]w = x is constructed from it using (choice1.p).
– e1|e2 with [e1|e2]w →BNF x derived from [e2]w →BNF x according to (choice2). As assumed, there exists a parse tree [e2]w = x for [e2]w →BNF x. [e2]w →BNF x means w ∈ L(e2)Tail(e). As specified in Figure 4, the parser calls the procedure [e1] on w. According to Proposition 1, the call returns either fail or a prefix y of w. By Proposition 2, the latter would mean w ∈ L(e1)Σ*, which contradicts (2). Therefore it must be that [e1]w = fail. The parse tree [e1|e2]w = x is constructed from [e1]w = fail and [e2]w = x using (choice2.p). ⊓⊔

Lemma 5. If there exists a parse tree for each seed and each leaf of recursion sequence e[k], . . . , e[2], e[1], there exists a parse tree for each result in the sequence, and in particular for e = e[k].
Proof. Suppose the grower is called to handle e for input w. We start by showing that after the n-th round of the grower, ⟨e[n], w⟩ contains the root of the parse tree for the subtree [e[n]]w →BNF x[n]. As the grower checks all plants on each round, it will call [e[n], w] on round n. The proof is by induction on n.

(Induction base:) Expression e[1] is an exit e1|e2 of the sequence, with e2 as its seed. The result [e[1]]w →BNF x is derived from [e2]w →BNF x using (choice2). When the grower calls [e[1], w] in its first round, ⟨e1, w⟩ contains fail. As e2 is the seed of the sequence, there exists a parse tree with root [e2]w = x. The parse tree for ⟨e[1], w⟩ = x is constructed from it using (choice3.g).

(Induction step:) Consider the round n + 1 and assume that [e[n], w] contains the root ⟨e[n], w⟩ = x of the parse tree for subtree [e[n]]w →BNF x. What happens when the grower calls [e[n+1], w] depends on the expression e[n+1]. It can be one of these:

– A = e′ with [e[n+1]]w →BNF x derived from [e′]w →BNF x according to (rule). The e′ here is e[n], with parse tree ⟨e′, w⟩ = x. The parse tree for ⟨e[n+1], w⟩ is constructed from it using (rule.g).
– e1 e2 with [e[n+1]]xz →BNF xy derived from [e1]xz →BNF x and [e2]z →BNF y according to (seq). The e1 here is e[n], with parse tree ⟨e1, xz⟩ = x. As e2 is a leaf of the sequence, there exists a parse tree with root [e2]z = y. The parse tree for ⟨e[n+1], w⟩ is constructed using (seq1.g).
– e1|e2. By (1), e2 ∉ C. The BNF result [e[n+1]]w →BNF x is derived from [e1]w →BNF x according to (choice1) or from [e2]w →BNF x according to (choice2). As e[n] is in C, it must be e1, with parse tree ⟨e1, w⟩ = x. The parse tree for ⟨e[n+1], w⟩ is constructed from it using (choice1.g).

Note that from (3) follows ε ∉ Adv(e). Thus, according to Lemma 1, a new result is stored in ⟨e[n], w⟩ in every turn.

Suppose the grower does not stop after e[k] because it finds a better result for plant [e[k], w]. If this better result is x′, we have by Lemma 1 x′ ∈ ⟨e[k], w⟩Adv(e[k]), which means Adv(e[k]) is a prefix of TailR(e), and contradicts (3). Thus, the grower stops at e[k]. The parse tree [e]w = x is constructed from ⟨e[k], w⟩ = x using (grow.p). ⊓⊔

References

1. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers, Principles, Techniques, and Tools. Addison-Wesley (1987)
2. Birman, A.: The TMG Recognition Schema. Ph.D. thesis, Princeton University (February 1970)
3. Birman, A., Ullman, J.D.: Parsing algorithms with backtrack. Information and Control 23, 1–34 (1973)
4. Brooker, P., Morris, D.: A general translation program for phrase structure languages. J. ACM 9(1), 1–10 (1962)
5. Ford, B.: Parsing expression grammars: A recognition-based syntactic foundation. In: Jones, N.D., Leroy, X. (eds.) Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004. pp. 111–122. ACM, Venice, Italy (14–16 January 2004)
6. Hill, O.: Support for Left-Recursive PEGs (2010), https://github.com/orlandohill/peg-left-recursion
7. Hopgood, F.R.A.: Compiling Techniques. MacDonalds (1969)
8. Laurent, N., Mens, K.: Parsing expression grammars made practical. CoRR abs/1509.02439 (2015), http://arxiv.org/abs/1509.02439
9. Mascarenhas, F., Medeiros, S., Ierusalimschy, R.: On the relation between context-free grammars and Parsing Expression Grammars. Science of Computer Programming 89, 235–250 (2014)
10. McClure, R.M.: TMG – a syntax directed compiler. In: Winner, L. (ed.) Proceedings of the 20th ACM National Conference. pp. 262–274. ACM (24–26 August 1965)
11. Medeiros, S.: Correspondência entre PEGs e Classes de Gramáticas Livres de Contexto. Ph.D. thesis, Pontifícia Universidade Católica do Rio de Janeiro (Aug 2010)
12. Medeiros, S., Mascarenhas, F., Ierusalimschy, R.: Left recursion in Parsing Expression Grammars. Science of Computer Programming 96, 177–190 (2014)
13. Redziejowski, R.R.: From EBNF to PEG. Fundamenta Informaticae 128, 177–191 (2013)
14. Redziejowski, R.R.: Trying to understand PEG. Fundamenta Informaticae 157, 463–475 (2018)
15. Sigaud, P.: Left recursion. Tech. rep. (2017), https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion
16. Tratt, L.: Direct left-recursive parsing expression grammars. Tech. Rep. EIS-10-01, School of Engineering and Information Sciences, Middlesex University (Oct 2010)
17. Warth, A., Douglass, J.R., Millstein, T.D.: Packrat parsers can support left recursion. In: Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, PEPM 2008, San Francisco, California, USA, January 7–8, 2008. pp. 103–110 (2008)