=Paper= {{Paper |id=Vol-2951/paper2 |storemode=property |title=Left Recursion by Recursive Ascent |pdfUrl=https://ceur-ws.org/Vol-2951/paper2.pdf |volume=Vol-2951 |authors=Roman R. Redziejowski |dblpUrl=https://dblp.org/rec/conf/csp/Redziejowski21 }} ==Left Recursion by Recursive Ascent== https://ceur-ws.org/Vol-2951/paper2.pdf
Left Recursion by Recursive Ascent
Roman R. Redziejowski


                                      Abstract
                                      Recursive-descent parsers can not handle left recursion, and several solutions to this problem have been
                                      suggested. This paper presents yet another solution. The idea is to modify recursive-descent parser so
                                      that it reconstructs left-recursive portions of syntax tree bottom-up, by "recursive ascent".

                                      Keywords
                                      parsing, recursive descent, left recursion




1. Introduction
Recursive-descent parser is a collection of "parsing procedures" that correspond to different
syntactic units and call each other recursively. Some of these procedures must choose what
to call next, and the choice is made by looking at the input ahead, on depth-first basis. This
process does not work if the grammar is left-recursive: the procedure may indefinitely call itself,
facing the same input. One suggested solution [1] counts the number of recursive calls of each
procedure and stops when it reaches a pre-set bound. The process is repeated with the bound
starting with 1 and gradually increased. Another solution [2, 3] saves the input consumed by
invocations of parsing procedures and tricks the parser to use the saved result instead of making
recursive call. The third idea sees parsing as reconstruction of input’s syntax tree. The classical
process builds that tree top-down, but [4] suggests that portions of the tree involved in left
recursion can be reconstructed starting from the bottom. We present here an approach based on
this idea. To reconstruct portions of the tree bottom-up we use procedures that call each other
recursively. These procedures can be regarded as new parsing procedures. We incorporate them
into the recursive-descent parser, and the result is recursive-descent parser for a new grammar,
referred to as the "dual grammar". The dual grammar is (normally) not left-recursive and defines
the same language as the original one.
   After the necessary definitions in Section 2, Section 3 gives an example of recursive ascent,
and shows that the procedures performing it can be seen as parsing procedures for new non-
terminals. Section 4 incorporates these new non-terminals into the original grammar, thus
obtaining the dual grammar. Two Propositions state the essential properties of that grammar.
Section 5 discusses handling choice expressions, and Section 6 discusses the elements omitted
to simplify the presentation. Section 7 contains some final remarks: relation to other solutions
and unsolved problems. Proofs of the Propositions are given in the Appendix.

29th international Workshop on Concurrency, Specification and Programming (CS&P’21), September 2021, Berlin,
Germany
" roman@redz.se (R. R. Redziejowski)
~ http://www.romanredz.se (R. R. Redziejowski)
                                    Β© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
 CEUR
 Workshop
 Proceedings
               http://ceur-ws.org
               ISSN 1613-0073       CEUR Workshop Proceedings (CEUR-WS.org)
2. Basic concepts
We consider a BNF-like grammar 𝐺 = (N, Ξ£, E, β„°, 𝑁𝑠 ) with finite set N of non-terminals, finite
set Ξ£ of terminals, finite set E of expressions, function β„° from non-terminals to expressions, and
the start symbol 𝑁𝑠 .
An expression is one of these:
    β€’ π‘Ž ∈ Ξ£ ("terminal"),
    β€’ 𝑁 ∈ N ("non-terminal"),
    β€’ 𝑒1 . . . 𝑒𝑛 ("sequence"),
    β€’ 𝑒1 | . . . |𝑒𝑛 ("choice"),
where each of 𝑒𝑖 is an expression. The function β„° is defined by a set of rules of the form
𝑁 β†’ 𝑒, where 𝑒 is the expression assigned by β„° to non-terminal 𝑁 . We often write 𝑁 β†’ 𝑒 to
mean 𝑒 = β„°(𝑁 ). To simplify the presentation, we did not include the empty string πœ€ among
expressions. In the following, expressions π‘Ž ∈ Ξ£ and 𝑁 ∈ N will be viewed as special cases of
choice expression with 𝑛 = 1.
   Non-terminal 𝑁 β†’ 𝑒1 . . . 𝑒𝑛 derives the string 𝑒1 . . . 𝑒𝑛 of symbols, while 𝑁 β†’ 𝑒1 | . . . |𝑒𝑛
derives one of 𝑒1 , . . . , 𝑒𝑛 . The derivation is repeated to obtain a string of terminals. This process
is represented by syntax tree.
   Figures 1 and 2 are examples of grammar 𝐺, showing syntax trees of strings derived from
the start symbol. The set of all strings derived from 𝑁 ∈ N is called the language of 𝑁 and is
denoted by β„’(𝑁 ).

                    N = {Z,A,A1,B,B1,B2}                              Z
                    Ξ£ = {a,b,x,y}                                        HH
                                                              x       A         y
                    𝑁𝑠 = Z
                                                                      A1
                    Z β†’ x A y                                     
                                                                          H
                                                                           H
                                                                  B         a
                    A β†’ A1 | a
                    A1β†’ B a                                    B1
                    B β†’ B1 | B2 | b                        
                                                                     H
                                                                      H
                                                          A           b
                    B1β†’ A b
                    B2β†’ B b                               a

Figure 1: Example of grammar 𝐺 and syntax tree of ’xabay’

                                           first
For 𝑁 ∈ N and 𝑒 ∈ N βˆͺ Ξ£, define 𝑁 βˆ’β†’ 𝑒 to mean that parsing procedure for 𝑁 may call that
for 𝑒 on the same input. We have thus:
                              first
    β€’ If 𝑁 β†’ 𝑒1 . . . 𝑒𝑛 , 𝑁 βˆ’β†’ 𝑒1 .
                                 first
    β€’ If 𝑁 β†’ 𝑒1 | . . . | 𝑒𝑛 , 𝑁 βˆ’β†’ 𝑒𝑖 for 1 ≀ 𝑖 ≀ 𝑛.

    First                                first                                          First
Let βˆ’β†’ be the transitive closure of βˆ’β†’. Non-terminal 𝑁 ∈ N is recursive if 𝑁 βˆ’β†’ 𝑁 . The set
of all recursive non-terminals of 𝐺 is denoted by R. All non-terminals in Figure 1 except Z, and
all non-terminals in Figure 2 are recursive.
                    N = {E,E1,F,F1}                                  E
                    Ξ£ = {a,b,x,z}
                    𝑁𝑠 = E                                           E1
                                                                        HH
                                                            E        +         F
                    E β†’ E1 | F
                    E1β†’ E + F                               F                 F1
                    F β†’ F1 | a                                            
                                                                                  H
                                                                                   H
                                                            F1            F    *      a
                    F1β†’ F * a
                                                        
                                                                H
                                                                 H
                                                        F   *    a        a

                                                        a

Figure 2: Example of grammar 𝐺 and syntax tree of ’a*a+a*a’


                                                                              First       First
   Define relation between recursive 𝑁1 , 𝑁2 ∈ R that holds if 𝑁1 βˆ’β†’ 𝑁2 βˆ’β†’ 𝑁1 . This is an
equivalence relation that partitions R into equivalence classes. We call them recursion classes.
The recursion class of 𝑁 is denoted by C(𝑁 ). All non-terminals in Figure 1 belong to the same
class; the grammar of Figure 2 has two recursion classes: {E,E1} and {F,F1}.
   In syntax tree, the leftmost path emanating from any node is a chain of nodes connected
   first
by βˆ’β†’. Suppose 𝑁1 and 𝑁2 belonging to the same recursion class C appear on the same
leftmost path. Any non-terminal 𝑁 between them must also belong to C, which follows from
                 First   First   First
the fact that 𝑁1 βˆ’β†’ 𝑁 βˆ’β†’ 𝑁2 βˆ’β†’ 𝑁1 . It means that members of C appearing on the same
leftmost path must form an uninterrupted sequence. We call such sequence a recursion path of
class C. The only recursion path in Figure 1 is the whole leftmost path from the first A without
final a. The syntax tree in Figure 2 has two recursion paths, one starting with E and another
with F.
   Let 𝑁 β†’ 𝑒1 . . . 𝑒𝑛 be on a recursion path. The next item on the leftmost path is 𝑒1 , and it
                                     First
must belong to C(𝑁 ) to ensure 𝑁 βˆ’β†’ 𝑁 . It follows that the last item on a recursion path must
be 𝑁 β†’ 𝑒1 | . . . |𝑒𝑛 where at least one of 𝑒𝑖 is not a member of C(𝑁 ). Such 𝑁 is called an exit
of C(𝑁 ), and its alternatives outside C(𝑁 ) are the seeds of C(𝑁 ). In Figure 1, both A and B are
exits, and the seeds are a and b. In Figure 2, E and F are exits of their respective classes, and the
corresponding seeds are F and a.
  A recursive 𝑁 that appears in expression for a non-terminal outside C(𝑁 ), or is the start
symbol, is an entry of its recursion class. It is the only non-terminal that can be first in a
recursion path. The recursion class of Figure 1 has entry A, and recursion classes of Figure 2
have E and F as their respective entries.
To simplify presentation, we assume that each class has only one entry.
3. Recursive ascent
Recursive descent constructs the syntax tree implicitly, as the structure of its procedure calls. We
assume that to serve any purpose, this tree has to be somehow registered. Thus, we assume that
parsing procedures include "semantic actions" that actually build data structure representing
the tree.
  We suggest that parsing procedure for entry 𝐸 builds its syntax tree in a special way, illustrated
below by parsing the string ’xabay’ from Figure 1. The parsing starts with procedure for
non-recursive Z that, in the usual way, calls the procedures for ’x’, A, and ’y’. After consuming
’x’, Z applies A to ’abay’. The procedure for entry A is not the usual choice between A1 and a;
instead, it reconstructs the subtree A as follows:
   1. The recursion path from A must end with a seed. The seeds are a and b. Decide to try a.
   2. Apply expression a to ’abay’. It consumes ’a’, leaving ’bay’. Use a as start of the
      tree.
   3. The only possible parent of seed a in the syntax tree is its exit A. Add A on top of the tree,
      with a as child.
   4. We have a tree for A, but decide to continue.
   5. A appears only in B1β†’A b, so B1 is the only possible parent of A.
   6. The tree for A is already constructed. Complete B1 by applying expression b to the
      remaining ’bay’. It consumes b, leaving ’ay’. Add B1 on top of the tree, with A,b as
      children.
   7. B1 appears only in B→B1|B2|b, so B is the only possible parent of B1.
   8. Add B on top of the tree, with B1 as child.
   9. The possible parents of B are A1β†’B a and B2β†’B b. Decide for A1.
  10. The tree for B is already constructed. Complete A1 by applying expression a to ’ay’. It
      consumes ’a’, leaving ’y’. Add A1 on top of the tree with B,a as children.
  11. The only possible parent of A1 is A→A1.
  12. Add A on top of the tree, with A1 as child.
  13. We have a tree for A, and decide to stop. Return the tree of the consumed ’aba’ to Z,
      which continues to consume y and constructs the tree for ’xabay’.

In general, the procedure for entry 𝐸 chooses and executes one of procedures that correspond
to different seeds. In the example, the choice is made in step 1, and the selected procedure
consists of steps 2-13. The procedure for seed 𝑆 starts with constructing the tree for 𝑆 and
follows by executing a procedure that adds node for the containing exit 𝑋; then it proceeds to
grow the resulting tree towards 𝐸. These are the steps 2 respectively 3-13 in our example. This
may be seen as parsing procedure that implements a new expression for 𝐸:

                                    𝐸 β†’ 𝑆1 $𝑋1 | . . . |𝑆𝑛 $𝑋𝑛                                   (1)

where 𝑆𝑖 are all the seeds of C(𝐸) and 𝑋𝑖 are the exits containing them.
  We can see $𝑋 as a special case of procedure $𝑅 that adds a new node 𝑅 on top of previously
constructed tree. This is the case in steps 6, 8, 10, and 12 of our example. The procedure then
continues to build the tree, which is in each case done in the subsequent steps up to the step 13.
                                       𝑅 β†’ 𝑇 𝑒2 . . . π‘’π‘š         𝑅 β†’ 𝑒1 | . . . |𝑇 | . . . |𝑒𝑛
                                              𝑅                               𝑅
                                              H
                                          TT HH
                   𝑇                  𝑇    𝑒2 . . . π‘’π‘š                       𝑇
                              -
                  
                   TT               TT TT      TT                    
                                                                             TT
                   π‘₯                   π‘₯ π‘₯2 . . . π‘₯π‘š                         π‘₯
Figure 3: Adding R


   The way of adding 𝑅 on top of tree 𝑇 depends on 𝑅 as illustrated in Figure 3.
   If 𝑅 β†’ 𝑒1 𝑒2 . . . π‘’π‘š (where 𝑒1 = 𝑇 ), $𝑅 builds the trees for 𝑒2 , . . . , π‘’π‘š , binds them with 𝑇
into the tree for 𝑅. The whole procedure can be seen as parsing procedure for new non-terminal:

                                       $𝑅 β†’ 𝑒2 . . . π‘’π‘š #𝑅                                       (2)

where 𝑒2 , . . . , π‘’π‘š are procedures for these expressions and #𝑅 continues the growing.
  If 𝑅 β†’ 𝑒1 | . . . |π‘’π‘š (where 𝑇 = 𝑒𝑖 for some 𝑖), the procedure just adds 𝑅 on top of 𝑇 and
proceeds to grow that tree. This can be seen as parsing procedure for:

                                           $𝑅 β†’ #𝑅 .                                             (3)

  Procedure #𝑅 consists of choosing a possible parent of node 𝑅 and adding that parent to
the tree. This can be seen as parsing procedure for another new non-terminal:

                                       #𝑅 β†’ $𝑃1 | . . . |$𝑃𝑛                                     (4)
                                               first
where 𝑃𝑖 are all members of C such that 𝑃𝑖 βˆ’β†’ 𝑅.
In our example, the choice (4) is performed in steps 5, 7, 9, and 11.
   If 𝑅 is identical to 𝐸, #𝑅 can choose to terminate building of the tree. Therefore, #𝐸 must
have an alternative to allow that:

                                     #𝐸 β†’ $𝑃1 | . . . |$𝑃𝑛 |$πœ€                                   (5)

where procedure $πœ€ returns the tree constructed for 𝐸. It was not chosen in step 4, but terminated
the process in step 13.


4. The dual grammar
As shown above, parsing procedure for an entry expression can be implemented by a set of
procedures that reconstruct portion of syntax tree bottom-up, by "recursive ascent". These
procedures can be seen as parsing procedures for new non-terminals. We can add these non-
terminals to the original grammar. The result for the grammar of Figure 1 is shown in Figure 4.
Here we replaced the expression for entry A by (1), and added the new non-terminals appearing
in it, and in their expressions. The other left-recursive non-terminals are no longer used, so they
are omitted. The non-recursive non-terminals, here represented by Z, are left unchanged. This
         Z   β†’ x A z                                   $B β†’ #B
         A   β†’ a $A | b $B                             $B1 β†’ b #B1
         $A β†’ #A                                       $B2 β†’ b #B2
         $A1 β†’ a #A1                                   #B β†’ $A1 | $B2
         #A β†’ $B1 | $πœ€                                 #B1 β†’ $B
         #A1 β†’ $A                                      #B2 β†’ $B
Figure 4: Dual grammar for grammar of Figure 1

         E   β†’ F $E                                    F   β†’ a $F
         $E β†’ #E                                       $F β†’ #F
         $E1 β†’ + F #E1                                 $F1 β†’ * a #F1
         #E β†’ $E1 | $πœ€                                 #F β†’ $F1 | $πœ€
         #E1 β†’ $E                                      #F1 β†’ $F
Figure 5: Dual grammar for grammar of Figure 2


is the "dual grammar" of the grammar in Figure 1. Figure 5 shows the dual grammar obtained in
a similar way for the grammar of Figure 2.
In general, the dual grammar is obtained as follows:

    β€’ For each entry 𝐸 replace β„°(𝐸) by 𝑆1 $𝑋1 | . . . |𝑆𝑛 $𝑋𝑛
      where 𝑆1 , . . . , 𝑆𝑛 are all seeds of C(𝐸), and 𝑋𝑖 is the exit containing 𝑆𝑖 .
    β€’ Replace each recursive 𝑅 β†’ 𝑒1 𝑒2 . . . 𝑒𝑛 by $𝑅 β†’ 𝑒2 . . . 𝑒𝑛 #𝑅.
    β€’ Replace each recursive 𝑅 β†’ 𝑒1 | . . . |𝑒𝑛 by $𝑅 β†’ #𝑅.
    β€’ For each 𝑅 ∈ R create:
         – #𝑅 β†’ $𝑃1 | . . . |$𝑃𝑛 if 𝑅 ΜΈ= 𝐸,
         – #𝑅 β†’ $𝑃1 | . . . |$𝑃𝑛 |$πœ€ if 𝑅 = 𝐸,
                                                                   first
      where 𝑃1 , . . . , 𝑃𝑛 are all members of C(𝑅) such that 𝑃𝑖 βˆ’β†’ 𝑅, and 𝐸 is the entry of C.

The dual grammar is an n-tuple 𝐷 = (N𝑑 , Ξ£, E𝑑 , ℰ𝑑 , 𝑁𝑠 ). Its set N𝑑 consists of:

    β€’ The non-recursive members of N;
    β€’ The entries to recursion classes;
    β€’ The $-expressions;
    β€’ The #-expressions.

In the following, the set of all non-recursive members of N is denoted by R, and the set of all
entries by RE . They appear as non-terminals in both 𝐺 and 𝐷. The set R βˆͺ RE of these common
symbols is denoted by N𝑐 . Functions β„° and ℰ𝑑 assign the same expressions to members of R.
The two important facts about the dual grammar are:

Proposition 1. The dual grammar is left-recursive only if the original grammar contains a cycle,
that is, a non-terminal that derives itself.

Proof is found in the Appendix.
Proposition 2. ℒ𝐷 (𝑁 ) = β„’(𝑁 ) for all 𝑁 ∈ N𝑐 .

Proof is found in the Appendix.
  The start symbol 𝑁𝑠 is either non-recursive or an entry, so it appears in both grammars, and
thus both grammars define the same language. It means that recursive-descent parser for the
dual grammar is a correct parser for the original grammar, and its "semantic actions" build
syntax tree for the original grammar.


5. Implementing the choices
The construction of dual grammar introduces a number of choice expressions. We assume that
they are treated in the same way as those originally present in the grammar. If dual grammar
has the LL(π‘˜) property, the choice can be made by looking at the next π‘˜ input terminals. The
dual grammar of Figure 4 is LL(1), so decisions in steps 1, 5, 10, and 14 of the example could
well be made by looking at the next terminal.
   If the dual grammar is not LL(π‘˜), the possible option is trial-and-error with backtracking. As
this may result in exponential processing time, the practical solution is limited backtracking,
recently being popular as the core of Parsing Expression Grammar (PEG) [5]. (As a matter of
fact, the three solutions named in the Introduction are all suggested as modifications to PEG.)
   The problem with limited backtracking is that it may fail to accept some legitimate input
strings. For some grammars, which may be called "PEG-complete", limited backtracking does
accept all legitimate strings. Thus, if the dual grammar is PEG-complete, it provides a correct
parser for the original grammar.
   There exist a sufficient condition that may be used to check if the dual grammar is PEG-
complete [6].
   The checks for LL(π‘˜) and PEG-completeness are carried on dual grammar. It is the subject of
further research how they can be replaced by checks applied to the original grammar.


6. Towards full grammar
We made here two simplifying assumptions. First, we did not include empty string πœ€ in the
grammar. Second, we assumed a unique entry to each recursion class.
   The result of adding πœ€ is that some expressions may derive empty string. These expressions
are referred to as nullable, and can be identified as such by analyzing the grammar.
   Nullable 𝑒1 in 𝑁 β†’ 𝑒1 . . . 𝑒𝑛 invalidates the whole analysis in Section 2. Nullable 𝑒2 . . . 𝑒𝑛
in recursive 𝑁 β†’ 𝑒1 𝑒2 . . . 𝑒𝑛 invalidates the proof of Proposition 1. Except for these two cases,
our approach seems to work with empty string.
  The assumption of unique entry per recursion class is not true for many grammars; for
example, the class of Primary in Java has four entries.
  The exit alternative $πœ€ must appear in #𝐸 for the entry 𝐸 that actually started the ascent.
This can be solved by having a separate version of #𝐸 and $𝐸 for each entry, multiplying the
number of these non-terminals by the number of entries.
A practical shortcut can exploit the fact that the ascents are nested. One can keep the stack of
entries for active ascents and modify the procedure for #𝑇 to check if 𝑇 is the entry to current
ascent.


7. Final remarks
The traditional way of eliminating left recursion is to rewrite the grammar so that left recursion
is replaced by right recursion. It can be found, for example, in [7], page 177. The process is
cumbersome and produces large results; most important, it loses the spirit of the grammar. Our
rewriting is straightforward and produces correct syntax tree for the original grammar.
   The methods described in [1, 2, 3] use memoization tables and special code to handle them.
The procedures are repeatedly applied to the same input. Our approach is in effect just another
recursive-descent parser.
Our idea of recursive ascent is in principle the same as that of [4], but this latter uses specially
coded "grower" to interpret data structures derived from the grammar.
   As indicated in Section 6, our method does not handle some cases of nullable expressions.
Among them is the known case of "hidden" left recursion: 𝐴 β†’ 𝑒1 𝑒2 . . . 𝑒𝑛 becomes left
recursive when 𝑒1 derives πœ€. Another unsolved problem is 𝐸 β†’ 𝐸 +𝐸 | 𝑛, which results in a
right-recursive parse for 𝐸.


References
[1] S. Medeiros, F. Mascarenhas, R. Ierusalimschy, Left recursion in Parsing Expression Gram-
    mars, Science of Computer Programming 96 (2014) 177–190.
[2] A. Warth, J. R. Douglass, T. D. Millstein, Packrat parsers can support left recursion, in:
    Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-
    based Program Manipulation, PEPM 2008, San Francisco, California, USA, January 7-8, 2008,
    pp. 103–110.
[3] L. Tratt, Direct left-recursive parsing expression grammars, Technical Report EIS-10-01,
    School of Engineering and Information Sciences, Middlesex University, 2010.
[4] O. Hill, Support for Left-Recursive PEGs, 2010.
    https://github.com/orlandohill/peg-left-recursion.
[5] B. Ford, Parsing expression grammars: A recognition-based syntactic foundation, in: N. D.
    Jones, X. Leroy (Eds.), Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on
    Principles of Programming Languages, POPL 2004, ACM, Venice, Italy, 2004, pp. 111–122.
[6] R. R. Redziejowski, More about converting BNF to PEG, Fundamenta Informaticae 133
    (2014) 177–191.
[7] A. V. Aho, R. Sethi, J. D. Ullman, Compilers, Principles, Techniques, and Tools, Addison-
    Wesley, 1987.
A. Proof of Proposition 1
                                                           firstD
For 𝑁 ∈ N𝑑 and 𝑒 ∈ N𝑑 βˆͺ Ξ£, define 𝑁 βˆ’β†’ 𝑒 to mean that parsing procedure 𝑁 may call
parsing procedure 𝑒 on the same input:
                    firstD                         first
  (a) For 𝑁 ∈ R, βˆ’β†’ is the same as βˆ’β†’.
                             firstD
  (b) For 𝑁 ∈ RE , 𝑁 βˆ’β†’ 𝑆 for each seed 𝑆 of C(𝑁 ).
                                          firstD
  (c) For $𝑅 β†’ 𝑒2 . . . 𝑒𝑛 #𝑅, $𝑅 βˆ’β†’ 𝑒2 .
                                 firstD
  (d) For $𝑅 β†’ #𝑅, $𝑅 βˆ’β†’ #𝑅.
           firstD                                                   first
  (e) #𝑅 βˆ’β†’ $𝑃𝑖 for each 𝑃𝑖 ∈ C(𝑅) such that 𝑃𝑖 βˆ’β†’ 𝑅.
                                          FirstD                            FirstD
Grammar 𝐷 is left-recursive if 𝑁 βˆ’β†’ 𝑁 for some 𝑁 ∈ N𝑑 , where βˆ’β†’ is the transitive closure
   firstD
of βˆ’β†’.
                                                                                         FirstD           FirstD
Suppose 𝐷 is left-recursive, that is, exist 𝑁1 , 𝑁2 , . . . , π‘π‘˜ ∈ N𝑑 such that 𝑁1 βˆ’β†’ 𝑁2 βˆ’β†’
     FirstD
. . . βˆ’β†’ π‘π‘˜ and π‘π‘˜ = 𝑁1 . We start by showing that none of them can be in N𝑐 .
                                                                                                  First
Assume, by contradiction, that 𝑁1 ∈ N𝑐 . We show that in this case 𝑁2 ∈ N𝑐 and 𝑁1 βˆ’β†’ 𝑁2 .
                                                                                                  first
(Case 1) 𝑁1 ∈ R. Because ℰ𝑑 (𝑁1 ) = β„°(𝑁1 ), 𝑁2 is in N, and thus in N𝑐 . We have 𝑁1 βˆ’β†’ 𝑁2 .
(Case 2) 𝑁1 ∈ RE . According to (b), 𝑁2 is a seed of C(𝑁1 ), which is either non-recursive or an
                                                             First
entry, and thus is in N𝑐 . For a seed 𝑁2 of C(𝑁1 ) holds 𝑁1 βˆ’β†’ 𝑁2 .
(Conclusion) The above can be repeated with 𝑁2 , . . . , π‘π‘˜βˆ’1 to show that if any of 𝑁𝑖 is in N𝑐 ,
                                          First
then for all 𝑖, 1 ≀ 𝑖 ≀ 𝑛 𝑁𝑖 ∈ N𝑐 and 𝑁𝑖 βˆ’β†’ 𝑁𝑖 . Thus, all 𝑁𝑖 belong to R and are all in the
same recursion class. As none of 𝑁𝑖 belongs to R, they must all be in RE . Then, in particular,
𝑁1 ∈ RE . But, according to (b), 𝑁2 is a seed of C(𝑁1 ), that cannot be a member of C(𝑁1 ).
Thus, 𝑁1 ∈/ N𝑐 .
   It follows that each 𝑁𝑖 is $𝑅𝑖 or #𝑅𝑖 for some 𝑅𝑖 ∈ R. If 𝑅𝑖 β†’ 𝑒2 . . . 𝑒𝑛 , we have from (c)
     firstD
$𝑅𝑖 βˆ’β†’ 𝑒2 , and 𝑒2 ∈ N𝑐 . That means all 𝑅𝑖 are choice expressions. Thus, the sequence of 𝑁𝑖
is a repetition of $𝑅𝑖 , #𝑅𝑖+1 , $𝑅𝑖+2 where, according to (d), 𝑅𝑖+1 = 𝑅𝑖 , and according to (e),
𝑅𝑖+2 β†’ . . . |𝑅𝑖+1 | . . . . That means 𝑅𝑖+2 derives 𝑅𝑖 , and in π‘˜/2 steps derives itself.    β–‘


B. Proof of Proposition 2
The proof is in terms of syntax trees. We write 𝑒◁ π‘₯ for syntax tree of π‘₯ with root 𝑒. We write
𝑒◁ [𝑒1 ◁ π‘₯1 +Β· Β· Β·+𝑒𝑛 ◁ π‘₯𝑛 ]π‘₯ to represent syntax tree with root 𝑒 and children 𝑒1 ◁ π‘₯1 , . . . , 𝑒𝑛 ◁ π‘₯𝑛
where π‘₯1 . . . π‘₯𝑛 = π‘₯.
  To distinguish between the trees according to grammar 𝐺 (𝐺-trees) and those according to
grammar 𝐷 (𝐷-trees) we use the symbols ◁𝐺 respectively ◁𝐷 .
  Assuming that a terminal derives itself, we show that every string π‘₯ derived from 𝑒 ∈ N𝑐 βˆͺ Ξ£
according to 𝐺 can be also derived from 𝑒 according to 𝐷 and vice-versa. The proof is by
induction on height of syntax trees.
(Induction base) Syntax tree of height 0. This is the syntax tree of a terminal. The terminals and
their syntax trees are identical in both grammars.
(Induction step) Assume that the stated property holds for all strings 𝑀 having syntax tree of
height β„Ž β‰₯ 0 or less. Lemmas 1 and 2 below show that it holds then for syntax trees of height
β„Ž + 1.

Lemma 1. Assume that for each 𝑒◁𝐺 𝑀 of height β„Ž β‰₯ 0 with 𝑒 ∈ N𝑐 βˆͺ Ξ£ exists 𝑒◁𝐷 𝑀. Then the
same holds for each 𝑒◁𝐺 𝑀 of height β„Ž + 1.

Proof. Take any 𝑁 ◁𝐺 𝑀 of height β„Ž + 1 where 𝑁 ∈ N𝑐 .
(Case 1) 𝑁 ∈ R. It means 𝑁 β†’ 𝑒1 . . . 𝑒𝑛 or 𝑁 β†’ 𝑒1 | . . . |𝑒𝑛 .
We have 𝑁 ◁𝐺 [𝑒1 ◁𝐺 𝑀1 + Β· Β· Β· + 𝑒𝑛 ◁𝐺 𝑀𝑛 ]𝑀 respectively 𝑁 ◁𝐺 [𝑒𝑗 ◁𝐺 𝑀]𝑀. Each of the subtrees
𝑒𝑖 ◁𝐺 𝑀𝑖 has height β„Ž or less and each 𝑒𝑖 is in N𝑐 βˆͺ Ξ£. By assumption, exists 𝑒𝑖 ◁𝐷 𝑀𝑖 for each 𝑖.
The required 𝐷-tree is 𝑁 ◁𝐷 [𝑒1 ◁𝐷 𝑀1 + Β· Β· Β· + 𝑒𝑛 ◁𝐷 𝑀𝑛 ]𝑀 respectively 𝑁 ◁𝐷 [𝑒𝑗 ◁𝐷 𝑀]𝑀.
                                                                   first    first    first
(Case 2) 𝑁 ∈ RE . The tree 𝑁 ◁𝐺 𝑀 has the leftmost path 𝑅𝑛 βˆ’β†’ . . . βˆ’β†’ 𝑅1 βˆ’β†’ 𝑅0 where
𝑅𝑛 = 𝑁 , 𝑅𝑖 ∈ C(𝑁 ) for 𝑛 β‰₯ 𝑖 > 0, and 𝑅0 is a seed of C(𝑁 ). Define π‘₯𝑖 to be the string
derived by 𝐺 from 𝑅𝑖 , for 0 ≀ 𝑖 ≀ 𝑛. We have thus 𝑀 = π‘₯𝑛 .
Define 𝑀0 be the string derived by 𝐺 from 𝑅0 , so the subtree for 𝑅0 is 𝑅0 ◁𝐺 𝑀0 .
Let 1 ≀ 𝑖 ≀ 𝑛. If 𝑅𝑖 β†’ 𝑒1 𝑒2 . . . π‘’π‘š , the subtree for 𝑅𝑖 is 𝑅𝑖 ◁𝐺 [π‘…π‘–βˆ’1 ◁𝐺 π‘₯π‘–βˆ’1 + 𝑒2 ◁𝐺 𝑣2 + Β· Β· Β· +
π‘’π‘š ◁𝐺 π‘£π‘š ]π‘₯𝑖 . Define 𝑀𝑖 = 𝑣2 . . . π‘£π‘š , so π‘₯𝑖 = π‘₯π‘–βˆ’1 𝑀𝑖 .
if 𝑅𝑖 β†’ 𝑒1 | . . . |π‘’π‘š , the subtree for 𝑅𝑖 is 𝑅𝑖 ◁𝐺 [π‘…π‘–βˆ’1 ◁𝐺 π‘₯π‘–βˆ’1 ]π‘₯𝑖 Define 𝑀𝑖 = πœ€, so we have again
π‘₯𝑖 = π‘₯π‘–βˆ’1 𝑀𝑖 .
One can easily see that 𝑀 = π‘₯𝑛 = 𝑀0 . . . 𝑀𝑛 .
Define 𝑦𝑖 to be the string derived by 𝐷 from $𝑅𝑖 , and 𝑧𝑖 to be the string derived by 𝐷 from
#𝑅𝑖 , for 0 ≀ 𝑖 ≀ 𝑛.
For entry 𝑅𝑛 exists the tree #𝑅𝑛 ◁𝐷 πœ€. If $𝑅𝑛 β†’ #𝑅𝑛 , exists the tree $𝑅𝑛 ◁𝐷 [#𝑅𝑛 ◁𝐷 πœ€]πœ€ = 𝑀𝑛 .
If $𝑅𝑛 β†’ 𝑒2 . . . 𝑒𝑛 #𝑅𝑛 , exists by assumption D-tree for each of 𝑒𝑗 , deriving, respectively, 𝑣𝑗 .
Thus, exists the tree $𝑅𝑛 ◁𝐷 [𝑒2 ◁𝐷 𝑣2 + π‘‘π‘œπ‘‘π‘  + π‘’π‘š ◁𝐷 π‘£π‘š + #𝑅𝑛 ◁𝐷 πœ€]𝑣2 . . . π‘£π‘š πœ€ = 𝑀𝑛 . Define
𝑦 𝑛 = 𝑀𝑛 .
Suppose that exists the tree $𝑅𝑖 ◁𝐷 𝑦𝑖 . Then exists the tree #π‘…π‘–βˆ’1 [◁𝐷 $𝑅𝑖 ◁𝐷 𝑦𝑖 ]𝑦𝑖 . By construc-
tion similar to the above, w find the D-tree for $π‘…π‘–βˆ’1 deriving π‘¦π‘–βˆ’1 = π‘€π‘–βˆ’1 𝑦𝑖 . At the end, we
find the D-tree for $𝑅1 deriving 𝑦1 = 𝑀1 𝑦2 .
By assumption there exists D-tree for the seed 𝑅0 deriving 𝑀0 . For entry 𝑅𝑛 we have 𝑅𝑛 β†’
𝑅0 $𝑅1 , which gives the D-tree 𝑅𝑛 ◁𝐷 [𝑅0 ◁𝐷 𝑀0 + $𝑅𝑖 ◁𝐷 𝑦1 ]𝑀0 𝑦1 . One can easily see that
𝑦1 = 𝑀1 . . . 𝑀𝑛 , so we have a D-tree for 𝑁 = 𝑅𝑛 deriving 𝑀 = 𝑀0 . . . 𝑀𝑛 .                      β–‘

Lemma 2. Assume that for each 𝑒◁𝐷 𝑀 of height β„Ž β‰₯ 0 with 𝑒 ∈ N𝑐 βˆͺ Ξ£ exists 𝑒◁𝐺 𝑀. Then the
same holds for each 𝑒◁𝐷 𝑀 of height β„Ž + 1.

Proof. Take any 𝑁 ◁𝐷 𝑀 of height β„Ž + 1 where 𝑁 ∈ N𝑐 .
(Case 1) 𝑁 ∈ R. It means 𝑁 β†’ 𝑒1 . . . 𝑒𝑛 or 𝑁 β†’ 𝑒1 | . . . |𝑒𝑛 . We have 𝑁 ◁𝐷 [𝑒1 ◁𝐷 𝑀1 + Β· Β· Β· +
𝑒𝑛 ◁𝐷 𝑀𝑛 ]𝑀 respectively 𝑁 ◁𝐷 [𝑒𝑗 ◁𝐷 𝑀]𝑀. Each of the subtrees 𝑒𝑖 ◁𝐷 𝑀𝑖 has height β„Ž or less
and each 𝑒𝑖 is in N𝑐 βˆͺ Ξ£. By assumption, exists 𝑒𝑖 ◁𝐺 𝑀𝑖 for each 𝑖. The required 𝐺-tree is
𝑁 ◁𝐺 [𝑒1 ◁𝐺 𝑀1 + Β· Β· Β· + 𝑒𝑛 ◁𝐺 𝑀𝑛 ]𝑀 respectively 𝑁 ◁𝐺 [𝑒𝑗 ◁𝐺 𝑀]𝑀.
(Case 2) 𝑁 ∈ RE . The D-tree of 𝑁 has root 𝑁 , node 𝑆 for seed of C(𝑁 ),and nodes $𝑅𝑖 , #𝑅𝑖
for 1 ≀ 𝑖 ≀ 𝑛. Denote 𝑅𝑛 = 𝑁 and 𝑅0 = 𝑆. Denote 𝑦𝑖 the string derived by $𝑅𝑖 and 𝑧𝑖 that
derived by #𝑅𝑖 . We have 𝑧𝑖 = 𝑦𝑖+1 for 𝑖 < 𝑛 and 𝑧𝑛 = πœ€. Denote by 𝑀0 the string derived by
𝑅0 and by 𝑀 one derived by 𝑁 = 𝑅𝑛 .
The D-tree for 𝑅𝑛 is 𝑅𝑛 ◁𝐷 [𝑅0 ◁𝐷 𝑀0 + $𝑅1 ◁𝐷 𝑦1 ]𝑀0 𝑦1 = 𝑀.
If $𝑅𝑖 β†’ 𝑒2 . . . 𝑒𝑛 #𝑅𝑖 , we have 𝑦𝑖 = 𝑣2 . . . π‘£π‘š 𝑧𝑖 . where 𝑣𝑗 is the string derived by 𝑒𝑗 . Defining
𝑀𝑖 = 𝑣2 . . . π‘£π‘š by 𝑀𝑖 , we have 𝑦𝑖 = 𝑀𝑖 𝑧𝑖 .
If $𝑅𝑖 #𝑅𝑖 , we have 𝑦𝑖 = 𝑧𝑖 . Defining 𝑀𝑖 = πœ€, we have again 𝑦𝑖 = 𝑀𝑖 𝑧𝑖 .
We have 𝑧𝑖 = 𝑦( 𝑖 + 1) for 𝑖 < 𝑛 and 𝑧𝑛 = πœ€, so 𝑦𝑖 = 𝑀𝑖 𝑦( 𝑖 + 1) for 𝑖 < 𝑛 and 𝑦𝑛 = 𝑀𝑛 . One
can easily see that 𝑀 = 𝑀0 . . . 𝑀𝑛 .
By assumption exists G-tree 𝑅0 ◁𝐺 𝑀0 . Suppose we have the G-tree for π‘…π‘–βˆ’1 where 1 ≀ 𝑖 ≀ 𝑛
that derives π‘₯π‘–βˆ’1 .
Suppose 𝑅𝑖 β†’ 𝑅( 𝑖 βˆ’ 1)𝑒2 . . . π‘’π‘š . By assumption exist G-trees that derive 𝑣2 . . . π‘£π‘š from
𝑒2 . . . π‘’π‘š , so exists G-tree for 𝑅𝑖 deriving π‘₯𝑖 = π‘₯( 𝑖 βˆ’ 1)𝑣2 . . . π‘£π‘š = π‘₯( 𝑖 βˆ’ 1)𝑀𝑖 . Suppose
𝑅𝑖 β†’ 𝑅( 𝑖 βˆ’ 1). Then exists G-tree deriving π‘₯𝑖 = π‘₯( 𝑖 βˆ’ 1)πœ€ = π‘₯( 𝑖 βˆ’ 1)𝑀𝑖 . One can easily see
that the string π‘₯𝑛 derived by 𝑅𝑛 is 𝑀0 . . . 𝑀𝑛 = 𝑀                                           β–‘