<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roman R. Redziejowski</string-name>
          <email>roman.redz@swipnet.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Giraf's Research</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Parsing Expression Grammar (PEG) is a way to define a recursive-descent parser with limited backtracking. Its properties are useful in many applications. In spite of its apparent similarity to Extended Backus-Naur Form (EBNF), PEG often defines quite a different language, sometimes difficult to describe exactly. However, a recent result by Medeiros shows that an EBNF grammar having the LL(1) property can be transcribed verbatim into a PEG that not only defines the same language, but also parses the input in exactly the same way. We show that such transcription is possible for a wider class of EBNF grammars, which is interesting because the backtracking of PEG is often a convenient way to circumvent just the LL(1) restriction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>alternatives of Literal by looking at any predefined number of characters ahead.
But, treated as a PEG parser, this grammar recognizes exactly the language
defined by its EBNF interpretation.</p>
      <p>
        To see how it happens, suppose the PEG parser is presented with the input
”10B”. It performs a series of calls Literal → DecimalLiteral → [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">0-9</xref>
        ]∗,
consuming ”10”, after which it fails to recognize ".", backtracks, and proceeds to try
BinaryLiteral. Parsing procedure for the nonterminal DecimalLiteral has here
the same overall effect as if it was a terminal: it explores the input ahead, and
either consumes the recognized portion, or backtracks and in effect does nothing.
In fact, the grammar above would be LL(1) if DecimalLiteral and BinaryLiteral
were terminals, corresponding to tokens produced by a lexer.
      </p>
      <p>In the following, we try to answer this question: given an EBNF grammar,
when can I transcribe it directly into PEG and obtain a correct parser?
2</p>
    </sec>
    <sec id="sec-2">
      <title>Some Notation</title>
      <p>We consider a finite alphabet of letters. A finite string of letters is a word.
The string of 0 letters is called the empty word and is denoted by ". The set of
all words is ∗. A subset of ∗ is a language.</p>
      <p>As usual, we write XY to mean the concatenation of languages X and Y ,
that is, the set of all words xy where x ∈ X and y ∈ Y .</p>
      <p>For x; y ∈ ∗, x is a pre x of y if y = xu for some u ∈ ∗. We write x ≤ y
to mean that x is a prefix of y. For X ⊆ ∗, the set of all prefixes of x ∈ X is
denoted by Pref(X).</p>
      <p>A relation R on E is a subset of E × E. As usual, we write R(e) to mean the
set of all e′ such that (e; e′) ∈ R, and R(E) to mean the union of R(e) for all
e ∈ E ⊆ E. The transitive and reflexive closure of R is denoted by R∗, and the
product of relations R and S by R × S.
3</p>
    </sec>
    <sec id="sec-3">
      <title>The Grammar</title>
      <p>We consider a grammar G over the alphabet that will be interpreted as either
PEG or EBNF. The grammar is a set of rules of the form A 7→ e, where e is
a syntax expression and A is the name given to it. The expression e has one of
these forms:
{ e1 e2 (Sequence),
{ e1|e2 (Choice),
where each of e1; e2 is an expression name, a letter from , or ". The set of all
names appearing in the rules is denoted by N . For A ∈ N , e(A) is the expression
e in A 7→ e. We define E = N ∪ ∪ ".</p>
      <p>When G is interpreted as PEG, the rule A 7→ e is a definition A ← e of
parsing expression e, and e1|e2 is the ordered choice. When G is interpreted as
EBNF, the rule is a production A → e, and e1|e2 is the unordered choice.
For obvious reasons, we assume G to be free from left recursion.</p>
      <p>This grammar is reduced to bare bones in order to simplify the reasoning.
Any full EBNF grammar or PEG without predicates can be reduced to such
form by steps like these:
{ Replacing [a1-an] by a1|a2| : : : |an;
{ Rewriting e1e2e3 and e1|e2|e3 as e1(e2e3) respectively e1|(e2|e3) and
introducing new names for expressions in parentheses;
{ Replacing e+ by ee∗;
{ Replacing e∗ by A 7→ eA | ".
3.1</p>
      <p>The PEG Interpretation
When G is interpreted as PEG, the elements of E are parsing procedures that
can call each other recursively. In general, parsing procedure is applied to a
string from ∗ at a position indicated by some ”cursor”. It tries to recognize
a portion of the string ahead of the cursor. If it succeeds, it ”consumes” the
recognized portion by advancing the cursor and returns ”success”; otherwise, it
returns ”failure” and does not consume anything (does not advance the cursor).
The action of different procedures is as follows:
{ ": Indicate success without consuming any input.
{ a ∈ : If the text ahead starts with a, consume it and return success.</p>
      <p>Otherwise return failure.
{ A 7→ e1 e2: Call e1. If it succeeded, call e2 and return success if e2 succeeded.</p>
      <p>If e1 or e2 failed, reset cursor as it was before the invocation of e1 and return
failure.
{ A 7→ e1|e2: Call e1. Return success if it succeeded. Otherwise call expression
e2 and return success if e2 succeeded or failure if it failed.</p>
      <p>Backtracking occurs in the Sequence expression. If e1 succeeds and consumes
some input, and then e2 fails, the cursor is moved back to where it was before
trying e1. If this Sequence was called as the first alternative in a Choice, Choice
has an opportunity to try another alternative on the same input. However, the
backtracking is limited: once e1 in the Choice e1|e2 succeeded, e2 will never be
tried on the same input, even if the parse fails later on.</p>
      <p>
        Following [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we define formally the action of a parsing procedure with the
help of relation PEG⊆ E × ∗ × { ∗ ∪ fail}. For e ∈ E and x; y ∈ ∗:
{ [e] xy PEG y means: ”e applied to input xy consumes x and returns success”,
{ [e] x PEG fail means: ”e applied to input x returns failure”.
      </p>
      <p>
        The relation itself is defined using the method of ”natural semantics”: [e] x PEG X
holds if and only if it can be proved using the inference rules in Figure 1. The
rules come from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but were modified by integrating the step e(A) ⇒ A into
the last four rules.
      </p>
      <p>The proof tree of [e] x PEG X, when followed from bottom up, mimics
procedure calls in the process of parsing the string x. Rule seq.1p with Z = fail
corresponds to backtracking in Sequence, and rule choice.2p to calling e2 after
e1 failed.</p>
      <p>
        According to Lemma 2.3.1 in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a proof of [e] x PEG X exists if and only
if there exists the corresponding derivation ⇒G in the formal definition of PEG
given by Ford in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. According to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], such derivation exists if the grammar
is ”well-formed”. Grammar G is well-formed because it contains neither left
recursion nor iteration. This gives the following fact:
Lemma 1. For every e ∈ E and w ∈ ∗, there exists a proof of either
[e] w PEG fail or [e] w PEG y where w = xy.
      </p>
      <p>(empty.1p)
(letter.2p)
(seq.1p)
(seq.2p)
(choice.1p)
(choice.2p)
["] x PEG x</p>
      <p>b ̸= a
[b] ax PEG fail
(letter.1p)
(letter.3p)
[a] ax PEG x
[a] " PEG fail
[e1] xyz PEG yz [e2] yz PEG Z e(A) = e1e2</p>
      <p>[A] xyz PEG Z
[e1] x PEG fail e(A) = e1e2</p>
      <p>[A] x PEG fail
[e1] xy PEG y e(A) = e1|e2</p>
      <p>[A] xy PEG y
[e1] x PEG fail [e2] xy PEG Y e(A) = e1|e2</p>
      <p>[A] xy PEG Y
where Y denotes y or fail and Z denotes z or fail.
The grammar G with ”7→” interpreted as ”→” corresponds quite exactly to the
traditional Context-Free Grammar (CFG), so we shall speak of its interpretation
as CFG. Traditionally, CFG is a mechanism for generating words:
{ " generates empty word.
{ a ∈ generates itself.
{ A 7→ e1e2 generates any word generated by e1 followed by any word
generated by e2.</p>
      <p>{ A 7→ e1|e2 generates any word generated by e1 or e2.</p>
      <p>The language L(e) of e ∈ E is the set of all words that can be generated by e.
(empty.1c)
(seq.1c)
(choice.1c)
(choice.2c)
["] x CFG x
[e1] xyz CFG yz [e2] yz CFG z e(A) = e1e2
[A] xyz CFG z
(letter.1c)</p>
      <p>[a] ax CFG x
[e1] xy CFG y e(A) = e1|e2</p>
      <p>[A] xy CFG y
[e2] xy CFG y e(A) = e1|e2</p>
      <p>[A] xy CFG y</p>
      <p>
        Following [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we define formally the meaning of a CFG using relation CFG
similar to PEG. The relation is defined in the way similar to PEG: [e] x CFG y holds
if and only if it can be proved using the inference rules in Figure 2. Again, these
rules come from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and were modified by integrating the step e(A) ⇒ A into
the last three.
      </p>
      <p>The proof tree of [e] x CFG y, when followed from bottom up, mimics
procedure calls in an imagined recursive-descent parser processing the string x. The
procedures correspond to elements of E, and the procedure for Choice always
chooses the correct alternative, either by full backtracking or by an oracle.</p>
      <p>It can be verified that L(e) = {x ∈ ∗ : [e] xy CFG y for some y ∈ ∗}. It
can also be verified that if x ∈ L(e), [e] xy CFG y holds for every y ∈ ∗.
4</p>
    </sec>
    <sec id="sec-4">
      <title>When Are the Two Interpretations Equivalent?</title>
      <p>
        The following has been proved as Lemma 4.3.1 in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
Proposition 1. For any e ∈ E and x; y ∈
∗, [e] xy PEG y implies [e] xy CFG y.
      </p>
      <p>
        Proof. We spell out the proof sketched in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is by induction on the height of
the proof tree.
(Induction base) Suppose the proof of [e] xy PEG y consists of one step. Then it
has to be the proof of ["] x PEG x or [a] ax PEG x using empty.1p or letter.1p,
respectively. But then, ["] x CFG x respectively [a] ax CFG x by empty.1c or
letter.1c.
(Induction step) Assume that for every proof tree for [e] xy PEG y of height n ≥ 1
exists a proof of [e] xy CFG y. Consider a proof tree for [e] xy PEG y of height
n + 1. Its last step must be one of these:
{ [A] xyz PEG x derived from [e1] xyz PEG yz, [e2] yz PEG z, and e(A) = e1e2
using seq.1p. By induction hypothesis, [e1] xyz CFG yz and [e2] yz CFG z, so
[A] xy CFG x follows from seq.1c.
{ [A] xy PEG x derived from [e1] xy PEG y and e(A) = e1e2 using choice.1p.
      </p>
      <p>By induction hypothesis, [e1] xy CFG y, so [A] xy CFG x follows from choice.1c.
{ [A] xy PEG x derived from [e1] xy PEG fail, [e2] xy PEG y, and e(A) = e1e2
using choice.2p. By induction hypothesis, [e2] xy CFG y, so [A] xy CFG x
follows from choice.2c.
⊓⊔</p>
      <p>Known examples show that the reverse of Proposition 1 is, in general, not
true. We intend to formulate some conditions under which the reverse does hold.
The problem is complicated by another known fact, namely that [e] xy PEG y
does not imply [e] xy PEG y′ for every y′ ∈ ∗: the action of e depends on the
string ahead. We have to consider the action of e in relation to the entire input
string.</p>
      <p>Let $, the end-of-text marker, be a letter that does not appear in any rules.
Define some S ∈ N as the starting rule of the grammar. As L(e) does not depend
on the input ahead, we have [S] w$ CFG $ if and only if w ∈ L(S). We note that
every partial result in the proof tree of [S] w$ CFG $ has the form [e] xy$ CFG y$
for some e ∈ E and x; y ∈ ∗.</p>
      <p>For A ∈ N , define Tail(A) to be any string that appears ahead of the ”CFG
parser” just after it completed the call to A while processing some w ∈ L(S).
Formally, y$ ∈ Tail(A) if there exists a proof tree for [S] w$ CFG $ that contains
[A] xy$ CFG y$ as a partial result.</p>
      <p>L(e1) ∩ Pref(L(e2) Tail(A)) = ∅;
(1)
there exists a proof of [S] w$ PEG $ for each w ∈ L(S). Moreover, for every
partial result [e] xy$ CFG y$ in the proof tree of [S] w$ CFG $ there exists a proof
of [e] xy$ PEG y$.</p>
      <p>Proof. Take any w ∈ L(S). It means there exists proof tree for [S] w$ CFG $. We
are going to show that for each partial result [e] xy$ CFG y$ in that tree there
exists a proof of [e] xy$ PEG $. We show it using induction on the height of the
proof tree.
(Induction base) Suppose the proof of the partial result [e] xy$ CFG y$ consists
of one step. Then it has to be the proof of ["] x$ CFG x$ or [a] ax$ CFG x$
using empty.1c or letter.1c, respectively. But then, ["] x$ PEG x$ respectively
[a] ax$ PEG x$ by empty.1p or letter.1p.
(Induction step) Assume that for every partial result [e] xy$ CFG y$ that has
ppraortoifaltrreeesuolft h[ee]igxhyt$ nCF≥G y1$ twheitrhe pexroisotfs tareperoofofheoifgh[et] nx+y$1P.EIGts yl$a.stCsotnepsidmeursat
be one of these:
{ [A] xyz$ CFG z$ derived from [e1] xyz$ CFG yz$, [e2] yz$ CFG z$, and
e(A) = e1e2 using seq.1c. By induction hypothesis, [e1] xyz$ PEG yz$ and
[e2] yz$ PEG z$, so [A] xyz$ PEG z$ follows from seq.1p.
{ [A] xy$ CFG y$ derived from [e1] xy$ CFG y$ and e(A) = e1|e2 using
choice.1c. By induction hypothesis, [e1] xy$ PEG y$, so [A] xy$ PEG y$
follows from choice.1p.
{ [A] xy$ CFG y$, derived from [e2] xy$ CFG y$ and e(A) = e1|e2 using
choice.2c. By induction hypothesis, [e2] xy$ PEG y$. But, to use choice.2p
we also need to verify that [e1] xy$ PEG fail.</p>
      <p>Assume that (1) holds for A and suppose that there is no proof of
[e1] xy$ PEG fail. According to Lemma 1, there exists a proof of
[e1] uv$ PEG v$ where uv$ = xy$. According to Proposition 1, there
exists a proof of [e1] uv$ CFG v$, so u ∈ L(e1). From [e2] xy$ CFG y$ follows
x ∈ L(e2). From [A] xy$ CFG y$ follows y$ ∈ Tail(A). From uv$ = xy$
follows u ≤ xy, so u ∈ Pref(L(e2) Tail(A)), which contradicts (1). We must
thus conclude that there exists a proof of [e1] xy$ PEG fail, so there exists a
proof of [A] xy$ PEG y$ using choice.2p.</p>
      <p>The proof of [S] w$ PEG $ exists as a special case of the above.</p>
      <sec id="sec-4-1">
        <title>The consequence of Propositions 1 and 2 is:</title>
        <p>⊓⊔
Proposition 3. The two interpretations of G are equivalent if every Choice
A 7→ e1|e2 ∈ G satis es (1). They are equivalent not only by accepting the same
language, but also by parsing the input in the same way.</p>
        <p>
          Note that the condition is sufficient, but not necessary. Unfortunately, (1) is
not easy to check. The languages appearing in it are context-free languages. The
problems of inclusion and emptiness of intersection are in general undecidable
for these languages, so one has to consider special cases, or use a conservative
approximation. In fact, condition (1) is identical to the ”general semi disjointness”
considered by Schmitz in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]; he checks it using an own grammar approximation
method.
        </p>
        <p>
          As it will be seen, the LL(1) property implies (1), which accounts for the
result obtained in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We are going to look for a weaker property that still
implies (1). One can think of the LL(1) property as approximating words by
their one-letter prefixes. We generalize this to approximating words by prefixes
of arbitrary lengths.
5
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Pre x Covers</title>
      <sec id="sec-5-1">
        <title>We say that Y ⊆ ∗ is a pre x cover of X ⊆</title>
        <p>nonempty word in X has a nonempty prefix in Y .
Lemma 2. For any X; Y; Z ⊆
∗:
(a) If X ⊑ Y and Y ⊑ Z then X ⊑ Z.
(b) If PX ⊑ X and PY ⊑ Y then PX ∪ PY ⊑ X ∪ Y .
(c) If " ∈= X and PX ⊑ X then PX ⊑ XY .
(d) If " ∈ X, PX ⊑ X, and PY ⊑ Y then PX ∪ PY ⊑ XY .
∗, and write Y ⊑ X if each</p>
      </sec>
      <sec id="sec-5-2">
        <title>Proof. (a) Follows from the transitivity of ≤.</title>
        <p>(b) If x ̸= " is in X, it has a nonempty prefix in PX . If it is in Y , it has a
nonempty prefix in PY .
(c) If " ∈= X, each x ∈ XY has a nonempty prefix in X This, in turn, has a
nonempty prefix in PX .
(d) If " ∈ X, we have XY = (X − ")Y ∪ Y . From (c) we have PX ⊑ (X − ")Y .
The stated result follows from (b). ⊓⊔
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Pre x Disjointness</title>
      <p>We say that X ⊆ ∗ and Y ⊆ ∗ are pre x-disjoint, and write X ≍ Y to mean
that for all nonempty x ∈ X and y ∈ Y neither x ≤ y nor y ≤ x.</p>
      <p>One can easily see that prefix-disjoint languages are disjoint, but the reverse
is in general not true.</p>
      <p>Lemma 3. Let X ⊆ + and Y ⊆
implies X ∩ Pref(Y ) = ∅.</p>
      <p>∗. For any PX ⊑ X and PY ⊑ Y , PX ≍ PY
Proof. Let X; Y; PX ; PY be as stated. Suppose that X ∩ Pref(Y ) ̸= ∅, so there
exists x such that x ∈ X, x ∈ Pref(Y ). Since x ̸= ", we have x = uv for some
nonempty u ∈ PX and v ∈ ∗. As x ∈ Pref(Y ), we have xt ∈ Y for some
t ∈ ∗. As xt ̸= ", we have xt = pq for some nonempty p ∈ PY ; q ∈ ∗. This
gives uvt = pq; this means either u ≤ p or p ≤ u, which contradicts PX ≍ PY .</p>
      <sec id="sec-6-1">
        <title>We can now apply this result to approximate (1).</title>
        <p>⊓⊔
Proposition 4. The two interpretations of G are equivalent if for every Choice
A 7→ e1|e2 ∈ G:
{ " ∈= L(e1)
{ There exist P1 ⊑ L(e1) and P2 ⊑ L(e2) Tail(A) such that P1 ≍ P2.
Proof. Consider any Choice A 7→ e1|e2 ∈ G. Assume " ∈= L(e1) and P1, P2 as
stated. By Lemma 3, we have L(e1) ∩ Pref(L(e2) Tail(A)) = ∅, and the stated
result follows from Proposition 3.
⊓⊔
Proposition 5. The two interpretations of G are equivalent if for every Choice
A 7→ e1|e2 ∈ G:
{ " ∈= L(e1).
{ If " ∈= L(e2), there exist P1 ⊑ L(e1) and P2 ⊑ L(e2) such that P1 ≍ P2.
{ If " ∈ L(e2), there exist P1 ⊑ L(e1), P2 ⊑ L(e2), and PT ⊑ Tail(A) such
that P1 ≍ (P2 ∪ PT ).</p>
        <p>Proof. Consider any Choice A 7→ e1 | e2 ∈ G. Assume " ∈= L(e1), P1 ⊑ L(e1)
and P2 ⊑ L(e2). Suppose " ∈= L(e2). By Lemma 2(c), P2 ⊑ L(e2) Tail(A), so
P1 ≍ P2 satisfies Proposition 4. Suppose now that " ∈ L(e2) and PT ⊑ Tail(A).
By Lemma 2(d), P2∪PT ⊑ L(e2) Tail(A), so P1 ≍ P2∪PT satisfies Proposition 4.
⊓⊔
7</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Extending the Classical FIRST</title>
      <p>One can easily see that the classical First(e) and Follow(A) are prefix covers
of e and Tail(A) that consist of one-letter words. For such sets, X ≍ Y is the
same as X ∩Y = ∅, so First(e1) ∩ First(e2) = ∅ and First(e1) ∩ (First(e2)∪
Follow(A)) = ∅ are special cases of the last two conditions in Proposition 5.
They are exactly the LL(1) conditions.</p>
      <p>We are now going to look at other sets that may be used instead of First(e)
and Follow(A). For A ∈ N define:
{ First(A) = {e1; e2} if e(A) = e1| e2,
{ First(A) = {e1} if e(A) = e1 e2 and " ∈= L(e1),
{ First(A) = {e1; e2} if e(A) = e1 e2 and " ∈ L(e1).</p>
      <p>For E ⊆ E, let F IRST (E) be the family of subsets of E defined as follows:
{ E ∈ F IRST (E).
{ If F belongs to F IRST (E), the result of replacing its member A ∈ N by
all elements of First(A) also belongs to F IRST (E).
{ nothing else belongs to F IRST (E) unless its being so follows from the
above.</p>
      <p>These definitions are illustrated in Figure 3. An arrow from e1 to e2 means
e2 ∈ First(e1). One can easily see that F IRST (E) contains the classical
First(E).</p>
      <p>A 7→ C | D
B 7→ E A
C 7→ a | b
D 7→ b | X
E 7→ d | Y
X 7→ c D
Y 7→ c E
FIRST (A) = {{A}; {C; D}; {a; b; D}; {C; b; X}; {a; b; X}; {a; b; c}}
FIRST (B) = {{B}; {E}; {d; Y }; {d; c}}</p>
      <p>Lemma 4. For each A ∈ N holds L(First(A)) ⊑ L(A).</p>
      <p>Proof. In case e(A) = e1| e2 we have L(A) = L(e1) ∪ L(e2) = L(e1| e2).
Each language is its own prefix cover, so L{e1; e2} ⊑ L(e1| e2).</p>
      <p>In case e(A) = e1 e2 we have L(A) = L(e1)L(e2). Again, L(e1) ⊑ L(e1) and
L(e2) ⊑ L(e2).</p>
      <p>If " ∈= e1, we have, by Lemma 2(c), L{e1} ⊑ L(e1)L(e2).</p>
      <p>If " ∈ e1, we have, by Lemma 2(d), L{e1; e2} = L(e1) ∪ L(e2) ⊑ L(e1)L(e2). ⊓⊔
Lemma 5. For each F ∈ F IRST (E) holds L(F ) ⊑ L(E).</p>
      <p>Proof. We prove the Proposition by induction on the number of replacements.
(Induction base) As each language is its own prefix cover, L(E) ⊑ L(E).
(Induction step) Assume L{e1; e2; : : : ; en} ⊑ L(E). As sets are not ordered, we
can always consider replacing e1. Let First(e1) = {f1; f2}. (The proof will be
similar if First(e1) has only one element.) According to Lemma 4, L{f1; f2} ⊑
L(e1). We also have L{e2; : : : ; en} ⊑ L{e2; : : : ; en}. Using Lemma 2(b), we
obtain:</p>
      <p>L{f1; f2; e2; : : : ; en} = L{f1; f2} ∪ L{e2; : : : ; en}</p>
      <p>⊑ L(e1) ∪ L{e2; : : : ; en} = L{e1; e2; : : : ; en}:
From Lemma 2(a) follows L{f1; f2; e2; : : : ; en} ⊑ L(E).</p>
      <p>We can apply this result directly to "-free grammars:
Proposition 6. If none of the rules contains ", the two interpretations of G are
equivalent if for every Choice A 7→ e1|e2 ∈ G there exist First1 ∈ F IRST (e1)
and First2 ∈ F IRST (e2) such that L(First1) ≍ L(First2).
⊓⊔
Proof. Suppose the required First1 and First2 exist. If G is "-free, we have
" ∈= L(e1) and " ∈= L(e2). From Lemma 5 we have L(First1) ⊑ L(e1) and
L(First2) ⊑ L(e2). The stated result follows from Proposition 5. ⊓⊔
8</p>
    </sec>
    <sec id="sec-8">
      <title>Extending the Classical FOLLOW</title>
      <p>
        To handle the case when G is not "-free, we need to find suitable prefix covers of
Tail(A). For this purpose, we borrow from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] the definition of Follow ⊆ E × E.
For e ∈ E, define Last(e) to be the set of all A ∈ N such that:
{ e(A) = e|e1 for some e1 ∈ E, or
{ e(A) = e1|e for some e1 ∈ E, or
{ e(A) = e1 e for some e1 ∈ E, or
{ e(A) = e e1 for some e1 ∈ E where " ∈ L(e1).
      </p>
      <p>For e ∈ E, define Next(e) = {e1 ∈ E : exists A 7→ ee1 ∈ G}.</p>
      <p>Finally, define Follow = Last∗ × Next.</p>
      <p>Lemma 6. For each A ∈ N , L(Follow(A)) ⊑ Tail(A).</p>
      <p>Proof. Consider some A ∈ N and y$ ∈ Tail(A). By definition, there is a proof
of [S] w$ CFG $ that contains [A] xy$ CFG y$ as one of the partial results. This
partial result must be used in a subsequent derivation. This derivation can only
result in one of the following:
(a) [A1] xy$ CFG y$ where e(A1) = A | e from [e] xy$ CFG y$ using choice.1c.
(b) [A1] xy$ CFG y$ where e(A1) = e |A from [e] xy$ CFG y$ using choice.2c.
(c) [A1] zxy$ CFG y$ where e(A1) = e A from [e] zxy$ CFG xy$ using seq.1c.
(d) [A1] xy$ CFG y$ where e(A1) = A e from [e] "y$ CFG y$ using seq.1c.
(e) [A1] xy′y′′$ CFG y′′$ where e(A1) = A B, y′y′′ = y and y′ ̸= " from
[B] y′y′′$ CFG y′′$ using seq.1c.</p>
      <p>In each of the cases (a)-(d), the result is similar to the original one, and the
alternative derivations (a)-(e) apply again. We may have a chain of derivations
(a)-(d), but it must end with (e) as y must eventually be reduced. We have thus
in general a sequence of steps of this form:
[A0] xy$ CFG y$ : : : as in (a)-(d)
[A1] x1y$ CFG y$ : : : as in (a)-(d)
[A2] x2y$ CFG y$ : : : as in (a)-(d)</p>
      <p>: : :
[An−1] xn−1y$ CFG y$ : : : as in (a)-(d)
[An] xny$ CFG y$ [B] y′y′′$ CFG y′′$ e(An+1) = AnB</p>
      <p>[An+1] xny′y′′$ CFG y′′$
where A0 = A and n ≥ 0. We have A1 ∈ Last(A0), A2 ∈ Last(A1), etc.,
An ∈ Last(An−1), and B ∈ Next(An), which means B ∈ Follow(A). From
[B] y′y′′$ CFG y′′$ we have y′ ∈ L(B); since y = y′y′′ and y′ ̸= ", y$ has a
nonempty prefix in L(B) ⊆ L(Follow(A)). ⊓⊔
Proposition 7. The two interpretations of G are equivalent if for every Choice
A 7→ e1|e2 ∈ G:
{ " ∈= L(e1).
{ If " ∈= L(e2), there exist First1 ∈ F IRST (e1) and First2 ∈ F IRST (e2)
such that L(First1) ≍ L(First2).
{ If " ∈ L(e2), there exist First1 ∈ F IRST (e1), First2 ∈ F IRST (e2), and
Follow ∈ F IRST (Follow(A))
such that L(First1) ≍ L(First2 ∪ Follow).</p>
      <p>Proof. Suppose the required First1, First2, and Follow exist. From Lemma 5
we have L(First1) ⊑ L(e1) and L(First2) ⊑ L(e2). From Lemmas 5 and 6 we
have L(Follow) ⊑ Tail(A). The stated result follows from Proposition 5. ⊓⊔
9</p>
    </sec>
    <sec id="sec-9">
      <title>Final Remarks</title>
      <p>We have shown that a direct transcription of an EBNF grammar to equivalent
PEG is possible for grammars outside the LL(1) class. These are the grammars
where the choice of the way to proceed is made by examining the input within
the reach of one parsing procedure instead of examining a single letter. One can
call them ”LL(1P)” grammars, the ”1P” meaning ”one procedure”. However,
checking the conditions stated by Proposition 7 is not as simple as verifying
the LL(1) property. The families F IRST can be constructed in a mechanical
way, and are presumably not very large. But, checking the disjointness of their
members may be difficult. It becomes more difficult as one moves up the graph
in Figure 3; at the top of that graph we come close to the condition (1).</p>
      <p>
        If the sets First1, First2, and Follow satisfying Proposition 7 exist, the
information can be used for the improvement suggested by Mizushima et al.
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The set First1 lists the alternative parsing procedures that will be
eventually invoked to start the processing of e1. If a procedure P ∈ First1 succeeds,
but the further processing of e1 fails, the conditions of Proposition 7 mean that
e2 is doomed to fail, unless it succeeds on empty string. But in this case, the
subsequent processing of Tail(A) will fail. One can thus insert a ”cut” operator
after P to save memoization and an unnecessary attempt at e2.
      </p>
      <p>The class of EBNF grammars that can be directly transcribed to PEG
includes cases where the choice of the way to proceed is made by looking at the
input within the reach of more than one parsing procedure. The following is an
example of such grammar:</p>
      <p>InputLine → Assignment | Expression
Assignment → Name "=" Expression
Expression → Primary ([+-] Primary)∗
Primary → Name | Number
Name → [a-z]+</p>
      <p>
        Number → [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7">0-9</xref>
        ]+
Here both alternatives of InputLine may begin with Name. It is the next
procedure after Name, namely one for "=", that decides whether to proceed or
backtrack. This class of grammars, which may be called ”LL(2P)”, requires further
analysis. Propositions 3, 4, and 5 still apply there.
      </p>
      <p>Last but not least: it appears that natural semantics, as used by Medeiros
to define the meaning of a grammar, is a very convenient tool to investigate the
grammar’s properties.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ford</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Packrat parsing: a practical linear-time algorithm with backtracking</article-title>
          .
          <source>Master's thesis</source>
          , Massachusetts Institute of Technology (
          <year>2002</year>
          ) http://pdos.csail.mit.edu/papers/packrat-parsing
          <string-name>
            <surname>:</surname>
          </string-name>
          ford-ms.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ford</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Parsing expression grammars: A recognition-based syntactic foundation</article-title>
          . In Jones, N.D.,
          <string-name>
            <surname>Leroy</surname>
          </string-name>
          , X., eds.
          <source>: Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL</source>
          <year>2004</year>
          , Venice, Italy,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2004</year>
          )
          <fpage>111</fpage>
          -
          <lpage>122</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Redziejowski</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          :
          <source>Some aspects of Parsing Expression Grammar. Fundamenta Informaticae</source>
          <volume>85</volume>
          (
          <issue>1-4</issue>
          ) (
          <year>2008</year>
          )
          <fpage>441</fpage>
          -
          <lpage>454</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Medeiros</surname>
          </string-name>
          , S.: Correspondˆencia entre PEGs e Classes de Gram´aticas Livres de Contexto.
          <source>PhD thesis</source>
          , Pontif´ıcia Universidade Cat´olica do Rio de Janeiro (
          <year>2010</year>
          ) http://www2.dbd.puc-rio.br/pergamum/tesesabertas/0611957 10 pretextual.pdf http://www2.dbd.puc-rio.br/pergamum/tesesabertas/0611957 10 cap 01.pdf http://www2.dbd.puc-rio.br/pergamum/tesesabertas/0611957 10 cap 02.pdf etc. http://www2.dbd.puc-rio.br/pergamum/tesesabertas/0611957 10 cap 05.pdf http://www2.dbd.puc-rio.br/pergamum/tesesabertas/0611957 10 postextual.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mizushima</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamaguchi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Packrat parsers can handle practical grammars in mostly constant space</article-title>
          . In Lerner, S.,
          <string-name>
            <surname>Rountev</surname>
          </string-name>
          , A., eds.
          <source>: Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering</source>
          , PASTE'
          <fpage>10</fpage>
          , Toronto, Ontario, Canada, June 5-6,
          <year>2010</year>
          , ACM (
          <year>2010</year>
          )
          <fpage>29</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Modular syntax demands verification</article-title>
          .
          <source>Technical Report ISRN I3S/RR 2006-32-FR</source>
          ,
          <string-name>
            <surname>Laboratoire</surname>
            <given-names>I3S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sophia Antipolis</surname>
          </string-name>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tremblay</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorenson</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          :
          <article-title>The Theory and Practice of Compiler Writing</article-title>
          .
          <string-name>
            <surname>McGraw-Hill</surname>
          </string-name>
          (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>