=Paper=
{{Paper
|id=Vol-1698/CS&P2016_01_Redziejowski_Trying-to-understand-PEG
|storemode=property
|title=Trying to Understand PEG
|pdfUrl=https://ceur-ws.org/Vol-1698/CS&P2016_01_Redziejowski_Trying-to-understand-PEG.pdf
|volume=Vol-1698
|authors=Roman Redziejowski
|dblpUrl=https://dblp.org/rec/conf/csp/Redziejowski16
}}
==Trying to Understand PEG==
Roman R. Redziejowski (roman@redz.se)

'''Abstract.''' Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. Its properties are useful in many applications. In its appearance, PEG is almost identical to a grammar in the Extended Backus-Naur Form (EBNF), but may define a different language. Recent research formulated some conditions under which a PEG is equivalent to its interpretation as EBNF. However, PEG has a useful feature, the syntactic predicate, that is alien to EBNF. The equivalence results thus apply only to PEG without predicates. This paper considers PEG with predicates. Not being able to investigate equivalence, the paper turns to the limited backtracking that is the main source of difficulty in understanding PEG. It is shown that the limitation of backtracking has no effect under conditions similar to those for PEG without predicates. There is, in general, no mechanical way to check these conditions, but they can often be checked by inspection. The paper outlines an experimental tool to facilitate such inspection.

===1 Introduction===

Parsing Expression Grammars (PEGs) were introduced by Ford in [3] as a new formalism for describing the syntax of programming languages. The formalism encodes a recursive-descent parser with limited backtracking. Backtracking removes the LL(1) restriction usually imposed on top-down parsers. Because the backtracking is limited, the parser can work in linear time, which is achieved with the help of the "memoization" or "packrat" technique described in [1, 2]. In addition to circumventing the LL(1) restriction, PEG can be used to define parsers that do not require a separate "scanner" or "tokenizer". All this makes it useful, but PEG is not well understood as a language definition tool. The literature contains many examples of surprising behavior. In its appearance, PEG is almost identical to a grammar in the Extended Backus-Naur Form (EBNF).
A few minor typographical changes convert EBNF to PEG. As EBNF is familiar to most, one expects the identical-looking PEG to define the same language. This is often the case, but the confusion comes when it is not.

In [6], the author tried to construct exact formulas for the language defined by a given PEG. This was a failure, as the formulas became extremely complex with increasing complexity of the grammar. In pioneering work [5], Medeiros used "natural semantics" to describe both PEG and EBNF. Using this approach, he demonstrated that any EBNF grammar satisfying the LL(1) condition defines exactly the same language as its PEG counterpart. In [7], the author extended this result to a much wider class of grammars. However, there is no general way to mechanically check the conditions specified there. Some manual methods were suggested, based mainly on inspection.

PEG has one feature that does not have a counterpart in EBNF: the syntactic predicate. All results about the equivalence of PEG and EBNF must thus be restricted to PEG without predicates, and this is the case for all results from [5, 7]. But predicates are useful in defining some features of a language, like the distinction between keywords and identifiers.

This paper is an attempt to better understand PEG even in the presence of predicates. Because one can no longer investigate the equivalence of PEG and EBNF, we concentrate on the main source of confusion: the limited backtracking. We find that limited backtracking has no effect on choice expressions that can be identified as "disjoint". The conditions for disjointness are similar to those from [7], with LL(1) as the strongest one. Again, there is no general mechanical way to check disjointness. We outline an experimental tool, called the PEG Analyzer, that combines the LL(1) test with heuristics to investigate the disjointness of choice expressions.
We start by recalling, in Section 2, the definition of Parsing Expression Grammar and its formal semantics. In Section 3, we discuss limited backtracking and disjoint expressions. Section 4 introduces the PEG Analyzer with the help of three examples. The results obtained in Section 3 require a modification of the relation Follow known from the classical literature. This modification involves a rather tedious treatment not relevant to the main subject, so it is presented separately in Section 5. Finally, Section 6 contains a few comments.

===2 The grammar===

We start with a simplified Parsing Expression Grammar G over an alphabet Σ. The grammar is a set of rules of the form A = e, where A belongs to a set N of symbols distinct from the letters of Σ and e is an expression. Each expression is one of these:
* ε ("empty"),
* a ∈ Σ ("terminal"),
* A ∈ N ("nonterminal"),
* !e ("predicate"),
* e1 e2 ("sequence"),
* e1 | e2 ("choice"),
where each of e1, e2, e is an expression. The set of all expressions is in the following denoted by E. There is exactly one rule A = e for each A ∈ N; the expression e appearing in this rule is denoted by e(A). The predicate operator binds stronger than sequence, and sequence stronger than choice. (Footnote 1, to [5]: that work is in Portuguese; an extended English version is available in [4].)

The expressions represent parsing procedures, and rules represent named parsing procedures. In general, a parsing procedure is applied to an input string from Σ* and tries to recognize an initial portion of that string. If it succeeds, it returns "success" and usually consumes the recognized portion. Otherwise, it returns "failure" and does not consume anything. The actions of the different procedures are specified in Figure 1.

* ε: Indicate success without consuming any input.
* a: If the text ahead starts with a, consume a and return success. Otherwise return failure.
* A: Call e(A) and return its result.
* !e: Call e. Return failure if it succeeded. Otherwise return success without consuming any input.
* e1 e2: Call e1. If it succeeded, call e2 and return success if e2 succeeded. If e1 or e2 failed, backtrack: reset the input as it was before the invocation of e1 and return failure.
* e1 | e2: Call e1. Return success if it succeeded. Otherwise call e2 and return success if e2 succeeded or failure if it failed.

Fig. 1. Actions of expressions as parsing procedures

We note the limited backtracking: once e1 in e1 | e2 has succeeded, e2 will never be tried. The backtracking done by a sequence expression may only roll back e1 | e2 as a whole.

The actions of parsing procedures can be formally defined using the "natural semantics" introduced in [4, 5]. For e ∈ E, we write [e] xy PEG y to mean that e applied to the string xy consumes x, and [e] x PEG fail to mean that e fails when applied to x. One can see that [e] xy PEG y, respectively [e] x PEG fail, holds if and only if it can be formally proved using the inference rules shown in Figure 2.

The PEG parser may end up in infinite recursion, the well-known nemesis of top-down parsers. Formally, this means that there is no proof according to the rules of Figure 2. It has been demonstrated that if the grammar G is free from left recursion, then for every e ∈ E and x ∈ Σ* there exists a proof of [e] x PEG fail or [e] x PEG y for some y ∈ Σ*. This was shown in [4, 5] by checking that the PEG defined by natural semantics is equivalent to that defined by Ford in [3], and using the result from there. An independent proof for grammars without predicates is given in [7]; it is easily extended to grammars with predicates. We assume from now on that G is free from left recursion.

For e ∈ E, we denote by L(e) the set of words x ∈ Σ* such that [e] xy PEG y for some y ∈ Σ*. This is the language accepted by e. Note that, in general, x ∈ L(e) does not mean [e] xy PEG y for each y.

The inference rules of Figure 2, where Y denotes y or fail and Z denotes z or fail:
* (empty): [ε] x PEG x.
* (rule): if [e(A)] xy PEG Y, then [A] xy PEG Y.
* (not1): if [e] xy PEG y, then [!e] xy PEG fail.
* (not2): if [e] x PEG fail, then [!e] x PEG x.
* (letter1): [a] ax PEG x.
* (letter2): if b ≠ a, then [b] ax PEG fail.
* (letter3): [a] ε PEG fail.
* (seq1): if [e1] xyz PEG yz and [e2] yz PEG Z, then [e1 e2] xyz PEG Z.
* (seq2): if [e1] x PEG fail, then [e1 e2] x PEG fail.
* (choice1): if [e1] xy PEG y, then [e1 | e2] xy PEG y.
* (choice2): if [e1] xy PEG fail and [e2] xy PEG Y, then [e1 | e2] xy PEG Y.

Fig. 2. Formal semantics of PEG

We define the EBNF interpretation of G as the language LE(e) accepted by expression e ∈ E. It is defined recursively as:
* LE(ε) = {ε},
* LE(!e) = {ε},
* LE(a) = {a},
* LE(A) = LE(e(A)),
* LE(e1 e2) = LE(e1) LE(e2),
* LE(e1 | e2) = LE(e1) ∪ LE(e2).
By defining LE(!e) = {ε}, we extend the interpretation to PEG with predicates. This is not what one might expect looking at the grammar, just an approximation. The following result from [4, 5] is easily extended to our interpretation of predicates:

L(e) ⊆ LE(e) for any e ∈ E. (1)

The opposite of (1) does not, in general, hold. This is, to some extent, due to the different interpretation of predicates, but the main cause is limited backtracking.

===3 Disjoint choice and limited backtracking===

We say that the choice e = e1 | e2 is strictly disjoint if

L(e1)Σ* ∩ L(e2)Σ* = ∅. (2)

Such a choice is not affected by the limitation of backtracking. Indeed, suppose that e is applied to some string s and e1 succeeds. This means s ∈ L(e1)Σ*. After this, any attempt to backtrack must result in a failure: it would mean applying e2 to s, and a success would mean s ∈ L(e2)Σ*, which is excluded by (2). So, not attempting it does not make any difference.

The choice aa|a is not disjoint in the sense of (2). However, it is not affected by limited backtracking when it appears in some contexts, for example in (aa|a)b. A success of aa means that the input has the form aaΣ*, so backtracking to a is useless, as it will later result in a failure by not finding b.
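This behavior of (aa|a)b can be checked directly. The following minimal sketch (our illustration, not part of the paper's formalism) interprets tuple-encoded expressions under both readings: peg commits to a successful first alternative as in Figure 1, while cfg keeps every alternative as in the EBNF interpretation. Only the operators needed for this example are included.

```python
# Sketch: expressions as nested tuples, e.g. ("choice", e1, e2).

def peg(e, s, i):
    """PEG semantics: one resulting position, or None for failure."""
    op = e[0]
    if op == "char":
        return i + 1 if s[i:i+1] == e[1] else None
    if op == "seq":
        j = peg(e[1], s, i)
        return None if j is None else peg(e[2], s, j)
    if op == "choice":                  # limited backtracking:
        j = peg(e[1], s, i)             # a successful e1 commits,
        return j if j is not None else peg(e[2], s, i)
    raise ValueError(op)

def cfg(e, s, i):
    """EBNF reading: the set of all positions reachable from i."""
    op = e[0]
    if op == "char":
        return {i + 1} if s[i:i+1] == e[1] else set()
    if op == "seq":
        return {k for j in cfg(e[1], s, i) for k in cfg(e[2], s, j)}
    if op == "choice":
        return cfg(e[1], s, i) | cfg(e[2], s, i)
    raise ValueError(op)

A, B = ("char", "a"), ("char", "b")
AA_OR_A = ("choice", ("seq", A, A), A)   # aa|a
G = ("seq", AA_OR_A, B)                  # (aa|a)b

# In isolation, aa|a differs: PEG never yields the one-letter match.
print(peg(AA_OR_A, "aa", 0), sorted(cfg(AA_OR_A, "aa", 0)))  # 2 [1, 2]

# Inside (aa|a)b, the discarded alternative never leads anywhere,
# so both readings accept exactly the same strings:
for w in ("ab", "aab", "aabb", "aa", "a", "b", ""):
    assert (peg(G, w, 0) is not None) == bool(cfg(G, w, 0))
```

The loop confirms that for this context the limitation of backtracking makes no difference, even though the choice itself is not strictly disjoint.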
We are going to look at limited backtracking in the context of our grammar G successfully parsing a complete input string. Thus, we assume that G has a unique start symbol S ∈ N with the corresponding rule

S = e #

where e is an expression and # is a unique end-of-text marker that appears only in this rule. We say that a string w ∈ Σ* is accepted by S to mean that [S] w PEG ε.

For an expression e ∈ E, we define Tail(e) to be the set of strings y such that [e] xy PEG y appears as a partial result in the proof of [S] w PEG ε for some w. We say that the choice e = e1 | e2 is disjoint (in the context of G) if

L(e1)Σ* ∩ L(e2) Tail(e) = ∅. (3)

The same argument as before shows that such a choice is not affected by the limitation of backtracking. If e is applied to some string s in a proof of [S] w PEG ε and e1 succeeds, we have s ∈ L(e1)Σ*. An attempt to backtrack would mean applying e2 to s, and a success would mean s ∈ L(e2) Tail(e), which is excluded by (3).

Mechanical checking of (3) is in general impossible because of the complexity of L(e) and Tail(e), and because it may involve checking the emptiness of an intersection of context-free languages, known to be undecidable. However, it can often be checked using approximation. With (1), we can approximate L(e1) and L(e2) by LE(e1) respectively LE(e2). This gives a stronger condition:

LE(e1)Σ* ∩ LE(e2) Tail(e) = ∅. (4)

It was shown in [7] that if this condition holds for all choice expressions in a grammar without predicates, we have L(e) = LE(e) for all e ∈ E.

To approximate Tail(e), we need to modify the relation Follow known from the classical literature. The modification is described in Section 5, where it is shown (Proposition 1) that with the modified relation we have Tail(e) ⊆ LE(Follow(e))Σ*, where LE(Follow(e)) = ∪x∈Follow(e) LE(x). This gives an even stronger condition:

LE(e1)Σ* ∩ LE(e2) LE(Follow(e))Σ* = ∅. (5)

Using known methods, one can compute the sets of symbols that appear first in strings from LE(e1)Σ* respectively LE(e2) LE(Follow(e))Σ*. Condition (5) is then obviously satisfied if these sets are disjoint. This is the familiar LL(1) condition. As suggested in [7, 8], one can obtain a weaker condition by approximating LE(e1)Σ* and LE(e2) LE(Follow(e))Σ* with sets of the form FΣ*, where F is some suitably chosen subset obtained using the classical relation First. Some ways of choosing F have been suggested, but they cannot, in general, be mechanized. Another approximation was suggested by Schmitz in [9].

The grammar G considered up to now is a simplified version of full PEG. The latter allows expressions such as e1 | e2 | ... | en, e1 e2 ... en, e*, e+, and e?. The expression E = e1 | e2 | ... | en is syntactic sugar for E = e1 | E1, E1 = e2 | E2, ..., En−1 = en, so (3) must hold for all of E, E1, ..., En−2. One can verify that this is true if

L(ei)Σ* ∩ L(ej) Tail(E) = ∅ for 1 ≤ i < j ≤ n. (6)

The expressions E = e*, E = e+, and E = e? constitute syntactic sugar for, respectively, E = eE | ε, E = eE | e, and E = e | ε, so (3) must hold for each of them. One can verify that this is true if

L(e)Σ* ∩ Tail(E) = ∅. (7)

The rules for computing Follow(e) given in Section 5 can be similarly extended to full PEG. The terminals in full PEG are not necessarily single letters; they may be multi-letter quoted strings. Instead of the sets of "first letters" used in the test for LL(1), one has to compute sets of "first terminals" and check their disjointness.

===4 PEG Analyzer===

Giving up all hope for an automatic verification of (3), the author created an experimental tool, the PEG Analyzer, that combines the LL(1) check with heuristics. It takes a grammar, tests all choice expressions for LL(1) using (5), and presents for inspection those that did not pass the test.
To facilitate inspection, it gradually expands the involved expressions by replacing them with their definitions, somewhat in the spirit of what was suggested in [7, 8].

====4.1 Example 1: Simple calculator====

To give some idea of the Analyzer, we apply it to the grammar shown in Figure 3, which defines the syntax of a simple calculator.

 Start    = Sum #
 Sum      = Product (AddOp Product)*
 Product  = Factor (MultOp Factor)*
 Factor   = Digits? Fraction | Digits | Lparen Sum Rparen
 Fraction = "." Digits
 AddOp    = [-+]
 MultOp   = [*/]
 Lparen   = "("
 Rparen   = ")"
 Digits   = [0-9]+

Fig. 3. A simple calculator

When the Analyzer is applied to this grammar, it indicates that the choice between the first two alternatives of Factor does not satisfy LL(1), and opens the window shown in Figure 4. It is an invitation to verify (3) for e1 = Digits? Fraction, e2 = Digits, and Tail(Factor).

 Factor.1 = Digits? Fraction
 Factor.2 = Digits Tail(Factor)
 [0-9] <==> [0-9]

 Digits? Fraction
   Digits Fraction
     [0-9] [0-9]* "." Digits
 <==>
 Digits Tail(Factor)
   Digits (AddOp Product | MultOp Factor | Rparen) ...
     [0-9] [0-9]* (AddOp Product | MultOp Factor | Rparen) ...

Fig. 4. Presentation of a non-LL(1) case

The first two lines show these two expressions. The second is, in fact, a pseudo-expression, with the pseudo-expression Tail(Factor) representing the tail. The third line tells that both expressions have [0-9] as "first terminal". The subsequent lines show the two expressions in more detail. Thus, the first expression stands for two alternative expressions, Digits Fraction and Fraction. Since the latter does not start with [0-9], only Digits Fraction appears, with an indentation showing that it is one of the alternatives. In the next line, Digits is expanded following its definition to [0-9] [0-9]*, and Fraction to "." Digits. The result is supposed to give an idea of the strings starting with [0-9] that are accepted by Digits? Fraction.
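The "first terminal" computation behind this test is the classical FIRST fixpoint. A small sketch of it (our own illustration, not the Analyzer's code) applied to an abstracted fragment of the calculator grammar, where the terminal d stands for [0-9] and the auxiliary nonterminal Opt stands for Digits?:

```python
# FIRST computation for the LL(1) test. A grammar maps each
# nonterminal to a list of alternatives; an alternative is a list of
# symbols, and any symbol that is not a dict key is a terminal.
EPS = "ε"  # marks a nullable alternative

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given FIRST of the nonterminals."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})     # a terminal's FIRST is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}                # every symbol was nullable

def first_sets(grammar):
    """Compute FIRST for every nonterminal by fixpoint iteration."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
    return first

# Fragment of the calculator: Factor = Digits? Fraction | Digits.
G = {
    "Factor":   [["Opt", "Fraction"], ["Digits"]],
    "Opt":      [["Digits"], []],     # Digits?
    "Fraction": [[".", "Digits"]],
    "Digits":   [["d"]],              # d abstracts [0-9]+
}
F = first_sets(G)
alt1 = first_of_seq(["Opt", "Fraction"], F)
alt2 = first_of_seq(["Digits"], F)
print(sorted(alt1), sorted(alt2), sorted(alt1 & alt2))
# ['.', 'd'] ['d'] ['d']
```

The non-empty intersection {d} is exactly why the Analyzer flags the first two alternatives of Factor; whether condition (3) actually fails must then be judged by inspection, as the window in Figure 4 invites.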
In the second expression, Tail(Factor) is replaced by the approximation LE(Follow(Factor))Σ* in the form of a pseudo-expression. Again, Digits is expanded to [0-9] [0-9]*. Verifying (3) means checking whether any string represented by e1 can be a prefix of any string in L(e2) Tail(Factor). One can easily see that Digits in the first expression is always followed by a dot, while in the second it can only be followed by AddOp, MultOp, or Rparen, none of which is a dot. The condition (3) is thus satisfied.

====4.2 Example 2: Grammar with predicates====

The second example illustrates the treatment of predicates. The grammar in Figure 5 is a fragment of a larger grammar that uses identifiers, some of which are reserved as "keywords". Only one keyword, "print", is shown. Its definition is followed by !Letter to make sure it is not recognized as a prefix of an identifier. The definition of Identifier is preceded by !Keyword to ensure that a keyword is not recognized as an identifier.

 Statement  = (Keyword Number | Identifier Number) ";" #
 Keyword    = "print" !Letter
 Identifier = !Keyword Letter+
 Letter     = [a-z]
 Number     = [0-9]+

Fig. 5. Grammar with predicates

The Analyzer applied to this grammar indicates that the choice in Statement does not satisfy LL(1), and opens the window shown in Figure 6. To check LL(1), the Analyzer approximated L("print" !Letter) with LE("print") and L(!Keyword Letter+) with LE(Letter+):

 L("print" !Letter) ⊆ LE("print" !Letter) = LE("print"),
 L(!Keyword Letter+) ⊆ LE(!Keyword Letter+) = LE(Letter+).

It found their first terminals to be, respectively, "print" and [a-z]. As these are not disjoint, the choice does not satisfy LL(1) and is signaled as such. But this is a false alarm: one can easily see that

 L("print" !Letter ...) ∩ L(!Keyword Letter+ ...) = ∅,

showing that the expression satisfies (3).
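The two predicates can be exercised directly. The sketch below (ours, not from the paper, with regular expressions standing in for Letter and repetition) implements the parsing procedures for Keyword and Identifier and shows why the overlap of first terminals is harmless:

```python
import re

# Each procedure returns the position after the consumed prefix,
# or None on failure, as in Figure 1.

def keyword(s, i):
    """Keyword = "print" !Letter"""
    if s.startswith("print", i) and not re.match(r"[a-z]", s[i+5:i+6]):
        return i + 5
    return None

def identifier(s, i):
    """Identifier = !Keyword Letter+"""
    if keyword(s, i) is not None:   # !Keyword: fail where Keyword succeeds
        return None
    m = re.match(r"[a-z]+", s[i:])
    return i + m.end() if m else None

print(keyword("print 1", 0))     # 5: "print" followed by a non-letter
print(keyword("printer", 0))     # None: !Letter blocks the match
print(identifier("printer", 0))  # 7: not a keyword, so a valid identifier
print(identifier("print 1", 0))  # None: !Keyword blocks the identifier
```

Although both procedures can succeed only on strings beginning with a lowercase letter, the predicates make their languages disjoint, which is exactly what condition (3) asks for.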
 Statement.1.1 = Keyword Number
 Statement.1.2 = Identifier Number Tail(Statement.1)
 "print" <==> [a-z]

 Keyword Number
   "print" !Letter Number
 <==>
 Identifier Number Tail(Statement.1)
   Identifier Number ";" ...
     !Keyword Letter+ Number ";" ...

Fig. 6. Presentation of a non-LL(1) case

====4.3 Example 3: Non-disjoint expressions====

Suppose now that in the calculator from Example 1 we sometimes want to skip the multiplication sign and write, for example, 2(.3+4) instead of 2*(.3+4). To achieve this, we replace the definition of MultOp by MultOp = "*"? | "/". The Analyzer applied to the modified grammar now shows two cases not satisfying LL(1). The first produces the window shown in Figure 7. It says that [0-9] in [0-9]+ is followed by something that may start with [0-9], namely either of two alternatives of Factor after the omitted first alternative of MultOp. Clearly, [0-9] is a prefix of each alternative. The condition (3) is not satisfied.

 Digits.1 = [0-9]
 Tail(Digits)
 [0-9] <==> [0-9]

 [0-9]
 <==>
 Tail(Digits)
   MultOp Factor ...
     Factor ...
       Digits? Fraction ...
         Digits Fraction ...
           [0-9] [0-9]* Fraction ...
       Digits ...
         [0-9] [0-9]* ...

Fig. 7. Presentation of a non-LL(1) case

How does this happen, and what does it mean? A look at the grammar shows that the offending [0-9]+ is the one appearing at the end of the first Factor in Factor (MultOp Factor)*. With MultOp omitted, it will gobble up any digits at the beginning of the second Factor, without any attempt to backtrack. Thus, for example, 234 will be treated by PEG as a single Factor, and not as shorthand for 2*3*4. In this example, the grammar interpreted as EBNF is ambiguous, and PEG just selects one of the possible parses.

The second case not satisfying LL(1) is the same as for the original grammar: the choice between the first two alternatives of Factor, reported exactly as shown in Figure 4. However, MultOp in the MultOp Factor that appears in the tail of Factor may be omitted.
And Factor has an alternative that begins with a dot. Thus, Digits in Digits Tail(Factor) can be followed by a dot, and (3) is not satisfied. It means that, for example, 2.34 will be treated by PEG as a single Factor, and not as shorthand for 2*.34. Here the grammar interpreted as EBNF has another ambiguity, and PEG chooses one possible parse. In both cases, we may accept PEG's choice because it corresponds to the perception of a human reader.

===5 Computation of Follow===

We define a number of relations R ⊆ E × E, writing e′ ∈ R(e) to mean (e, e′) ∈ R.

* Derive_ss(e) is the set of all e′ ∈ E such that [e′] x′y PEG y can be derived from [e] xy PEG y using one inference rule. Thus, e′ ∈ Derive_ss(e) if and only if:
** e′ = A ∈ N where e(A) = e,
** e′ = e e2 for some e2 such that ε ∈ L(e2),
** e′ = e1 e for some e1,
** e′ = e | e2 for some e2,
** e′ = e1 | e for some e1.
* Derive_sf(e) is the set of all e′ ∈ E such that [e′] xy PEG fail can be derived from [e] xy PEG y using one inference rule. Thus, e′ ∈ Derive_sf(e) if and only if:
** e′ = !e,
** e′ = e e2 for some e2.
* Derive_ff(e) is the set of all e′ ∈ E such that [e′] x PEG fail can be derived from [e] x PEG fail using one inference rule. Thus, e′ ∈ Derive_ff(e) if and only if:
** e′ = A ∈ N where e(A) = e,
** e′ = e1 e for some e1,
** e′ = e e2 for some e2,
** e′ = e | e2 for some e2,
** e′ = e1 | e for some e1 that can fail.
* Next_s(e) is the set of all e′ ∈ E such that ε ∉ L(e′) and there exists e e′ ∈ E.
* Next_f(e) = {ε} for all e such that there exists !e ∈ E or e | e2 ∈ E for some e2.

Define Follow = Derive_ss* × Next_s ∪ Derive_ss* × Derive_sf × Derive_ff* × Next_f.

'''Proposition 1.''' For each partial result [e] xy PEG y in the proof of [S] w PEG ε holds y ∈ LE(Follow(e))Σ*, where LE(Follow(e)) = ∪e′∈Follow(e) LE(e′).

Proof. Consider any partial result [E1] xy PEG y in the proof of [S] w PEG ε.
It is the first in a chain of n ≥ 1 partial results derived successively using the rules of Figure 2 and other partial results. The chain ends with the final result [S] w PEG ε derived from [En #] xn# PEG ε. In this chain, the first j ≥ 1 partial results are of the form [Ei] xi y PEG y. By definition of Derive_ss, we have Ei ∈ Derive_ss*(E1) for 1 ≤ i ≤ j. The first partial result in a different form must be one of these:

(a) [Ej e2] xj uv PEG v where y = uv, u ≠ ε, and [e2] uv PEG v.
(b) [!Ej] xj y PEG fail.
(c) [Ej e2] xj y PEG fail where [e2] xj y PEG fail.

In case (a) we have e2 ∈ Next_s(Ej), so e2 ∈ (Derive_ss* × Next_s)(E1) ⊆ Follow(E1). We also have y = uv where u ∈ L(e2) ⊆ LE(e2) ⊆ LE(Follow(E1)), so y ∈ LE(Follow(E1))Σ*.

In cases (b) and (c), we have the partial result [Ej+1] x PEG fail. According to the definition of Derive_sf, we have Ej+1 ∈ Derive_sf(Ej). It is the first in a chain of k ≥ 1 partial results of the form [Ei] x PEG fail derived successively using the rules of Figure 2 and other partial results. By definition of Derive_ff, we have Ei ∈ Derive_ff*(Ej+1) for j+1 ≤ i ≤ j+k. The first partial result in a different form must be one of these:

(d) [!Ej+k] uv PEG uv.
(e) [Ej+k | e2] uv PEG v where [e2] uv PEG v,

where uv = x. We only know that the original y is a suffix of uv, but it is very unlikely to be the same as v; we can only approximate it as y ∈ Σ*. In each of (d)-(e), we have Next_f(Ej+k) = {ε}, so ε ∈ (Derive_ss* × Derive_sf × Derive_ff* × Next_f)(E1) ⊆ Follow(E1). We have ε ∈ LE(Follow(E1)), so y ∈ LE(Follow(E1))Σ*. □

Note that the very rough approximation of Tail(e) by Σ*, changing (5) into strict disjointness, applies to an e that appears in a partial proof resulting in a failure (which is eventually needed to prove [S] w PEG ε).

===6 Final remarks===

The fact that EBNF does not have predicates spoils the useful correspondence with PEG.
The basic conditions for the absence of effects of limited backtracking remain in principle unchanged, but there is no way of talking about equivalence. There is no way to simply include the recognition-oriented predicates in the construction-oriented EBNF. But one may consider some more natural extensions of EBNF that could serve the same purpose as predicates in defining useful grammars.

The example of the PEG Analyzer shows that disjointness can often be checked by inspection. The tool as described here is quite primitive. One could improve it by letting the user interactively choose specific parts of expressions for detailed inspection. A subject for further research is a careful analysis of what happens in the case of a non-disjoint choice.

===References===

# Ford, B.: Packrat Parsing: a Practical Linear-Time Algorithm with Backtracking. Master's thesis, Massachusetts Institute of Technology (Sep 2002), http://pdos.csail.mit.edu/papers/packrat-parsing:ford-ms.pdf
# Ford, B.: Packrat parsing: simple, powerful, lazy, linear time, functional pearl. In: Wand, M., Jones, S.L.P. (eds.) Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming (ICFP '02), Pittsburgh, Pennsylvania, USA, October 4-6, 2002, pp. 36-47. ACM (2002)
# Ford, B.: Parsing expression grammars: a recognition-based syntactic foundation. In: Jones, N.D., Leroy, X. (eds.) Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, pp. 111-122. ACM, Venice, Italy (14-16 January 2004)
# Mascarenhas, F., Medeiros, S., Ierusalimschy, R.: On the relation between context-free grammars and Parsing Expression Grammars. Science of Computer Programming 89, 235-250 (2014)
# Medeiros, S.: Correspondência entre PEGs e Classes de Gramáticas Livres de Contexto. Ph.D. thesis, Pontifícia Universidade Católica do Rio de Janeiro (Aug 2010)
# Redziejowski, R.R.: Some aspects of Parsing Expression Grammar. Fundamenta Informaticae 85(1-4), 441-454 (2008)
# Redziejowski, R.R.: From EBNF to PEG. Fundamenta Informaticae 128, 177-191 (2013)
# Redziejowski, R.R.: More about converting BNF to PEG. Fundamenta Informaticae 133(2-3), 177-191 (2014)
# Schmitz, S.: Modular syntax demands verification. Tech. Rep. I3S/RR-2006-32-FR, Laboratoire I3S, Université de Nice - Sophia Antipolis (Oct 2006)