1 Introduction

On Separations of LR(0)-Grammars by Two Types of Pumping Patterns

Martin Plátek

martin.platek@mff.cuni.cz 0

František Mráz

Dana Pardubská

Daniel Pru˚ša

Jirˇí Šíma

3 0 Charles University, Department of Computer Science Malostranské nám. 25, 118 00 PRAHA 1 , Czech Republic 1 Comenius University in Bratislava, Department of Computer Science Mlynská Dolina , 84248 Bratislava , Slovakia 2 Czech Technical University, Department of Cybernetics Karlovo nám. 13, 121 35 Prague 2 , Czech Republic 3 Institute of Computer Science of the Czech Academy of Sciences , P. O. Box 5, 18207 Prague 8 , Czech Republic

We present two types of pumping patterns that allow a total separation inside the class of LR(0)grammars. Using the same type of pumping patterns, we obtain a total separation inside of linear LR(0)-grammars. This type of study has a long-term motivation from computational linguistics and the area of syntactic error localization. A recent motivation also comes from the field of formal models of neural networks.

1 Introduction

This paper follows the paper [ 6 ], where we have studied restarting automata recognizing deterministic contextfree languages (DCFL) with a particular type of pumping patterns. This type of pumping patterns ensures the nonregularity of the recognized languages.

In this paper, we are looking for two types of pumping patterns. The first type of pumping patterns should ensure the non-regularity of the languages recognized by a certain type of restarting automata. In contrast, the second type of pumping patterns should ensure the regularity of the languages recognized by this type of restarting automata. The union of both of these types of pumping patterns should cover all possible pumping patterns defined by the mentioned type of restarting automata.

Any deterministic context-free language L can be accepted by a LR(1)-analyzer (using lookahead of size 1), but the language L f g

$ , where $ is a special end-marker not in the alphabet of L) can be accepted by a LR(0)analyzer [ 3 ]. In [ 4 ], it was shown that any deterministic context-free language can be accepted by a deterministic monotone restarting automaton. This was proved

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). by simulating any given LR(0)-analyzer by a deterministic monotone restarting automaton. Hence, the computations of such automata were controlled by LR(0)grammars. Therefore, in what follows, we will use constructions based on properties of phrase structures generated by LR(0)-grammars and we will study pumping patterns based on LR(0)-grammars.

This paper should serve as a step to refine the concepts from [ 5 ], where it was shown that restarting automata can serve as a formal model for functional generative description (FGD) of a natural language. Functional generative description is a dependency-based descriptive system that has been developed since the 1960’s, see esp. [ 7 ]. FGD was originally implemented as a generative procedure, but later its authors have been interested in a more declarative representation. The subject of the paper [ 5 ] concerns the foundations of a reduction system using a complex restarting automaton. The reduction system is more complex than a reduction system for a (shallow) syntactic analyzer since it provides not only the possibility of checking the well-formedness of the (surface) analysis of a sentence, but also its meaning – tectogrammatical representation in terms of FGD. Such a reduction system makes it possible to define the analysis as well as the synthesis of a sentence formally.

A descriptive system [ 5 ] DS of a natural language L should determine: (a) The set LC of all correct sentences of the language L. (b) The set LM of all correct meaning descriptions of the language L. LM represents all meanings of all sentences in LC. (c) A relation SH LC LM between LC and LM. The relation describes the ambiguity and the synonymy of L.

Let us note that LM is an artificial (formal) unambiguous language, which can be described by a restarting automaton controlled by an LR(0)-grammar. An important feature modeled by LM is valency. More about valency can be found in [ 1 ]. There are types of valency features that can be modeled as pumping patterns of non-regular type (e.g., obligatory adjuncts), and there are other valency features that can be modeled as pumping patterns of regular type (e.g., optional adjuncts).

Another recent motivation for this paper comes from a completely different area. The last results of this paper should support the analysis of analog neuron hierarchy of binary-state neural networks (NNs) with an increasing number of extra analog-state neurons, which has been introduced in [ 8 ] for studying the power of NNs with realistic weights between integers and rational numbers with respect to the Chomsky hierarchy.

Note that the problem of the regularity of deterministic context-free languages gained attention already in the 60s. Algorithms deciding whether a given deterministic pushdown automaton accepts a regular language were proposed by Stearns [ 9 ] and Valiant [ 10 ].

This paper is structured as follows. The next section contains basic notions and basic properties of LR(0)grammars. Section 3 introduces the concepts of our interest and presents the first main result. Section 4 presents the results about total separations. The paper concludes with a summary of future work to be done to fulfill all our plans. 2

Basic Notions

We suppose that the reader is familiar with the basics of formal language and automata theory presented, e.g., in [ 3 ]. For technical reasons, we give a list of some notions and definitions related to formal grammars.

Context-free grammars. A context-free grammar G is defined by a 4-tuple G = (V; S; R; S) where S V , N = V n S, and N; S are finite sets of nonterminal and terminal symbols, respectively, R N V is a finite relation of rewrite (production) rules, and S 2 N is the start symbol. We use the usual notation A ! b 2 R for a production rule (A; b ) 2 R. This rule can be applied to u = u1Au2 where u1; u2 2 V , yielding v = u1b u2, which is denoted as u ) v, or u )G v. We write u )R v if the nonterminal A substituted by u ) v is the rightmost nonterminal in u.

The grammar G generates the context-free language L(G) = fw 2 S j S ) wg where ) ()R ) is the reflexive transitive closure of the binary relation ) ()R) on V . Note that L(G) can also be equivalently defined as L(G) = fw 2 S j S )R wg.

We write u ) p v for some non-negative integer p if u ) v and v is derived from u by at most p derivation steps ()). A similar meaning has the denotation u )R p v. In what follows, we will primarily work with the rightmost derivations and write ) instead of )R.

Any non-empty context-free language L = L(G) 6= 0/ can be generated by a reduced context-free grammar G that excludes unreachable and unproductive symbols, that is, for every A 2 N, there exist a; b 2 V such that S ) aAb , and for every A 2 N there is w 2 S such that A ) w, respectively. In the following, we work with reduced context-free grammars only.

We say that a context-free grammar G = (V; S; R; S) is linear if the right-hand side of any rule from R contains at most one variable (nonterminal).

For any non-empty word a1 an, with a1; : : : ; an 2 S, the derivation S ) a1 an can be described by a derivation tree which is an ordered tree satisfying: 1. Inner vertices are labeled with nonterminals. 2. The root is labeled with the start symbol S. 3. If inner vertex labeled with nonterminal A 2 N has children labeled (from left to right) with a1; a2; : : : ; ak 2 V then A ! a1a2 ak 2 R; A together with the subtrees rooted in its children form a complete branch of the tree. 4. The leaves in the tree are labeled (from left to right) with the terminal symbols a1; : : : ; an 2 S:

Removing condition 2 we get the definition of a (computation) derivation sub-tree.

A grammar G is unambiguous if every non-empty string w 2 L(G) has a unique derivation tree, or equivalently a unique rightmost derivation.

The results of our paper heavily depend on the characterization of the class of deterministic context-free languages by LR(0)-grammars and corresponding parsers. Following [ 3 ], we, therefore, give the necessary definitions and the most relevant results.

Definition 1. Let G = (V; S; R; S) be a context-free grammar, N = V n S, g 2 V : A handle of g is an ordered pair (r; i), r 2 R; i 0 such that there exists A 2 N; a; b 2 V and w 2 S such that (a) S )R aAw )R ab w = g, (b) r = A ! b , and (c) i = jab j.

Lemma 1 ([ 2 ]). Let G = (V; S; R; S) be a reduced contextfree grammar. Then G is unambiguous if and only if every a 2 V , such that S )R a, has exactly one handle except

S, which has none.

In general, the identification of a handle in a string is not uniquely defined, which is not true for LR(0) grammars. Definition 2. Let G = (V; S; R; S) be a reduced contextfree grammar such that S )R+ S is not possible in G. We say G is an LR(0)-grammar if, for each w; w0; x 2 S , h; a; a0; b ; b 0 2 V ; and A; A0 2 V n S, if (a) S )R aAw )R ab w = hw, and imply (A ! b ; jab j) = (A0 ! b 0; ja0b 0j)

Note that as a consequence of the above definition we have that A = A0, b = b 0, a = a0, g = ab = a0b 0 and x = w0. Thus, if G is an LR(0)-grammar, then the rightmost derivation of the word w by G and the left-right analysis is unique (deterministic). In this paper, we consider LR(0) grammars rather as analytical grammars. A language generated by an LR(0)-grammar is called LR(0)-language. Theorem 1 (LR(0)-language characterization theorem from [ 3 ]). Let L S . The following four statements are equivalent. (a) L is an LR(0) language. (b) L is a deterministic context-free language and for all x 2 S+, w; y 2 S , if w 2 L, wx 2 L, and y 2 L, then yx 2 L. (c) There exists a deterministic pushdown automaton A with a single final state q f and a pushdown symbol

Z f such that each word w 2 L is accepted by A by

entering the state q f with Z f as the only symbol in its pushdown. (d) There exist strict deterministic languages L0 and L1 such that L = L0L1.

Note that strict deterministic languages are deterministic context-free languages recognizable by empty pushdown, or equivalently prefix-free deterministic contextfree languages.

Let us recall semi-Dyck languages. Let r 1, Sr = fa1; : : : ; arg, S¯r = fa¯1; : : : ; a¯rg and S = Sr [ S¯r. The semiDyck language Dr is the language generated by the grammar Gr = (Vr; S; P; S), where Vr = S [ fSg, and R contains the following production rules:

S ! SaiSa¯iS j l , for each i; 1 i r:

Informally, Dr is the set of all well-balanced parentheses containing r different pairs of brackets [ 3 ]. The semi-Dyck languages are examples of context-free languages that are not linear context-free languages. Additionally, they can be used to separate LR(0)-languages from the strict deterministic languages.

Theorem 2 ([ 3 ]). Let Dr S be the semi-Dyck language for some r 1. Then Dr is an LR(0)-language, but not a strict deterministic language.

LR(0)-analyzer. If L is generated by an LR(0)grammar G, then there exists an LR(0)-analyzer P(G) for L with the following important properties (see [ 3 ]): (a) For each word w 2 L, there is exactly one rightmost derivation of w in G, which corresponds to the LR(0)analysis of the word w by P(G). (b) Let u be a prefix of the content of the input tape w = uv, which has already been read by a computation of P(G). Then there exists a suffix v0 such that uv0 2 L iff P(G) did not halt and not reject while reading u.

Let L S be a deterministic context-free language (DCFL), and $ 2= S. Then, according to [ 3 ], there are LR(0)-grammar G(0; $) generating the language L f g $ , and a corresponding LR(0)-analyzer P(G(0; $)) of L f g $ ; P(G(0; $)) is a deterministic push-down automaton with $ as the rightmost symbol on its input-tape.

To stress its property, we will sometimes write LR(0,$)grammar instead of grammar G(0; $). We denote the set of LR(0)-grammars by LRG and the set of LR(0,$)-grammars by LRG($).

Linear LRG. We say that a context-free grammar G is linear if the right-hand side of any rule from G contains at most one variable (nonterminal). We also consider linear LR(0)-grammars. We denote the linearity by the prefix lin-, and the class of linear LR(0)-grammars as lin-LRG. Classes of languages. In what follows, L (A), where A is some class of grammars, denotes the class of languages generated by grammars from A. E.g., the class of languages generated by linear LR(0)-grammars is denoted by L (lin-LRG). 3

Pumping by LR(0)-grammars

The first goal of our paper is to specify properties of deterministic (linear) context-free grammars that assure that the corresponding (linear) language is non-regular. In this section, we show such properties. We start with several definitions and notations.

Pumping notions. Let G = (V; S; R; S) be an LR(0)grammar generating (analyzing) the language L = L(G), and P(G) be the corresponding LR(0)-analyzer for G. Let w 2 L(G), w = xu1vu2y, where x; u1; v; u2; y 2 S , u1u2 2 S+ are given by the derivation tree Tw from Fig. 1. The proper sub-trees T1 and T2 of Tw are (computation) subtrees whose roots are labeled with the same nonterminal A, thus by replacing T1 with T2 properly inside of Tw, we again get a derivation tree, namely the derivation tree Tw(0) for the word w(0) = xvy 2 L(G).

Analogously, replacing T2 with a copy of T1, we get the derivation tree Tw(2) for a longer word w(2) = xu21vu22y. If we repeat i times such replacing of T2 with T1 we obtain the derivation tree Tw(i+1) for the word w(i + 1) = xui1+1vui2+1y. Pumping tree, prefix, infix, pattern and reduction. Let x; u1; v; u2; A; y; T1 be as on Fig. 1. Then we say that Pp = xu1vu2 is an (x; u1; A; v; u2)-pumping prefix by G, Pin = xu1vu2y is an (x; u1; A; v; u2; y)-pumping infix by G, and that T1 is a pumping tree of Pp. Recall that the LR(0)analysis by P(G) of the prefix Pp in any word of the form Ppg is the same, for any g 2 S . We say that (u1; A; v; u2) is a pumping pattern by G (of Pp). Let us recall that for any z 2 S , the LR(0)-analyzer P(G)) reduces xu1vu2z into xvz. We write xu1vu2z (P(G) xvz, and say that xu1vu2z (P(G) T2 xvz is an (x; u1; A; v; u2)-pumping reduction by G. Note that the word xu1vu2z needs not be a word from L(G). We will often say in the following that (x; u1; A; v; u2) is a pumping prefix and that (x; u1; A; v; u2; y) is a pumping infix.

Elementary infix etc. We say that an (x; u1; A; v; u2; y)pumping infix by G is elementary if the (x; u1; A; v; u2)pumping reduction is the only and, at the same time, the last pumping reduction by G that can be performed inside of the word xu1vu2y, i.e. xvy cannot be reduced by any pumping reduction by G at all. In this case, we say that (x; u1; A; v; u2) is an elementary pumping prefix, and that (u1; A; v; u2) is an elementary pumping pattern by G.

Let us note that the corresponding elementally pumping tree T1 does contain an internal node with the same nonterminal as the root of T1, and does not contain any other repetition of a nonterminal on any path from its root to a leaf. While the size (the number of terminal symbols) of a pumping infix by G is unbounded, there exists a constant c > 0 that depends on the number of terminals and nonterminals, and the length of rules of G such that any elementary pumping infix by G is of size at most c.

We can see the following obvious proposition that summarizes the (pumping) properties of LR(0)-grammars, which we will use in the following text. It is a direct consequence of the previous definitions and the properties of LR(0)-grammars and their analyzers summarized in [ 3 ], and inspired by [ 9 ].

Proposition 1. Let G = (V; S; R; S) be an LR(0)-grammar generating (analyzing) the language L = L(G). Let a word Pp = xu1vu2 be an (x; u1; A; v; u2)-pumping prefix by G, and xu1u1vu2u2 (P(G) xu1vu2 be an (xu1; u1; A; v; u2)-pumping reduction by G. Then (a) Any w 2 L determines its derivation tree Tw by G unambiguously. (b) Pp determines its pumping sub-tree unambiguously. (c) xu1mu1vu2u2nz (P(G) xu1mvu2nz (xu1m; u1; A; v; u2)-pumping reduction any m; n > 0, and any z 2 S .

is by

an G for (d) xu1vu2z 2 L iff xu1mvu2mz 2 L for any m z 2 S . 0, and any (e) An (x; u1; A; v; u2)-pumping prefix by G, and the pumping pattern (u1; A; v; u2) determine unambiguously a single pumping reduction. (f) xun1vu2mz 2 L iff xun1+kvu2m+kz 2 L for any m; n; k

Assertion (c) is essential for our further considerations. It shows that, for a non-empty u1, the distance of the place of pumping from the left end is not limited, and that the position of pumping is determined both by the pumping pattern of a pumping reduction and the pumping prefix of the pumping reduction. We use this property in the following definition formalizing a non-regular type of pumping. Example 1. Consider the non-regular context-free language Lab = fanbn j n 2 Ng, that can be generated by the grammar G = (fS; S1; a; bg; fa; bg; R; S)), with the following set of rules R:

S S1 ! !

S1 aS1b j ab

The grammar is reduced and unambiguous. Consider the

sentence g = aaabbb.

• The handle of g (cf. Definition 1) is the pair (S1 ! ab; 4), as and the division of g into a; b ; w is unique:

S )R aaS1bb )R aaabbb g = aa ab bb |{az} |{z} |{wz} b • G is an LR(0)-grammar, moreover, linear as (a) S )R aAw )R ab w = hw obviously implies (A ! b ; jab j) = (A0 ! b 0; ja0b 0j)

The pumping notions can be illustrated in Fig. 2 with a

derivation tree for w = xaabby 2 L(G), where x 2 a ,y 2 b .

We can see, by taking x = a, that Pp = aaabb is an (a; a; S1; ab; b)-pumping prefix by G, and that T1 is a pumping tree of Pp. Pumping pattern of Pp by G is (a; S1; ab; b) and aaabb (P(G) aab is an (a; a; S1; ab; b)pumping reduction by G. Realize that a pumping reduction by G can be applied to any word akbm, where k; m > 1, including the cases when k 6= m.

Moreover, we can see that (l ; a; S1; ab; b; l ) is the single elementary pumping infix by G.

Definition 3. Let G = (V; S; R; S) be an LR(0)-grammar generating (analyzing) the language L = L(G). Let x a T2 xu1vu2 (P(G) xv be an (x; u1; A; v; u2)-pumping reduction by G.

We say that A is a distinguishing nonterminal for G, and xu1vu2 (P(G) xv is an (x; u1; A; v; u2)-distinguishing reduction by G if at least one of the following conditions is true: (I) for some y 2 S there is a p > 0 such that xvy 2 L iff xvu2p jy 2= L, for all j > 0; (II) for some y 2 S , there is a p > 0 such that xvy 2 L iff xu1p jvy 2= L, for all j > 0.

We say that the tuple (u1; A; v; u2) is a distinguishing pattern for G.

If G contains a distinguishing nonterminal (pattern)

then we call G a distinguishing LR(0)-grammar (denoted dist-LRG).

For the sake of accuracy, we also say that A is an

(x; u1; A; v; u2)-distinguishing nonterminal for G. Example 1 (continued). According to case (I) of Definition 3 the nonterminal S1 is a distinguishing nonterminal and aabb (P(G) ab is a (l ; a; S1; ab; b)-distinguishing reduction for y = l , and p = 1. Thus, (a; S1; ab; b) is a distinguishing pattern for G.

Let us recall that the class of LR(0)-languages is not closed under complement (see [ 2 ]), but DCFL is closed under complement.

Example 1 (continued). The language Labc = c+ Lab can be generated by LR(0) grammar Gabc = (fS; S1; S2; a; b; cg; fa; b; cg; R; S), with the set of rules R : S S2 S1 ! ! ! cS2 cS2 j S1 aS1b j ab Then, analogously to the situation in Lab, nonterminal S1 is distinguishing, and there is a (c; a; S1; ab; b)distinguishing reduction based on case (I) of Definition 3; here, we can take again y = l , and p = 1. On the other hand, (c; S2; ab; l ) is a pumping pattern by Gabc, which is not distinguishing for any (x; c; S2; ab; l )-reduction by Gabc; thus (c; S2; ab; l ) is a non-distinguishing pumping pattern by Gabc.

Lemma 2. Let G = (V; S; R; S) be a dist-LR(0)-grammar. Let (u1; A; v; u2) be a distinguishing pattern by G. Then u1; u2 2 S+.

Proof. Assume, for example, that condition (I) from Definition 3 holds. It is then clear that u2 6= l , otherwise xvu2p jy = xvy for all p; j > 0.

It is also easy to see that u1 6= l . If u1 = l , we get from Proposition 1 that xvu2pmy 2 L and we have xvu2pmu2py 2= L for all m 0 and some p > 0. The first fact yields xvu2py 2 L for m = 1, the second fact yields xvu2py 2= L for m = 0. This is a contradiction.

The existence of a distinguishing nonterminal serves as a sufficient condition for non-regularity.

Theorem 3. Let L = L(G) be a language accepted by a dist-LR(0)-grammar G = (V; S; R; S). Then L is a nonregular language.

Proof. Assume, for a contradiction, that G = (V; S; R; S) is a dist-LR(0)-grammar generating a regular language L = L(G) and B be an (x; u1; B; v; u2)-distinguishing nonterminal for G.

As L is regular, there exists a deterministic finite automaton A = (Q; S; d ; q0; F) with nA = jQj states accepting the language L = L(A). The proof then follows by the case analysis based on properties of the (x; u1; B; v; u2)distinguishing pattern. Since there is an analogy of the case analysis, we will prove the theorem only by the assumption that condition (I) from Definition 3 is fulfilled.

Suppose that for some y 2 S and p > 0 it holds xvy 2 L, and, for all j > 0, xvu2p jy 2= L.

Proposition 1 implies that xu1pmvu2pmy 2 L for all m 0 and xu1pmvu2p(m+ j)y 2= L for all m 0, j > 0, and Lemma 2 implies that u1; u2 2 S+. Let us inspect states reachable by the automaton A when reading words of the form xu1pmvu2pk, for m > nA, k 0. Since A has nA states, the pigeonhole principle yields that there are integers r; s, 1 r < s nA + 1 and a state q of A such that

q = d (q0; xu1pmvu2pr) = d (q0; xu1pmvu2ps): As xu1pmvu2pmy 2 L, both words

xu1pmvu2pmy = xu1pmvu2pru2p(m r)y and

xu1pmvu2psu2p(m r)y = xu1pmvu2pmu2p(s r)y are accepted by A. Then, for j = s r we have a contradiction with the fact that xu1pmvu2pmu2p jy 2= L, for all m 0, j > 0.

Next, suppose that for some y 2 S and p > 0 it holds xvy 2= L, and xvu2p jy 2 L, for all j > 0.

Using the same reasoning as above, we derive that A rejects xu1pmvu2pmy = xu1pmvu2pru2p(m r)y as well as xu1pmvu2psu2p(m r)y = xu1pmvu2pmu2p(s r)y, which is again a contradiction.

The previous proof yields the following corollary. Corollary 1. Let G = (V; S; R; S) be a dist-LR(0)grammar. Let (x; u1; A; v; u2) be a distinguishing prefix by G, and Pin be an (x; u1; A; v; u2; y)-infix by G. Then the language L(G) \ fxu1nvu2my j n 0; m 0g is not a regular language.

We will use the following definition to stress the ability of LR(0)-grammars to define DCFL.

Definition 4. Let G be an LRG(0; $) grammar which generates L(G) = Lf$g, where L S . We say that G is an

LRG(1)-grammar which defines L. We take

L (LRG(1)) L (dist-LRG(1)) = fL j Lf$g 2 LRG($)g; = fL j Lf$g 2 dist-LRG($)g: In what follows, the sign means a proper subset.

Theorem 4. It holds the following: (a) (b) (c) (d) (e)

L (dist-LRG) L (lin-dist-LRG) L (dist-LRG(1)) L (lin-dist-LRG(1)) L (LRG(1)) = DCFL:

L (LRG);

L (lin-LRG); L (LRG(1));

L (lin-LRG(1)); Proof. All inclusions (a)–(d) follow from the fact that, according to Theorem 3, all languages generated (accepted) by distinguishing LR(0)-grammars are non-regular, while both L (lin-LRG) and L (lin-LRG(1)) contain all regular languages. The equality (e) is just the fact that for any deterministic context-free language L S , where $ 62 S, the language Lf$g is accepted by an LR(0)-grammar [ 3 ]. 4

Some total separations inside of LR(0)-languages

While the size of distinguishing pumping patterns is unbounded, the size of elementary pumping patterns for a given grammar is limited. Therefore, below we introduce a condition that implies non-regularity of a language generated by an LR(0)-grammar based on elementary pumping patterns only.

Definition 5. Let G = (V; S; R; S) be an LR(0)-grammar and a = (x; u1; A; v; u2; y) be an elementary pumping infix by G.

We say that a is regular if the language L(G; a) = L(G) \ fxun1vu2my j n 0; m 0g is a regular language.

We say that a is non-regular if the language L(G; a) = L(G) \ fxu1nvu2my j n 0; m 0g is a non-regular language.

We say that G is an elementary-regular grammar if all elementary pumping infixes by G are regular. We denote the class of elementary-regular grammars by the prefix er-.

We say that G is an elementary-non-regular grammar if there exists an elementary non-regular pumping infix by G.

We denote the class of elementary non-regular grammars

by the prefix enr-.

We say that a language L is an elementary-non-regular language if an elementary-non-regular LR(0)-grammar G exists such that L(G) = L. The property being elementarynon-regular we mark by the prefix enr- and the class of elementary-non-regular grammars is denoted as enr-LRG.

Similarly, the prefix er- will denote the property being elementary-regular. We say that a language L 2 L (LR(0)) is an elementary

regular language if there is not any elementary-nonregular LR(0)-grammar G such that L(G) = L. We denote the class of elementary-regular languages as L (er-LGR).

The next corollary follows from the previous definition and Corollary 1.

Corollary 2. Let G = (V; S; R; S) be a dist-LR(0)grammar. Let (x; u1; A; v; u2) be a distinguishing elementary prefix by G, and a = (x; u1; A; v; u2; y) be a pumping infix by G. Then the language L(G) is an elementary-nonregular language.

Theorem 5. Let L be an elementary-non-regular language. Then L is not a regular language.

Proof. Let G be an elementary-non-regular LR(0)grammar such that L(G) = L. Let a = (x; u1; A; v; u2; y) be a non-regular pumping infix by G. The facts that L0 = fxu1nvu2my j n 0; m 0g is a regular language and L \ L0 is not a regular language imply that L is not regular, because regular languages are closed under the intersection operation.

For the opposite direction that each non-regular language L generated by an LR(0)-grammar is elementarynon-regular, we still do not have proof.

The next corollary presents our current results about total separations achieved by LR(0)-grammars. We consider these results important from the point of view of our motivations mentioned in the introduction. The corollary is mainly a consequence of Definition 5.

Corollary 3. The following equations hold: (a) L (er-LRG) [ L (enr-LRG) = L (LRG); (b) L (er-lin-LRG) [ L (enr-lin-LRG) =

L (lin-LRG); (c) L (er-LRG) \ L (enr-LRG) = 0/ ; (d) L (er-lin-LRG) \ L (enr-lin-LRG) = 0/ :

What remains is a further study of relations between the class of regular languages and the classes of languages generated by LR(0)-grammars that are not distinguishing or elementary-regular. We believe that the following conjectures hold.

Conjecture 1. It holds the following (a) L (er-LRG) = Reg: (b) L (enr-LRG) = L (LRG) Reg:

Above, in Corollary 2, we have seen that if an elementary pumping prefix is distinguishing for an LR(0)grammar G, we can obtain a non-regular pumping infix for G. We conjecture that also the opposite direction holds. Conjecture 2. Let G = (V; S; R; S) be an LR(0)-grammar and (x; u1; A; v; u2; y) be a pumping infix by G. If (x; u1; A; v; u2; y) is non-regular pumping infix by G then (u1; A; v; u2) is a distinguishing pattern for G. 5

Conclusion and Future Work

It remains yet some work to do in order to fulfill our plans mentioned in the introduction and show our conjectures. It also remains to show the relation between elementary nonregular pumping patterns and distinguishing pumping patterns, and between elementary regular pumping patterns and non-distinguishing pumping patterns. We believe that we will make that in the near future.

We also plan to show that the distinguishing pumping and the non-distinguishing pumping by LR(0)-grammars can be characterized by pumping patterns of limited size (e.g., elementary pumping patterns) depending only on corresponding LR(0)-grammars. Such patterns could help with proving the decidability of the problem whether an LR(0)-grammar is distinguishing or not. After that, we plan to present the mentioned transformation to restarting automata controlled by LR(0)-grammars. This step will open a broad field on studying the descriptional complexity of DCFL and its subclasses. One interesting type of descriptional complexity will be the degree of non-regularity of DCFL and some of its subclasses. Similarly, a degree of regularity of a deterministic context-free language can be measured.

Finally, let us note that total separations of the presented type have essential importance for computational and comparative linguistic, and for expressing of properties of programming languages. We can, in this way, formally express, e.g., the (non-)existence of obligatory adjuncts in a natural language (or in a sentence of it) or the use of different types of parenthesis in a programming language.

[1] Bejcˇek , E. , Panevová , J. , Popelka , J. , Stranˇák, P., Sevcˇíková, M., Šteˇpánek , J., Žabokrtský , Z. : Prague dependency treebank 2.5 - a revisited version of PDT 2.0 . In: Kay, M. , Boitet , C. (eds.) COLING 2012 , 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers , 8 - 15 December 2012 , Mumbai, India. pp. 231 - 246 . Indian Institute of Technology Bombay ( 2012 ), https://www.aclweb. org/anthology/C12-1015/

[2] Geller , M.M. , Harrison , M.A. : On LR(k) grammars and languages . Theor. Comput. Sci. 4 ( 3 ), 245 - 276 ( 1977 ). https://doi.org/10.1016/ 0304 - 3975 ( 77 ) 90013 - 5

[3] Harrison , M.A. : Introduction to formal language theory . Series in computer science, Addison-Wesley Longman Publishing Co., Inc . ( 1978 )

[4] Jancˇar , P. , Mráz , F. , Plátek , M. , Vogel , J.: Restarting automata . In: Reichel, H. (ed.) Fundamentals of Computation Theory, 10th International Symposium , FCT '95, Dresden , Germany, August 22-25 , 1995 , Proceedings. Lecture Notes in Computer Science , vol. 965 , pp. 283 - 292 . Springer ( 1995 ). https://doi.org/10.1007/3-540-60249-6_ 60

[5] Lopatková , M. , Plátek , M. , Sgall , P. : Towards a formal model for functional generative description: Analysis by reduction and restarting automata . Prague Bull. Math. Linguistics 87 , 7 - 26 ( 2007 ), http://ufal.mff.cuni.cz/ pbml/87/lopatkova-et-al.pdf

[6] Mráz , F. , Pardubská , D. , Plátek , M. , Šíma , J.: Pumping deterministic monotone restarting automata and DCFL . In: Holena, M. , Horváth , T. , Kelemenová , A. , Mráz , F. , Pardubská , D. , Plátek , M. , Sosík , P. (eds.) Proceedings of the 20th Conference Information Technologies - Applications and Theory (ITAT 2020 ), Hotel Tyrapol, Oravská Lesná, Slovakia, September 18-22 , 2020 . CEUR Workshop Proceedings , vol. 2718 , pp. 51 - 58 . CEUR-WS.org ( 2020 ), http://ceur-ws. org/ Vol- 2718 /paper13.pdf

[7] Sgall , P. , Goralcˇíková , A. , Nebeský , L. , Hajicˇová , E.: A functional approach to syntax in generative description of language . American Elsevier Publishing Company, Inc., New York ( 1969 )

[8] Šíma , J. , Plátek , M. : One analog neuron cannot recognize deterministic context-free languages . In: Gedeon, T. , Wong , K.W. , Lee , M. (eds.) Neural Information Processing - 26th International Conference, ICONIP 2019 , Sydney , NSW , Australia, December 12-15 , 2019 , Proceedings, Part III. Lecture Notes in Computer Science , vol. 11955 , pp. 77 - 89 . Springer ( 2019 ). https://doi.org/10.1007/978-3- 030 -36718- 3 _ 7

[9] Stearns , R.E. : A regularity test for pushdown machines . Inf. Control . 11 ( 3 ), 323 - 340 ( 1967 ). https://doi.org/10.1016/S0019- 9958 ( 67 ) 90591 - 8

[10] Valiant , L.G. : Regularity and related problems for deterministic pushdown automata . Journal of the ACM 22 , 1 - 10 ( 1975 )