=Paper= {{Paper |id=Vol-2962/paper05 |storemode=property |title=On Separations of LR(0)-Grammars by Two Types of Pumping Patterns |pdfUrl=https://ceur-ws.org/Vol-2962/paper05.pdf |volume=Vol-2962 |authors=Martin Plátek,František Mráz,Dana Pardubská,Daniel Průša ,Jiří Šíma |dblpUrl=https://dblp.org/rec/conf/itat/PlatekMPPS21 }} ==On Separations of LR(0)-Grammars by Two Types of Pumping Patterns == https://ceur-ws.org/Vol-2962/paper05.pdf
      On Separations of LR(0)-Grammars by Two Types of Pumping Patterns

                      Martin Plátek1, *, František Mráz1 , Dana Pardubská2,†, Daniel Průša3,‡, and Jiří Šíma4, *
                                            1 Charles University, Department of Computer Science

                                           Malostranské nám. 25, 118 00 PRAHA 1, Czech Republic
                                       martin.platek@mff.cuni.cz, frantisek.mraz@mff.cuni.cz
                                   2    Comenius University in Bratislava, Department of Computer Science
                                                 Mlynská Dolina, 84248 Bratislava, Slovakia
                                                     pardubska@dcs.fmph.uniba.sk
                                             3   Czech Technical University, Department of Cybernetics
                                                  Karlovo nám. 13, 121 35 Prague 2, Czech Republic
                                                               prusa@fel.cvut.cz
                                       4   Institute of Computer Science of the Czech Academy of Sciences,
                                                     P. O. Box 5, 18207 Prague 8, Czech Republic
                                                                   sima@cs.cas.cz

Abstract: We present two types of pumping patterns                           by simulating any given LR(0)-analyzer by a determin-
that allow a total separation inside the class of LR(0)-                     istic monotone restarting automaton. Hence, the com-
grammars. Using the same type of pumping patterns, we                        putations of such automata were controlled by LR(0)-
obtain a total separation inside of linear LR(0)-grammars.                   grammars. Therefore, in what follows, we will use con-
   This type of study has a long-term motivation from                        structions based on properties of phrase structures gener-
computational linguistics and the area of syntactic error lo-                ated by LR(0)-grammars and we will study pumping pat-
calization. A recent motivation also comes from the field                    terns based on LR(0)-grammars.
of formal models of neural networks.                                            This paper should serve as a step to refine the concepts
                                                                             from [5], where it was shown that restarting automata can
1    Introduction                                                            serve as a formal model for functional generative descrip-
                                                                             tion (FGD) of a natural language. Functional generative
This paper follows the paper [6], where we have stud-                        description is a dependency-based descriptive system that
ied restarting automata recognizing deterministic context-                   has been developed since the 1960’s, see esp. [7]. FGD
free languages (DCFL) with a particular type of pumping                      was originally implemented as a generative procedure, but
patterns. This type of pumping patterns ensures the non-                     later its authors have been interested in a more declarative
regularity of the recognized languages.                                      representation. The subject of the paper [5] concerns the
   In this paper, we are looking for two types of pumping                    foundations of a reduction system using a complex restart-
patterns. The first type of pumping patterns should ensure                   ing automaton. The reduction system is more complex
the non-regularity of the languages recognized by a cer-                     than a reduction system for a (shallow) syntactic analyzer
tain type of restarting automata. In contrast, the second                    since it provides not only the possibility of checking the
type of pumping patterns should ensure the regularity of                     well-formedness of the (surface) analysis of a sentence,
the languages recognized by this type of restarting auto-                    but also its meaning – tectogrammatical representation in
mata. The union of both of these types of pumping pat-                       terms of FGD. Such a reduction system makes it possible
terns should cover all possible pumping patterns defined                     to define the analysis as well as the synthesis of a sentence
by the mentioned type of restarting automata.                                formally.
   Any deterministic context-free language L can be ac-                         A descriptive system [5] DS of a natural language L
cepted by a LR(1)-analyzer (using lookahead of size 1),                      should determine:
but the language L · {$}, where $ is a special end-marker
not in the alphabet of L) can be accepted by a LR(0)-                         (a) The set LC of all correct sentences of the language L.
analyzer [3]. In [4], it was shown that any determinis-                      (b) The set LM of all correct meaning descriptions of the
tic context-free language can be accepted by a determin-                         language L. LM represents all meanings of all sen-
istic monotone restarting automaton. This was proved                             tences in LC.
    * Supported partially by the grant GA19-05704S of the Czech Science
Foundation and by the institutional support RVO: 67985807.                    (c) A relation SH ⊆ LC × LM between LC and LM. The
    † Supported by the grant 1/0601/20 of the Slovak Scientific Grant
                                                                                  relation describes the ambiguity and the synonymy of
Agency VEGA.
    ‡ Supported by the grant GA19-21198S of the Czech Science Founda-             L.
tion.
       Copyright ©2021 for this paper by its authors. Use permitted under      Let us note that LM is an artificial (formal) unambigu-
Creative Commons License Attribution 4.0 International (CC BY 4.0).          ous language, which can be described by a restarting auto-
maton controlled by an LR(0)-grammar. An important fea-         every A ∈ N, there exist α, β ∈ V ∗ such that S ⇒∗ αAβ ,
ture modeled by LM is valency. More about valency can           and for every A ∈ N there is w ∈ Σ∗ such that A ⇒∗ w,
be found in [1]. There are types of valency features that       respectively. In the following, we work with reduced
can be modeled as pumping patterns of non-regular type          context-free grammars only.
(e.g., obligatory adjuncts), and there are other valency fea-      We say that a context-free grammar G = (V, Σ, R, S) is
tures that can be modeled as pumping patterns of regular        linear if the right-hand side of any rule from R contains at
type (e.g., optional adjuncts).                                 most one variable (nonterminal).
   Another recent motivation for this paper comes from             For any non-empty word a1 · · · an , with a1 , . . . , an ∈ Σ,
a completely different area. The last results of this pa-       the derivation S ⇒∗ a1 · · · an can be described by a deriva-
per should support the analysis of analog neuron hierarchy      tion tree which is an ordered tree satisfying:
of binary-state neural networks (NNs) with an increasing
number of extra analog-state neurons, which has been in-          1. Inner vertices are labeled with nonterminals.
troduced in [8] for studying the power of NNs with real-          2. The root is labeled with the start symbol S.
istic weights between integers and rational numbers with
respect to the Chomsky hierarchy.                                 3. If inner vertex labeled with nonterminal A ∈ N
   Note that the problem of the regularity of deterministic          has children labeled (from left to right) with
context-free languages gained attention already in the 60s.          α1 , α2 , . . . , αk ∈ V then A → α1 α2 · · · αk ∈ R; A to-
Algorithms deciding whether a given deterministic push-              gether with the subtrees rooted in its children form a
down automaton accepts a regular language were proposed              complete branch of the tree.
by Stearns [9] and Valiant [10].
   This paper is structured as follows. The next sec-             4. The leaves in the tree are labeled (from left to right)
tion contains basic notions and basic properties of LR(0)-           with the terminal symbols a1 , . . . , an ∈ Σ.
grammars. Section 3 introduces the concepts of our inter-         Removing condition 2 we get the definition of a (com-
est and presents the first main result. Section 4 presents      putation) derivation sub-tree.
the results about total separations. The paper concludes
with a summary of future work to be done to fulfill all our       A grammar G is unambiguous if every non-empty string
plans.                                                          w ∈ L(G) has a unique derivation tree, or equivalently a
                                                                unique rightmost derivation.
2   Basic Notions                                               The results of our paper heavily depend on the characteri-
                                                                zation of the class of deterministic context-free languages
We suppose that the reader is familiar with the basics of
                                                                by LR(0)-grammars and corresponding parsers. Follow-
formal language and automata theory presented, e.g., in
                                                                ing [3], we, therefore, give the necessary definitions and
[3]. For technical reasons, we give a list of some notions
                                                                the most relevant results.
and definitions related to formal grammars.
Context-free grammars. A context-free grammar G is              Definition 1. Let G = (V, Σ, R, S) be a context-free gram-
defined by a 4-tuple G = (V, Σ, R, S) where Σ ⊂ V , N =         mar, N = V \ Σ, γ ∈ V ∗ . A handle of γ is an ordered pair
V \ Σ, and N, Σ are finite sets of nonterminal and terminal     (r, i), r ∈ R, i ≥ 0 such that there exists A ∈ N, α, β ∈ V ∗
symbols, respectively, R ⊂ N × V ∗ is a finite relation of      and w ∈ Σ∗ such that
rewrite (production) rules, and S ∈ N is the start symbol.
We use the usual notation A → β ∈ R for a production            (a) S ⇒∗R αAw ⇒R αβ w = γ,
rule (A, β ) ∈ R. This rule can be applied to u = u1 Au2
where u1 , u2 ∈ V ∗ , yielding v = u1 β u2 , which is denoted   (b) r = A → β , and
as u ⇒ v, or u ⇒G v. We write u ⇒R v if the nonterminal          (c) i = |αβ |.
A substituted by u ⇒ v is the rightmost nonterminal in u.
   The grammar G generates the context-free language            Lemma 1 ([2]). Let G = (V, Σ, R, S) be a reduced context-
L(G) = {w ∈ Σ∗ | S ⇒∗ w} where ⇒∗ (⇒∗R ) is the re-             free grammar. Then G is unambiguous if and only if every
flexive transitive closure of the binary relation ⇒ (⇒R ) on    α ∈ V ∗ , such that S ⇒∗R α, has exactly one handle except
V ∗ . Note that L(G) can also be equivalently defined as        S, which has none.
L(G) = {w ∈ Σ∗ | S ⇒∗R w}.
   We write u ⇒≤p v for some non-negative integer p if            In general, the identification of a handle in a string is not
u ⇒∗ v and v is derived from u by at most p derivation          uniquely defined, which is not true for LR(0) grammars.
steps (⇒). A similar meaning has the denotation u ⇒≤p    R v.   Definition 2. Let G = (V, Σ, R, S) be a reduced context-
In what follows, we will primarily work with the rightmost      free grammar such that S ⇒+         R S is not possible in G. We
derivations and write ⇒ instead of ⇒R .                         say G is an LR(0)-grammar if, for each w, w0 , x ∈ Σ∗ ,
   Any non-empty context-free language L = L(G) 6= 0/ can       η, α, α 0 , β , β 0 ∈ V ∗ , and A, A0 ∈ V \ Σ, if
be generated by a reduced context-free grammar G that ex-
cludes unreachable and unproductive symbols, that is, for       (a) S ⇒∗R αAw ⇒R αβ w = ηw, and
 (b) S ⇒∗R α 0 A0 x ⇒R α 0 β 0 x = ηw0                                        (b) Let u be a prefix of the content of the input tape w =
                                                                                  uv, which has already been read by a computation of
imply (A → β , |αβ |) = (A0 → β 0 , |α 0 β 0 |)                                   P(G). Then there exists a suffix v0 such that uv0 ∈ L
   Note that as a consequence of the above definition we                          iff P(G) did not halt and not reject while reading u.
have that A = A0 , β = β 0 , α = α 0 , γ = αβ = α 0 β 0 and                      Let L ⊆ Σ∗ be a deterministic context-free language
x = w0 . Thus, if G is an LR(0)-grammar, then the rightmost                   (DCFL), and $ ∈    / Σ. Then, according to [3], there are
derivation of the word w by G and the left-right analysis                     LR(0)-grammar G(0, $) generating the language L · {$},
is unique (deterministic). In this paper, we consider LR(0)                   and a corresponding LR(0)-analyzer P(G(0, $)) of L · {$};
grammars rather as analytical grammars. A language gen-                       P(G(0, $)) is a deterministic push-down automaton with $
erated by an LR(0)-grammar is called LR(0)-language.                          as the rightmost symbol on its input-tape.
Theorem 1 (LR(0)-language characterization theorem                               To stress its property, we will sometimes write LR(0,$)-
from [3]). Let L ⊆ Σ∗ . The following four statements are                     grammar instead of grammar G(0, $). We denote the set of
equivalent.                                                                   LR(0)-grammars by LRG and the set of LR(0,$)-grammars
                                                                              by LRG($).
 (a) L is an LR(0) language.                                                  Linear LRG. We say that a context-free grammar G is
                                                                              linear if the right-hand side of any rule from G contains at
 (b) L is a deterministic context-free language and for all
                                                                              most one variable (nonterminal). We also consider linear
     x ∈ Σ+ , w, y ∈ Σ∗ , if w ∈ L, wx ∈ L, and y ∈ L, then
                                                                              LR(0)-grammars. We denote the linearity by the prefix
     yx ∈ L.
                                                                              lin-, and the class of linear LR(0)-grammars as lin-LRG.
 (c) There exists a deterministic pushdown automaton A                        Classes of languages. In what follows, L (A), where A is
     with a single final state q f and a pushdown symbol                      some class of grammars, denotes the class of languages
     Z f such that each word w ∈ L is accepted by A by                        generated by grammars from A. E.g., the class of lan-
     entering the state q f with Z f as the only symbol in its                guages generated by linear LR(0)-grammars is denoted by
     pushdown.                                                                L (lin-LRG).

 (d) There exist strict deterministic languages L0 and L1
     such that L = L0 L1∗ .                                                   3    Pumping by LR(0)-grammars
   Note that strict deterministic languages are determinis-                   The first goal of our paper is to specify properties of de-
tic context-free languages recognizable by empty push-                        terministic (linear) context-free grammars that assure that
down, or equivalently prefix-free deterministic context-                      the corresponding (linear) language is non-regular. In this
free languages.                                                               section, we show such properties. We start with several
   Let us recall semi-Dyck languages. Let r ≥ 1, Σr =                         definitions and notations.
{a1 , . . . , ar }, Σ̄r = {ā1 , . . . , ār } and Σ = Σr ∪ Σ̄r . The semi-   Pumping notions. Let G = (V, Σ, R, S) be an LR(0)-
Dyck language Dr is the language generated by the gram-                       grammar generating (analyzing) the language L = L(G),
mar Gr = (Vr , Σ, P, S), where Vr = Σ ∪ {S}, and R contains                   and P(G) be the corresponding LR(0)-analyzer for G. Let
the following production rules:                                               w ∈ L(G), w = xu1 vu2 y, where x, u1 , v, u2 , y ∈ Σ∗ , u1 u2 ∈
                                                                              Σ+ are given by the derivation tree Tw from Fig. 1. The
             S → Sai Sāi S | λ , for each i, 1 ≤ i ≤ r.                      proper sub-trees T1 and T2 of Tw are (computation) sub-
                                                                              trees whose roots are labeled with the same nonterminal
   Informally, Dr is the set of all well-balanced parentheses
                                                                              A, thus by replacing T1 with T2 properly inside of Tw , we
containing r different pairs of brackets [3]. The semi-Dyck
                                                                              again get a derivation tree, namely the derivation tree Tw(0)
languages are examples of context-free languages that are
                                                                              for the word w(0) = xvy ∈ L(G).
not linear context-free languages. Additionally, they can
                                                                                 Analogously, replacing T2 with a copy of T1 , we get the
be used to separate LR(0)-languages from the strict deter-
                                                                              derivation tree Tw(2) for a longer word w(2) = xu21 vu22 y. If
ministic languages.
                                                                              we repeat i times such replacing of T2 with T1 we obtain the
Theorem 2 ([3]). Let Dr ⊂ Σ∗ be the semi-Dyck language                        derivation tree Tw(i+1) for the word w(i+1) = xui+1         i+1
                                                                                                                                    1 vu2 y.
for some r ≥ 1. Then Dr is an LR(0)-language, but not a                       Pumping tree, prefix, infix, pattern and reduction. Let
strict deterministic language.                                                x, u1 , v, u2 , A, y, T1 be as on Fig. 1. Then we say that
                                                                              Pp = xu1 vu2 is an (x, u1 , A, v, u2 )-pumping prefix by G,
  LR(0)-analyzer. If L is generated by an LR(0)-                              Pin = xu1 vu2 y is an (x, u1 , A, v, u2 , y)-pumping infix by G,
grammar G, then there exists an LR(0)-analyzer P(G) for                       and that T1 is a pumping tree of Pp . Recall that the LR(0)-
L with the following important properties (see [3]):                          analysis by P(G) of the prefix Pp in any word of the form
                                                                              Pp γ is the same, for any γ ∈ Σ∗ . We say that (u1 , A, v, u2 ) is
 (a) For each word w ∈ L, there is exactly one rightmost                      a pumping pattern by G (of Pp ). Let us recall that for any
     derivation of w in G, which corresponds to the LR(0)-                    z ∈ Σ∗ , the LR(0)-analyzer P(G)) reduces xu1 vu2 z into xvz.
     analysis of the word w by P(G).                                          We write xu1 vu2 z ⇐P(G) xvz, and say that xu1 vu2 z ⇐P(G)
                    T1          @ Tw                               (d) xu1 vu2 z ∈ L iff xum   m
                                                                                           1 vu2 z ∈ L for any m ≥ 0, and any
                                 @                                     z∈Σ . ∗
                   T2              @
                                     @                              (e) An (x, u1 , A, v, u2 )-pumping prefix by G, and the
                         A
                                      @                                 pumping pattern (u1 , A, v, u2 ) determine unambigu-
                                              @
                          @                        @                    ously a single pumping reduction.
                        A @                                                                  n+k m+k
                                                                    (f) xun1 vum
                                                    @
                         @ @                            @                      2 z ∈ L iff xu1 vu2   z ∈ L for any m, n, k ≥ 0.
                           @ @                           @
         x    u1        v    u2                y                      Assertion (c) is essential for our further considerations.
                                                                   It shows that, for a non-empty u1 , the distance of the place
                                                                   of pumping from the left end is not limited, and that the
        Figure 1: The structure of a derivation tree.              position of pumping is determined both by the pumping
                                                                   pattern of a pumping reduction and the pumping prefix of
                                                                   the pumping reduction. We use this property in the follow-
                                                                   ing definition formalizing a non-regular type of pumping.
xvz is an (x, u1 , A, v, u2 )-pumping reduction by G. Note
that the word xu1 vu2 z needs not be a word from L(G).             Example 1. Consider the non-regular context-free lan-
We will often say in the following that (x, u1 , A, v, u2 ) is     guage Lab = {an bn | n ∈ N}, that can be generated by the
a pumping prefix and that (x, u1 , A, v, u2 , y) is a pumping      grammar G = ({S, S1 , a, b}, {a, b}, R, S)), with the follow-
infix.                                                             ing set of rules R:
Elementary infix etc. We say that an (x, u1 , A, v, u2 , y)-
pumping infix by G is elementary if the (x, u1 , A, v, u2 )-                            S     →    S1
pumping reduction is the only and, at the same time, the                                S1    →    aS1 b | ab
last pumping reduction by G that can be performed inside
                                                                   The grammar is reduced and unambiguous. Consider the
of the word xu1 vu2 y, i.e. xvy cannot be reduced by any
                                                                   sentence γ = aaabbb.
pumping reduction by G at all. In this case, we say that
(x, u1 , A, v, u2 ) is an elementary pumping prefix, and that         • The handle of γ (cf. Definition 1) is the pair (S1 →
(u1 , A, v, u2 ) is an elementary pumping pattern by G.                 ab, 4), as
   Let us note that the corresponding elementally pumping
tree T1 does contain an internal node with the same non-                               S ⇒∗R aaS1 bb ⇒R aaabbb
terminal as the root of T1 , and does not contain any other
repetition of a nonterminal on any path from its root to a              and the division of γ into α, β , w is unique:
leaf. While the size (the number of terminal symbols) of
a pumping infix by G is unbounded, there exists a con-                                       γ = |{z}
                                                                                                  aa |{z}
                                                                                                      ab |{z}
                                                                                                          bb
stant c > 0 that depends on the number of terminals and                                           α    β    w
nonterminals, and the length of rules of G such that any
elementary pumping infix by G is of size at most c.                   • G is an LR(0)-grammar, moreover, linear as
   We can see the following obvious proposition that sum-
marizes the (pumping) properties of LR(0)-grammars,                      (a) S ⇒∗R αAw ⇒R αβ w = ηw
which we will use in the following text. It is a direct con-             (b) S ⇒∗R α 0 A0 x ⇒R α 0 β 0 x = ηw0
sequence of the previous definitions and the properties of
LR(0)-grammars and their analyzers summarized in [3],                   obviously implies (A → β , |αβ |) = (A0 → β 0 , |α 0 β 0 |)
and inspired by [9].
                                                                   The pumping notions can be illustrated in Fig. 2 with a
Proposition 1. Let G = (V, Σ, R, S) be an LR(0)-grammar            derivation tree for w = xaabby ∈ L(G), where x ∈ a∗ ,y ∈
generating (analyzing) the language L = L(G). Let a word           b∗ .
Pp = xu1 vu2 be an (x, u1 , A, v, u2 )-pumping prefix by G, and       We can see, by taking x = a, that Pp = aaabb is an
xu1 u1 vu2 u2 ⇐P(G) xu1 vu2 be an (xu1 , u1 , A, v, u2 )-pumping   (a, a, S1 , ab, b)-pumping prefix by G, and that T1 is a
reduction by G. Then                                               pumping tree of Pp . Pumping pattern of Pp by G is
                                                                   (a, S1 , ab, b) and aaabb ⇐P(G) aab is an (a, a, S1 , ab, b)-
(a) Any w ∈ L determines its derivation tree Tw by G un-           pumping reduction by G. Realize that a pumping reduction
    ambiguously.                                                   by G can be applied to any word ak bm , where k, m > 1, in-
                                                                   cluding the cases when k 6= m.
(b) Pp determines its pumping sub-tree unambiguously.                 Moreover, we can see that (λ , a, S1 , ab, b, λ ) is the sin-
                                                                   gle elementary pumping infix by G.
 (c) xum
       1 u1 vu2 u2 z
                    n       ⇐P(G) xum     n
                                     1 vu2 z         is       an
        m
     (xu1 , u1 , A, v, u2 )-pumping reduction      by G      for   Definition 3. Let G = (V, Σ, R, S) be an LR(0)-grammar
     any m, n > 0, and any z ∈ Σ∗ .                                generating (analyzing) the language L = L(G). Let
                      T1           @ Tw                                Lemma 2. Let G = (V, Σ, R, S) be a dist-LR(0)-grammar.
                                    @                                  Let (u1 , A, v, u2 ) be a distinguishing pattern by G. Then
                    T2                @                                u1 , u2 ∈ Σ+ .
                                        @
                          S1
                                         @                             Proof. Assume, for example, that condition (I) from Def-
                                                 @                     inition 3 holds. It is then clear that u2 6= λ , otherwise
                                                                       xvu2p j y = xvy for all p, j > 0.
                           @                          @
                         S1 @                          @
                          @ @                              @              It is also easy to see that u1 6= λ . If u1 = λ , we get from
                             @ @
                                                  y
                                                            @          Proposition 1 that xvu2pm y ∈ L and we have xvu2pm u2p y ∈    /L
         x      a        ab    b
                                                                       for all m ≥ 0 and some p > 0. The first fact yields xvu2p y ∈
                                                                       L for m = 1, the second fact yields xvu2p y ∈    / L for m = 0.
                                                                       This is a contradiction.
        Figure 2: The structure of a derivation tree.
                                                                          The existence of a distinguishing nonterminal serves as
                                                                       a sufficient condition for non-regularity.
xu1 vu2 ⇐P(G) xv be an (x, u1 , A, v, u2 )-pumping reduction           Theorem 3. Let L = L(G) be a language accepted by a
by G.                                                                  dist-LR(0)-grammar G = (V, Σ, R, S). Then L is a non-
   We say that A is a distinguishing nonterminal for G, and            regular language.
xu1 vu2 ⇐P(G) xv is an (x, u1 , A, v, u2 )-distinguishing reduc-
tion by G if at least one of the following conditions is true:         Proof. Assume, for a contradiction, that G = (V, Σ, R, S)
                                                                       is a dist-LR(0)-grammar generating a regular language
 (I) for some y ∈ Σ∗ there is a p > 0 such that xvy ∈ L iff            L = L(G) and B be an (x, u1 , B, v, u2 )-distinguishing non-
     xvu2p j y ∈
               / L, for all j > 0;                                     terminal for G.
(II) for some y ∈ Σ∗ , there is a p > 0 such that xvy ∈ L iff             As L is regular, there exists a deterministic finite au-
     xu1p j vy ∈
               / L, for all j > 0.                                     tomaton A = (Q, Σ, δ , q0 , F) with nA = |Q| states accept-
                                                                       ing the language L = L(A). The proof then follows by
   We say that the tuple (u1 , A, v, u2 ) is a distinguishing pat-     the case analysis based on properties of the (x, u1 , B, v, u2 )-
tern for G.                                                            distinguishing pattern. Since there is an analogy of the
   If G contains a distinguishing nonterminal (pattern)                case analysis, we will prove the theorem only by the as-
then we call G a distinguishing LR(0)-grammar (denoted                 sumption that condition (I) from Definition 3 is fulfilled.
dist-LRG).                                                                Suppose that for some y ∈ Σ∗ and p > 0 it holds xvy ∈ L,
   For the sake of accuracy, we also say that A is an                  and, for all j > 0, xvu2p· j y ∈
                                                                                                      / L.
(x, u1 , A, v, u2 )-distinguishing nonterminal for G.                     Proposition 1 implies that xu1pm vu2pm y ∈ L for all m ≥ 0
                                                                                      p(m+ j)
Example 1 (continued). According to case (I) of Defini-                and xu1pm vu2          y∈
                                                                                               / L for all m ≥ 0, j > 0, and Lemma
tion 3 the nonterminal S1 is a distinguishing nonterminal              2 implies that u1 , u2 ∈ Σ+ . Let us inspect states reach-
and aabb ⇐P(G) ab is a (λ , a, S1 , ab, b)-distinguishing re-          able by the automaton A when reading words of the form
duction for y = λ , and p = 1. Thus, (a, S1 , ab, b) is a dis-         xu1pm vu2pk , for m > nA , k ≥ 0. Since A has nA states,
tinguishing pattern for G.                                             the pigeonhole principle yields that there are integers r, s,
                                                                       1 ≤ r < s ≤ nA + 1 and a state q of A such that
   Let us recall that the class of LR(0)-languages is not
closed under complement (see [2]), but DCFL is closed                            q = δ (q0 , xu1pm vu2pr ) = δ (q0 , xu1pm vu2ps ).
under complement.
                                                                       As xu1pm vu2pm y ∈ L, both words
Example 1 (continued). The language Labc = c+ · Lab
can be generated by LR(0) grammar Gabc =                                                                            p(m−r)
                                                                                      xu1pm vu2pm y = xu1pm vu2pr u2         y
({S, S1 , S2 , a, b, c}, {a, b, c}, R, S), with the set of rules R :
                         S    →   cS2                                  and
                                                                                               p(m−r)                    p(s−r)
                         S2   →   cS2 | S1                                        xu1pm vu2ps u2        y = xu1pm vu2pm u2        y
                         S1   →   aS1 b | ab                           are accepted by A. Then, for j = s − r we have a contra-
Then, analogously to the situation in Lab , nontermi-                  diction with the fact that xu1pm vu2pm u2p· j y ∈
                                                                                                                       / L, for all m ≥ 0,
nal S1 is distinguishing, and there is a (c, a, S1 , ab, b)-           j > 0.
distinguishing reduction based on case (I) of Definition 3;               Next, suppose that for some y ∈ Σ∗ and p > 0 it holds
here, we can take again y = λ , and p = 1. On the other                xvy ∈/ L, and xvu2p· j y ∈ L, for all j > 0.
hand, (c, S2 , ab, λ ) is a pumping pattern by Gabc , which               Using the same reasoning as above, we derive that
                                                                                                                  p(m−r)
is not distinguishing for any (x, c, S2 , ab, λ )-reduction by         A rejects xu1pm vu2pm y = xu1pm vu2pr u2           y as well as
                                                                          pm ps p(m−r)             pm pm p(s−r)
Gabc ; thus (c, S2 , ab, λ ) is a non-distinguishing pumping           xu1 vu2 u2        y = xu1 vu2 u2            y, which is again a
pattern by Gabc .                                                      contradiction.
    The previous proof yields the following corollary.              We denote the class of elementary non-regular grammars
                                                                    by the prefix enr-.
Corollary 1. Let G = (V, Σ, R, S) be a dist-LR(0)-
                                                                       We say that a language L is an elementary-non-regular
grammar. Let (x, u1 , A, v, u2 ) be a distinguishing prefix by
                                                                    language if an elementary-non-regular LR(0)-grammar G
G, and Pin be an (x, u1 , A, v, u2 , y)-infix by G. Then the lan-
                                                                    exists such that L(G) = L. The property being elementary-
guage L(G) ∩ {xun1 vum 2 y | n ≥ 0, m ≥ 0} is not a regular         non-regular we mark by the prefix enr- and the class of
language.
                                                                    elementary-non-regular grammars is denoted as enr-LRG.
   We will use the following definition to stress the ability       Similarly, the prefix er- will denote the property being
of LR(0)-grammars to define DCFL.                                   elementary-regular.
                                                                       We say that a language L ∈ L (LR(0)) is an elementary-
Definition 4. Let G be an LRG(0, $) grammar which gen-              regular language if there is not any elementary-non-
erates L(G) = L{$}, where L ⊆ Σ∗ . We say that G is an              regular LR(0)-grammar G such that L(G) = L. We denote
LRG(1)-grammar which defines L.                                     the class of elementary-regular languages as L (er-LGR).
  We take
                                                                      The next corollary follows from the previous definition
          L (LRG(1)) = {L | L{$} ∈ LRG($)},
                                                                    and Corollary 1.
      L (dist-LRG(1)) = {L | L{$} ∈ dist-LRG($)}.
                                                                    Corollary 2. Let G = (V, Σ, R, S) be a dist-LR(0)-
    In what follows, the sign ⊂ means a proper subset.
                                                                    grammar. Let (x, u1 , A, v, u2 ) be a distinguishing elemen-
Theorem 4. It holds the following:                                  tary prefix by G, and α = (x, u1 , A, v, u2 , y) be a pumping
                                                                    infix by G. Then the language L(G) is an elementary-non-
       (a)   L (dist-LRG) ⊂ L (LRG),                                regular language.
       (b)   L (lin-dist-LRG) ⊂ L (lin-LRG),
       (c)   L (dist-LRG(1)) ⊂ L (LRG(1)),                          Theorem 5. Let L be an elementary-non-regular lan-
       (d)   L (lin-dist-LRG(1)) ⊂ L (lin-LRG(1)),                  guage. Then L is not a regular language.
       (e)   L (LRG(1)) = DCFL.                                     Proof. Let G be an elementary-non-regular LR(0)-
Proof. All inclusions (a)–(d) follow from the fact that, ac-        grammar such that L(G) = L. Let α = (x, u1 , A, v, u2 , y)
cording to Theorem 3, all languages generated (accepted)            be a non-regular pumping infix by G. The facts that
by distinguishing LR(0)-grammars are non-regular, while                           L0 = {xun1 vum
                                                                                               2 y | n ≥ 0, m ≥ 0}
both L (lin-LRG) and L (lin-LRG(1)) contain all regular
languages. The equality (e) is just the fact that for any de-       is a regular language and L ∩ L0 is not a regular language
terministic context-free language L ⊆ Σ∗ , where $ 6∈ Σ, the        imply that L is not regular, because regular languages are
language L{$} is accepted by an LR(0)-grammar [3].                  closed under the intersection operation.
                                                                       For the opposite direction that each non-regular lan-
4     Some total separations inside of                              guage L generated by an LR(0)-grammar is elementary-
      LR(0)-languages                                               non-regular, we still do not have proof.
                                                                       The next corollary presents our current results about to-
While the size of distinguishing pumping patterns is un-            tal separations achieved by LR(0)-grammars. We consider
bounded, the size of elementary pumping patterns for a              these results important from the point of view of our mo-
given grammar is limited. Therefore, below we introduce             tivations mentioned in the introduction. The corollary is
a condition that implies non-regularity of a language gen-          mainly a consequence of Definition 5.
erated by an LR(0)-grammar based on elementary pump-
ing patterns only.                                                  Corollary 3. The following equations hold:
                                                                       (a) L (er-LRG) ∪ L (enr-LRG) = L (LRG),
Definition 5. Let G = (V, Σ, R, S) be an LR(0)-grammar                 (b) L (er-lin-LRG) ∪ L (enr-lin-LRG) =
and α = (x, u1 , A, v, u2 , y) be an elementary pumping infix               L (lin-LRG),
by G.                                                                  (c) L (er-LRG) ∩ L (enr-LRG) = 0,   /
   We say that α is regular if the language L(G, α) =                  (d) L (er-lin-LRG) ∩ L (enr-lin-LRG) = 0.
                                                                                                               /
L(G) ∩ {xun1 vum2 y | n ≥ 0, m ≥ 0} is a regular language.
   We say that α is non-regular if the language L(G, α) =              What remains is a further study of relations between
L(G) ∩ {xun1 vum                                                    the class of regular languages and the classes of languages
                2 y | n ≥ 0, m ≥ 0} is a non-regular lan-
guage.                                                              generated by LR(0)-grammars that are not distinguishing
   We say that G is an elementary-regular grammar if all            or elementary-regular. We believe that the following con-
elementary pumping infixes by G are regular. We denote              jectures hold.
the class of elementary-regular grammars by the prefix er-.         Conjecture 1. It holds the following
   We say that G is an elementary-non-regular grammar if             (a) L (er-LRG) = Reg.
there exists an elementary non-regular pumping infix by G.           (b) L (enr-LRG) = L (LRG) − Reg.
  Above, in Corollary 2, we have seen that if an ele-             [3] Harrison, M.A.: Introduction to formal language theory.
mentary pumping prefix is distinguishing for an LR(0)-                Series in computer science, Addison-Wesley Longman
grammar G, we can obtain a non-regular pumping infix for              Publishing Co., Inc. (1978)
G. We conjecture that also the opposite direction holds.          [4] Jančar, P., Mráz, F., Plátek, M., Vogel, J.: Restarting auto-
                                                                      mata. In: Reichel, H. (ed.) Fundamentals of Computation
Conjecture 2. Let G = (V, Σ, R, S) be an LR(0)-grammar                Theory, 10th International Symposium, FCT ’95, Dresden,
and (x, u1 , A, v, u2 , y) be a pumping infix by G.         If        Germany, August 22–25, 1995, Proceedings. Lecture Notes
(x, u1 , A, v, u2 , y) is non-regular pumping infix by G then         in Computer Science, vol. 965, pp. 283–292. Springer
(u1 , A, v, u2 ) is a distinguishing pattern for G.                   (1995). https://doi.org/10.1007/3-540-60249-6_60
                                                                  [5] Lopatková, M., Plátek, M., Sgall, P.: Towards a formal
                                                                      model for functional generative description: Analysis by
5   Conclusion and Future Work                                        reduction and restarting automata. Prague Bull. Math. Lin-
                                                                      guistics 87, 7–26 (2007), http://ufal.mff.cuni.cz/
It remains yet some work to do in order to fulfill our plans          pbml/87/lopatkova-et-al.pdf
mentioned in the introduction and show our conjectures. It        [6] Mráz, F., Pardubská, D., Plátek, M., Šíma, J.: Pumping
also remains to show the relation between elementary non-             deterministic monotone restarting automata and DCFL. In:
regular pumping patterns and distinguishing pumping pat-              Holena, M., Horváth, T., Kelemenová, A., Mráz, F., Par-
terns, and between elementary regular pumping patterns                dubská, D., Plátek, M., Sosík, P. (eds.) Proceedings of the
and non-distinguishing pumping patterns. We believe that              20th Conference Information Technologies - Applications
we will make that in the near future.                                 and Theory (ITAT 2020), Hotel Tyrapol, Oravská Lesná,
                                                                      Slovakia, September 18-22, 2020. CEUR Workshop Pro-
   We also plan to show that the distinguishing pumping
                                                                      ceedings, vol. 2718, pp. 51–58. CEUR-WS.org (2020),
and the non-distinguishing pumping by LR(0)-grammars                  http://ceur-ws.org/Vol-2718/paper13.pdf
can be characterized by pumping patterns of limited size
                                                                  [7] Sgall, P., Goralčíková, A., Nebeský, L., Hajičová, E.: A
(e.g., elementary pumping patterns) depending only on                 functional approach to syntax in generative description of
corresponding LR(0)-grammars. Such patterns could help                language. American Elsevier Publishing Company, Inc.,
with proving the decidability of the problem whether an               New York (1969)
LR(0)-grammar is distinguishing or not. After that, we            [8] Šíma, J., Plátek, M.: One analog neuron cannot recog-
plan to present the mentioned transformation to restarting            nize deterministic context-free languages. In: Gedeon, T.,
automata controlled by LR(0)-grammars. This step will                 Wong, K.W., Lee, M. (eds.) Neural Information Process-
open a broad field on studying the descriptional complex-             ing – 26th International Conference, ICONIP 2019, Syd-
ity of DCFL and its subclasses. One interesting type of de-           ney, NSW, Australia, December 12–15, 2019, Proceedings,
scriptional complexity will be the degree of non-regularity           Part III. Lecture Notes in Computer Science, vol. 11955,
of DCFL and some of its subclasses. Similarly, a degree               pp. 77–89. Springer (2019). https://doi.org/10.1007/978-3-
of regularity of a deterministic context-free language can            030-36718-3_7
be measured.                                                      [9] Stearns, R.E.:         A regularity test for pushdown
   Finally, let us note that total separations of the pre-            machines. Inf. Control. 11(3), 323–340 (1967).
sented type have essential importance for computational               https://doi.org/10.1016/S0019-9958(67)90591-8
and comparative linguistic, and for expressing of proper-        [10] Valiant, L.G.: Regularity and related problems for deter-
ties of programming languages. We can, in this way, for-              ministic pushdown automata. Journal of the ACM 22, 1–10
                                                                      (1975)
mally express, e.g., the (non-)existence of obligatory ad-
juncts in a natural language (or in a sentence of it) or the
use of different types of parenthesis in a programming lan-
guage.


References
 [1] Bejček, E., Panevová, J., Popelka, J., Straňák, P.,
     Sevčíková, M., Štěpánek, J., Žabokrtský, Z.: Prague de-
     pendency treebank 2.5 – a revisited version of PDT 2.0.
     In: Kay, M., Boitet, C. (eds.) COLING 2012, 24th In-
     ternational Conference on Computational Linguistics, Pro-
     ceedings of the Conference: Technical Papers, 8–15 De-
     cember 2012, Mumbai, India. pp. 231–246. Indian Institute
     of Technology Bombay (2012), https://www.aclweb.
     org/anthology/C12-1015/
 [2] Geller, M.M., Harrison, M.A.: On LR(k) grammars and
     languages. Theor. Comput. Sci. 4(3), 245–276 (1977).
     https://doi.org/10.1016/0304-3975(77)90013-5