<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Separations of LR(0)-Grammars by Two Types of Pumping Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Plátek</string-name>
          <email>martin.platek@mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>František Mráz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dana Pardubská</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Pru˚ša</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jirˇí Šíma</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University, Department of Computer Science Malostranské nám.</institution>
          <addr-line>25, 118 00 PRAHA 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Comenius University in Bratislava, Department of Computer Science Mlynská Dolina</institution>
          ,
          <addr-line>84248 Bratislava</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Czech Technical University, Department of Cybernetics Karlovo nám.</institution>
          <addr-line>13, 121 35 Prague 2</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute of Computer Science of the Czech Academy of Sciences</institution>
          ,
          <addr-line>P. O. Box 5, 18207 Prague 8</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present two types of pumping patterns that allow a total separation inside the class of LR(0)grammars. Using the same type of pumping patterns, we obtain a total separation inside of linear LR(0)-grammars. This type of study has a long-term motivation from computational linguistics and the area of syntactic error localization. A recent motivation also comes from the field of formal models of neural networks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        This paper follows the paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where we have
studied restarting automata recognizing deterministic
contextfree languages (DCFL) with a particular type of pumping
patterns. This type of pumping patterns ensures the
nonregularity of the recognized languages.
      </p>
      <p>In this paper, we are looking for two types of pumping
patterns. The first type of pumping patterns should ensure
the non-regularity of the languages recognized by a
certain type of restarting automata. In contrast, the second
type of pumping patterns should ensure the regularity of
the languages recognized by this type of restarting
automata. The union of both of these types of pumping
patterns should cover all possible pumping patterns defined
by the mentioned type of restarting automata.</p>
      <p>Any deterministic context-free language L can be
accepted by a LR(1)-analyzer (using lookahead of size 1),
but the language L f g</p>
      <p>
        $ , where $ is a special end-marker
not in the alphabet of L) can be accepted by a
LR(0)analyzer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], it was shown that any
deterministic context-free language can be accepted by a
deterministic monotone restarting automaton. This was proved
      </p>
      <p>Copyright ©2021 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
by simulating any given LR(0)-analyzer by a
deterministic monotone restarting automaton. Hence, the
computations of such automata were controlled by
LR(0)grammars. Therefore, in what follows, we will use
constructions based on properties of phrase structures
generated by LR(0)-grammars and we will study pumping
patterns based on LR(0)-grammars.</p>
      <p>
        This paper should serve as a step to refine the concepts
from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where it was shown that restarting automata can
serve as a formal model for functional generative
description (FGD) of a natural language. Functional generative
description is a dependency-based descriptive system that
has been developed since the 1960’s, see esp. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. FGD
was originally implemented as a generative procedure, but
later its authors have been interested in a more declarative
representation. The subject of the paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] concerns the
foundations of a reduction system using a complex
restarting automaton. The reduction system is more complex
than a reduction system for a (shallow) syntactic analyzer
since it provides not only the possibility of checking the
well-formedness of the (surface) analysis of a sentence,
but also its meaning – tectogrammatical representation in
terms of FGD. Such a reduction system makes it possible
to define the analysis as well as the synthesis of a sentence
formally.
      </p>
      <p>
        A descriptive system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] DS of a natural language L
should determine:
(a) The set LC of all correct sentences of the language L.
(b) The set LM of all correct meaning descriptions of the
language L. LM represents all meanings of all
sentences in LC.
(c) A relation SH LC LM between LC and LM. The
relation describes the ambiguity and the synonymy of
L.
      </p>
      <p>
        Let us note that LM is an artificial (formal)
unambiguous language, which can be described by a restarting
automaton controlled by an LR(0)-grammar. An important
feature modeled by LM is valency. More about valency can
be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. There are types of valency features that
can be modeled as pumping patterns of non-regular type
(e.g., obligatory adjuncts), and there are other valency
features that can be modeled as pumping patterns of regular
type (e.g., optional adjuncts).
      </p>
      <p>
        Another recent motivation for this paper comes from
a completely different area. The last results of this
paper should support the analysis of analog neuron hierarchy
of binary-state neural networks (NNs) with an increasing
number of extra analog-state neurons, which has been
introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for studying the power of NNs with
realistic weights between integers and rational numbers with
respect to the Chomsky hierarchy.
      </p>
      <p>
        Note that the problem of the regularity of deterministic
context-free languages gained attention already in the 60s.
Algorithms deciding whether a given deterministic
pushdown automaton accepts a regular language were proposed
by Stearns [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Valiant [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>This paper is structured as follows. The next
section contains basic notions and basic properties of
LR(0)grammars. Section 3 introduces the concepts of our
interest and presents the first main result. Section 4 presents
the results about total separations. The paper concludes
with a summary of future work to be done to fulfill all our
plans.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Basic Notions</title>
      <p>
        We suppose that the reader is familiar with the basics of
formal language and automata theory presented, e.g., in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For technical reasons, we give a list of some notions
and definitions related to formal grammars.
      </p>
      <p>Context-free grammars. A context-free grammar G is
defined by a 4-tuple G = (V; S; R; S) where S V , N =
V n S, and N; S are finite sets of nonterminal and terminal
symbols, respectively, R N V is a finite relation of
rewrite (production) rules, and S 2 N is the start symbol.
We use the usual notation A ! b 2 R for a production
rule (A; b ) 2 R. This rule can be applied to u = u1Au2
where u1; u2 2 V , yielding v = u1b u2, which is denoted
as u ) v, or u )G v. We write u )R v if the nonterminal
A substituted by u ) v is the rightmost nonterminal in u.</p>
      <p>The grammar G generates the context-free language
L(G) = fw 2 S j S ) wg where ) ()R ) is the
reflexive transitive closure of the binary relation ) ()R) on
V . Note that L(G) can also be equivalently defined as
L(G) = fw 2 S j S )R wg.</p>
      <p>We write u ) p v for some non-negative integer p if
u ) v and v is derived from u by at most p derivation
steps ()). A similar meaning has the denotation u )R p v.
In what follows, we will primarily work with the rightmost
derivations and write ) instead of )R.</p>
      <p>Any non-empty context-free language L = L(G) 6= 0/ can
be generated by a reduced context-free grammar G that
excludes unreachable and unproductive symbols, that is, for
every A 2 N, there exist a; b 2 V such that S ) aAb ,
and for every A 2 N there is w 2 S such that A ) w,
respectively. In the following, we work with reduced
context-free grammars only.</p>
      <p>We say that a context-free grammar G = (V; S; R; S) is
linear if the right-hand side of any rule from R contains at
most one variable (nonterminal).</p>
      <p>For any non-empty word a1 an, with a1; : : : ; an 2 S,
the derivation S ) a1 an can be described by a
derivation tree which is an ordered tree satisfying:
1. Inner vertices are labeled with nonterminals.
2. The root is labeled with the start symbol S.
3. If inner vertex labeled with nonterminal A 2 N
has children labeled (from left to right) with
a1; a2; : : : ; ak 2 V then A ! a1a2 ak 2 R; A
together with the subtrees rooted in its children form a
complete branch of the tree.
4. The leaves in the tree are labeled (from left to right)
with the terminal symbols a1; : : : ; an 2 S:</p>
      <p>Removing condition 2 we get the definition of a
(computation) derivation sub-tree.</p>
      <p>A grammar G is unambiguous if every non-empty string
w 2 L(G) has a unique derivation tree, or equivalently a
unique rightmost derivation.</p>
      <p>
        The results of our paper heavily depend on the
characterization of the class of deterministic context-free languages
by LR(0)-grammars and corresponding parsers.
Following [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we, therefore, give the necessary definitions and
the most relevant results.
      </p>
      <p>Definition 1. Let G = (V; S; R; S) be a context-free
grammar, N = V n S, g 2 V : A handle of g is an ordered pair
(r; i), r 2 R; i 0 such that there exists A 2 N; a; b 2 V
and w 2 S such that
(a) S )R aAw )R ab w = g,
(b) r = A ! b , and
(c) i = jab j.</p>
      <p>
        Lemma 1 ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). Let G = (V; S; R; S) be a reduced
contextfree grammar. Then G is unambiguous if and only if every
a 2 V , such that S )R a, has exactly one handle except
      </p>
      <sec id="sec-2-1">
        <title>S, which has none.</title>
        <p>In general, the identification of a handle in a string is not
uniquely defined, which is not true for LR(0) grammars.
Definition 2. Let G = (V; S; R; S) be a reduced
contextfree grammar such that S )R+ S is not possible in G. We
say G is an LR(0)-grammar if, for each w; w0; x 2 S ,
h; a; a0; b ; b 0 2 V ; and A; A0 2 V n S, if
(a) S )R aAw )R ab w = hw, and
imply (A ! b ; jab j) = (A0 ! b 0; ja0b 0j)</p>
        <p>
          Note that as a consequence of the above definition we
have that A = A0, b = b 0, a = a0, g = ab = a0b 0 and
x = w0. Thus, if G is an LR(0)-grammar, then the rightmost
derivation of the word w by G and the left-right analysis
is unique (deterministic). In this paper, we consider LR(0)
grammars rather as analytical grammars. A language
generated by an LR(0)-grammar is called LR(0)-language.
Theorem 1 (LR(0)-language characterization theorem
from [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). Let L S . The following four statements are
equivalent.
(a) L is an LR(0) language.
(b) L is a deterministic context-free language and for all
x 2 S+, w; y 2 S , if w 2 L, wx 2 L, and y 2 L, then
yx 2 L.
(c) There exists a deterministic pushdown automaton A
with a single final state q f and a pushdown symbol
        </p>
        <sec id="sec-2-1-1">
          <title>Z f such that each word w 2 L is accepted by A by</title>
          <p>entering the state q f with Z f as the only symbol in its
pushdown.
(d) There exist strict deterministic languages L0 and L1
such that L = L0L1.</p>
          <p>Note that strict deterministic languages are
deterministic context-free languages recognizable by empty
pushdown, or equivalently prefix-free deterministic
contextfree languages.</p>
          <p>Let us recall semi-Dyck languages. Let r 1, Sr =
fa1; : : : ; arg, S¯r = fa¯1; : : : ; a¯rg and S = Sr [ S¯r. The
semiDyck language Dr is the language generated by the
grammar Gr = (Vr; S; P; S), where Vr = S [ fSg, and R contains
the following production rules:</p>
          <p>S ! SaiSa¯iS j l , for each i; 1
i
r:</p>
          <p>
            Informally, Dr is the set of all well-balanced parentheses
containing r different pairs of brackets [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. The semi-Dyck
languages are examples of context-free languages that are
not linear context-free languages. Additionally, they can
be used to separate LR(0)-languages from the strict
deterministic languages.
          </p>
          <p>
            Theorem 2 ([
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]). Let Dr S be the semi-Dyck language
for some r 1. Then Dr is an LR(0)-language, but not a
strict deterministic language.
          </p>
          <p>
            LR(0)-analyzer. If L is generated by an
LR(0)grammar G, then there exists an LR(0)-analyzer P(G) for
L with the following important properties (see [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]):
(a) For each word w 2 L, there is exactly one rightmost
derivation of w in G, which corresponds to the
LR(0)analysis of the word w by P(G).
(b) Let u be a prefix of the content of the input tape w =
uv, which has already been read by a computation of
P(G). Then there exists a suffix v0 such that uv0 2 L
iff P(G) did not halt and not reject while reading u.
          </p>
          <p>
            Let L S be a deterministic context-free language
(DCFL), and $ 2= S. Then, according to [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], there are
LR(0)-grammar G(0; $) generating the language L f g
$ ,
and a corresponding LR(0)-analyzer P(G(0; $)) of L f g
$ ;
P(G(0; $)) is a deterministic push-down automaton with $
as the rightmost symbol on its input-tape.
          </p>
          <p>To stress its property, we will sometimes write
LR(0,$)grammar instead of grammar G(0; $). We denote the set of
LR(0)-grammars by LRG and the set of LR(0,$)-grammars
by LRG($).</p>
          <p>Linear LRG. We say that a context-free grammar G is
linear if the right-hand side of any rule from G contains at
most one variable (nonterminal). We also consider linear
LR(0)-grammars. We denote the linearity by the prefix
lin-, and the class of linear LR(0)-grammars as lin-LRG.
Classes of languages. In what follows, L (A), where A is
some class of grammars, denotes the class of languages
generated by grammars from A. E.g., the class of
languages generated by linear LR(0)-grammars is denoted by
L (lin-LRG).
3</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Pumping by LR(0)-grammars</title>
      <p>The first goal of our paper is to specify properties of
deterministic (linear) context-free grammars that assure that
the corresponding (linear) language is non-regular. In this
section, we show such properties. We start with several
definitions and notations.</p>
      <p>Pumping notions. Let G = (V; S; R; S) be an
LR(0)grammar generating (analyzing) the language L = L(G),
and P(G) be the corresponding LR(0)-analyzer for G. Let
w 2 L(G), w = xu1vu2y, where x; u1; v; u2; y 2 S , u1u2 2
S+ are given by the derivation tree Tw from Fig. 1. The
proper sub-trees T1 and T2 of Tw are (computation)
subtrees whose roots are labeled with the same nonterminal
A, thus by replacing T1 with T2 properly inside of Tw, we
again get a derivation tree, namely the derivation tree Tw(0)
for the word w(0) = xvy 2 L(G).</p>
      <p>Analogously, replacing T2 with a copy of T1, we get the
derivation tree Tw(2) for a longer word w(2) = xu21vu22y. If
we repeat i times such replacing of T2 with T1 we obtain the
derivation tree Tw(i+1) for the word w(i + 1) = xui1+1vui2+1y.
Pumping tree, prefix, infix, pattern and reduction. Let
x; u1; v; u2; A; y; T1 be as on Fig. 1. Then we say that
Pp = xu1vu2 is an (x; u1; A; v; u2)-pumping prefix by G,
Pin = xu1vu2y is an (x; u1; A; v; u2; y)-pumping infix by G,
and that T1 is a pumping tree of Pp. Recall that the
LR(0)analysis by P(G) of the prefix Pp in any word of the form
Ppg is the same, for any g 2 S . We say that (u1; A; v; u2) is
a pumping pattern by G (of Pp). Let us recall that for any
z 2 S , the LR(0)-analyzer P(G)) reduces xu1vu2z into xvz.
We write xu1vu2z (P(G) xvz, and say that xu1vu2z (P(G)
T2
xvz is an (x; u1; A; v; u2)-pumping reduction by G. Note
that the word xu1vu2z needs not be a word from L(G).
We will often say in the following that (x; u1; A; v; u2) is
a pumping prefix and that (x; u1; A; v; u2; y) is a pumping
infix.</p>
      <p>Elementary infix etc. We say that an (x; u1; A; v; u2;
y)pumping infix by G is elementary if the (x; u1; A; v;
u2)pumping reduction is the only and, at the same time, the
last pumping reduction by G that can be performed inside
of the word xu1vu2y, i.e. xvy cannot be reduced by any
pumping reduction by G at all. In this case, we say that
(x; u1; A; v; u2) is an elementary pumping prefix, and that
(u1; A; v; u2) is an elementary pumping pattern by G.</p>
      <p>Let us note that the corresponding elementally pumping
tree T1 does contain an internal node with the same
nonterminal as the root of T1, and does not contain any other
repetition of a nonterminal on any path from its root to a
leaf. While the size (the number of terminal symbols) of
a pumping infix by G is unbounded, there exists a
constant c &gt; 0 that depends on the number of terminals and
nonterminals, and the length of rules of G such that any
elementary pumping infix by G is of size at most c.</p>
      <p>
        We can see the following obvious proposition that
summarizes the (pumping) properties of LR(0)-grammars,
which we will use in the following text. It is a direct
consequence of the previous definitions and the properties of
LR(0)-grammars and their analyzers summarized in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
and inspired by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Proposition 1. Let G = (V; S; R; S) be an LR(0)-grammar
generating (analyzing) the language L = L(G). Let a word
Pp = xu1vu2 be an (x; u1; A; v; u2)-pumping prefix by G, and
xu1u1vu2u2 (P(G) xu1vu2 be an (xu1; u1; A; v; u2)-pumping
reduction by G. Then
(a) Any w 2 L determines its derivation tree Tw by G
unambiguously.
(b) Pp determines its pumping sub-tree unambiguously.
(c) xu1mu1vu2u2nz (P(G) xu1mvu2nz
(xu1m; u1; A; v; u2)-pumping reduction
any m; n &gt; 0, and any z 2 S .</p>
      <p>is
by</p>
      <p>an
G for
(d) xu1vu2z 2 L iff xu1mvu2mz 2 L for any m
z 2 S .
0, and any
(e) An (x; u1; A; v; u2)-pumping prefix by G, and the
pumping pattern (u1; A; v; u2) determine
unambiguously a single pumping reduction.
(f) xun1vu2mz 2 L iff xun1+kvu2m+kz 2 L for any m; n; k</p>
      <p>Assertion (c) is essential for our further considerations.
It shows that, for a non-empty u1, the distance of the place
of pumping from the left end is not limited, and that the
position of pumping is determined both by the pumping
pattern of a pumping reduction and the pumping prefix of
the pumping reduction. We use this property in the
following definition formalizing a non-regular type of pumping.
Example 1. Consider the non-regular context-free
language Lab = fanbn j n 2 Ng, that can be generated by the
grammar G = (fS; S1; a; bg; fa; bg; R; S)), with the
following set of rules R:</p>
      <p>S
S1
!
!</p>
      <p>S1
aS1b j ab</p>
      <sec id="sec-3-1">
        <title>The grammar is reduced and unambiguous. Consider the</title>
        <p>sentence g = aaabbb.</p>
        <p>• The handle of g (cf. Definition 1) is the pair (S1 !
ab; 4), as
and the division of g into a; b ; w is unique:</p>
        <p>S )R aaS1bb )R aaabbb
g = aa ab bb
|{az} |{z} |{wz}
b
• G is an LR(0)-grammar, moreover, linear as
(a) S )R aAw )R ab w = hw
obviously implies (A ! b ; jab j) = (A0 ! b 0; ja0b 0j)</p>
      </sec>
      <sec id="sec-3-2">
        <title>The pumping notions can be illustrated in Fig. 2 with a</title>
        <p>derivation tree for w = xaabby 2 L(G), where x 2 a ,y 2
b .</p>
        <p>We can see, by taking x = a, that Pp = aaabb is an
(a; a; S1; ab; b)-pumping prefix by G, and that T1 is a
pumping tree of Pp. Pumping pattern of Pp by G is
(a; S1; ab; b) and aaabb (P(G) aab is an (a; a; S1; ab;
b)pumping reduction by G. Realize that a pumping reduction
by G can be applied to any word akbm, where k; m &gt; 1,
including the cases when k 6= m.</p>
        <p>Moreover, we can see that (l ; a; S1; ab; b; l ) is the
single elementary pumping infix by G.</p>
        <p>Definition 3. Let G = (V; S; R; S) be an LR(0)-grammar
generating (analyzing) the language L = L(G). Let
x
a
T2
xu1vu2 (P(G) xv be an (x; u1; A; v; u2)-pumping reduction
by G.</p>
        <p>We say that A is a distinguishing nonterminal for G, and
xu1vu2 (P(G) xv is an (x; u1; A; v; u2)-distinguishing
reduction by G if at least one of the following conditions is true:
(I) for some y 2 S there is a p &gt; 0 such that xvy 2 L iff
xvu2p jy 2= L, for all j &gt; 0;
(II) for some y 2 S , there is a p &gt; 0 such that xvy 2 L iff
xu1p jvy 2= L, for all j &gt; 0.</p>
        <p>We say that the tuple (u1; A; v; u2) is a distinguishing
pattern for G.</p>
      </sec>
      <sec id="sec-3-3">
        <title>If G contains a distinguishing nonterminal (pattern)</title>
        <p>then we call G a distinguishing LR(0)-grammar (denoted
dist-LRG).</p>
      </sec>
      <sec id="sec-3-4">
        <title>For the sake of accuracy, we also say that A is an</title>
        <p>(x; u1; A; v; u2)-distinguishing nonterminal for G.
Example 1 (continued). According to case (I) of
Definition 3 the nonterminal S1 is a distinguishing nonterminal
and aabb (P(G) ab is a (l ; a; S1; ab; b)-distinguishing
reduction for y = l , and p = 1. Thus, (a; S1; ab; b) is a
distinguishing pattern for G.</p>
        <p>
          Let us recall that the class of LR(0)-languages is not
closed under complement (see [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]), but DCFL is closed
under complement.
        </p>
        <p>Example 1 (continued). The language Labc = c+ Lab
can be generated by LR(0) grammar Gabc =
(fS; S1; S2; a; b; cg; fa; b; cg; R; S), with the set of rules R :
S
S2
S1
!
!
!
cS2
cS2 j S1
aS1b j ab
Then, analogously to the situation in Lab,
nonterminal S1 is distinguishing, and there is a (c; a; S1; ab;
b)distinguishing reduction based on case (I) of Definition 3;
here, we can take again y = l , and p = 1. On the other
hand, (c; S2; ab; l ) is a pumping pattern by Gabc, which
is not distinguishing for any (x; c; S2; ab; l )-reduction by
Gabc; thus (c; S2; ab; l ) is a non-distinguishing pumping
pattern by Gabc.</p>
        <p>Lemma 2. Let G = (V; S; R; S) be a dist-LR(0)-grammar.
Let (u1; A; v; u2) be a distinguishing pattern by G. Then
u1; u2 2 S+.</p>
        <p>Proof. Assume, for example, that condition (I) from
Definition 3 holds. It is then clear that u2 6= l , otherwise
xvu2p jy = xvy for all p; j &gt; 0.</p>
        <p>It is also easy to see that u1 6= l . If u1 = l , we get from
Proposition 1 that xvu2pmy 2 L and we have xvu2pmu2py 2= L
for all m 0 and some p &gt; 0. The first fact yields xvu2py 2
L for m = 1, the second fact yields xvu2py 2= L for m = 0.
This is a contradiction.</p>
        <p>The existence of a distinguishing nonterminal serves as
a sufficient condition for non-regularity.</p>
        <p>Theorem 3. Let L = L(G) be a language accepted by a
dist-LR(0)-grammar G = (V; S; R; S). Then L is a
nonregular language.</p>
        <p>Proof. Assume, for a contradiction, that G = (V; S; R; S)
is a dist-LR(0)-grammar generating a regular language
L = L(G) and B be an (x; u1; B; v; u2)-distinguishing
nonterminal for G.</p>
        <p>As L is regular, there exists a deterministic finite
automaton A = (Q; S; d ; q0; F) with nA = jQj states
accepting the language L = L(A). The proof then follows by
the case analysis based on properties of the (x; u1; B; v;
u2)distinguishing pattern. Since there is an analogy of the
case analysis, we will prove the theorem only by the
assumption that condition (I) from Definition 3 is fulfilled.</p>
        <p>Suppose that for some y 2 S and p &gt; 0 it holds xvy 2 L,
and, for all j &gt; 0, xvu2p jy 2= L.</p>
        <p>Proposition 1 implies that xu1pmvu2pmy 2 L for all m
0
and xu1pmvu2p(m+ j)y 2= L for all m 0, j &gt; 0, and Lemma
2 implies that u1; u2 2 S+. Let us inspect states
reachable by the automaton A when reading words of the form
xu1pmvu2pk, for m &gt; nA, k 0. Since A has nA states,
the pigeonhole principle yields that there are integers r; s,
1 r &lt; s nA + 1 and a state q of A such that</p>
        <p>q = d (q0; xu1pmvu2pr) = d (q0; xu1pmvu2ps):
As xu1pmvu2pmy 2 L, both words</p>
        <p>xu1pmvu2pmy = xu1pmvu2pru2p(m r)y
and</p>
        <p>xu1pmvu2psu2p(m r)y = xu1pmvu2pmu2p(s r)y
are accepted by A. Then, for j = s r we have a
contradiction with the fact that xu1pmvu2pmu2p jy 2= L, for all m 0,
j &gt; 0.</p>
        <p>Next, suppose that for some y 2 S and p &gt; 0 it holds
xvy 2= L, and xvu2p jy 2 L, for all j &gt; 0.</p>
        <p>Using the same reasoning as above, we derive that
A rejects xu1pmvu2pmy = xu1pmvu2pru2p(m r)y as well as
xu1pmvu2psu2p(m r)y = xu1pmvu2pmu2p(s r)y, which is again a
contradiction.</p>
        <p>The previous proof yields the following corollary.
Corollary 1. Let G = (V; S; R; S) be a
dist-LR(0)grammar. Let (x; u1; A; v; u2) be a distinguishing prefix by
G, and Pin be an (x; u1; A; v; u2; y)-infix by G. Then the
language L(G) \ fxu1nvu2my j n 0; m 0g is not a regular
language.</p>
        <p>We will use the following definition to stress the ability
of LR(0)-grammars to define DCFL.</p>
        <p>Definition 4. Let G be an LRG(0; $) grammar which
generates L(G) = Lf$g, where L S . We say that G is an</p>
      </sec>
      <sec id="sec-3-5">
        <title>LRG(1)-grammar which defines L.</title>
      </sec>
      <sec id="sec-3-6">
        <title>We take</title>
        <p>L (LRG(1))
L (dist-LRG(1))
= fL j Lf$g 2 LRG($)g;
= fL j Lf$g 2 dist-LRG($)g:
In what follows, the sign
means a proper subset.</p>
        <p>Theorem 4. It holds the following:
(a)
(b)
(c)
(d)
(e)</p>
        <p>L (dist-LRG)
L (lin-dist-LRG)
L (dist-LRG(1))
L (lin-dist-LRG(1))
L (LRG(1)) = DCFL:</p>
        <p>L (LRG);</p>
        <p>L (lin-LRG);
L (LRG(1));</p>
        <p>
          L (lin-LRG(1));
Proof. All inclusions (a)–(d) follow from the fact that,
according to Theorem 3, all languages generated (accepted)
by distinguishing LR(0)-grammars are non-regular, while
both L (lin-LRG) and L (lin-LRG(1)) contain all regular
languages. The equality (e) is just the fact that for any
deterministic context-free language L S , where $ 62 S, the
language Lf$g is accepted by an LR(0)-grammar [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
4
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Some total separations inside of</title>
    </sec>
    <sec id="sec-5">
      <title>LR(0)-languages</title>
      <p>While the size of distinguishing pumping patterns is
unbounded, the size of elementary pumping patterns for a
given grammar is limited. Therefore, below we introduce
a condition that implies non-regularity of a language
generated by an LR(0)-grammar based on elementary
pumping patterns only.</p>
      <p>Definition 5. Let G = (V; S; R; S) be an LR(0)-grammar
and a = (x; u1; A; v; u2; y) be an elementary pumping infix
by G.</p>
      <p>We say that a is regular if the language L(G; a) =
L(G) \ fxun1vu2my j n 0; m 0g is a regular language.</p>
      <p>We say that a is non-regular if the language L(G; a) =
L(G) \ fxu1nvu2my j n 0; m 0g is a non-regular
language.</p>
      <p>We say that G is an elementary-regular grammar if all
elementary pumping infixes by G are regular. We denote
the class of elementary-regular grammars by the prefix er-.</p>
      <p>We say that G is an elementary-non-regular grammar if
there exists an elementary non-regular pumping infix by G.</p>
      <sec id="sec-5-1">
        <title>We denote the class of elementary non-regular grammars</title>
        <p>by the prefix enr-.</p>
        <p>We say that a language L is an elementary-non-regular
language if an elementary-non-regular LR(0)-grammar G
exists such that L(G) = L. The property being
elementarynon-regular we mark by the prefix enr- and the class of
elementary-non-regular grammars is denoted as enr-LRG.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Similarly, the prefix er- will denote the property being elementary-regular.</title>
        <sec id="sec-5-2-1">
          <title>We say that a language L 2 L (LR(0)) is an elementary</title>
          <p>regular language if there is not any
elementary-nonregular LR(0)-grammar G such that L(G) = L. We denote
the class of elementary-regular languages as L (er-LGR).</p>
          <p>The next corollary follows from the previous definition
and Corollary 1.</p>
          <p>Corollary 2. Let G = (V; S; R; S) be a
dist-LR(0)grammar. Let (x; u1; A; v; u2) be a distinguishing
elementary prefix by G, and a = (x; u1; A; v; u2; y) be a pumping
infix by G. Then the language L(G) is an
elementary-nonregular language.</p>
          <p>Theorem 5. Let L be an elementary-non-regular
language. Then L is not a regular language.</p>
          <p>Proof. Let G be an elementary-non-regular
LR(0)grammar such that L(G) = L. Let a = (x; u1; A; v; u2; y)
be a non-regular pumping infix by G. The facts that
L0 = fxu1nvu2my j n
0; m
0g
is a regular language and L \ L0 is not a regular language
imply that L is not regular, because regular languages are
closed under the intersection operation.</p>
          <p>For the opposite direction that each non-regular
language L generated by an LR(0)-grammar is
elementarynon-regular, we still do not have proof.</p>
          <p>The next corollary presents our current results about
total separations achieved by LR(0)-grammars. We consider
these results important from the point of view of our
motivations mentioned in the introduction. The corollary is
mainly a consequence of Definition 5.</p>
          <p>Corollary 3. The following equations hold:
(a) L (er-LRG) [ L (enr-LRG) = L (LRG);
(b) L (er-lin-LRG) [ L (enr-lin-LRG) =</p>
          <p>L (lin-LRG);
(c) L (er-LRG) \ L (enr-LRG) = 0/ ;
(d) L (er-lin-LRG) \ L (enr-lin-LRG) = 0/ :</p>
          <p>What remains is a further study of relations between
the class of regular languages and the classes of languages
generated by LR(0)-grammars that are not distinguishing
or elementary-regular. We believe that the following
conjectures hold.</p>
          <p>Conjecture 1. It holds the following
(a) L (er-LRG) = Reg:
(b) L (enr-LRG) = L (LRG)
Reg:</p>
          <p>Above, in Corollary 2, we have seen that if an
elementary pumping prefix is distinguishing for an
LR(0)grammar G, we can obtain a non-regular pumping infix for
G. We conjecture that also the opposite direction holds.
Conjecture 2. Let G = (V; S; R; S) be an LR(0)-grammar
and (x; u1; A; v; u2; y) be a pumping infix by G. If
(x; u1; A; v; u2; y) is non-regular pumping infix by G then
(u1; A; v; u2) is a distinguishing pattern for G.
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>It remains yet some work to do in order to fulfill our plans
mentioned in the introduction and show our conjectures. It
also remains to show the relation between elementary
nonregular pumping patterns and distinguishing pumping
patterns, and between elementary regular pumping patterns
and non-distinguishing pumping patterns. We believe that
we will make that in the near future.</p>
      <p>We also plan to show that the distinguishing pumping
and the non-distinguishing pumping by LR(0)-grammars
can be characterized by pumping patterns of limited size
(e.g., elementary pumping patterns) depending only on
corresponding LR(0)-grammars. Such patterns could help
with proving the decidability of the problem whether an
LR(0)-grammar is distinguishing or not. After that, we
plan to present the mentioned transformation to restarting
automata controlled by LR(0)-grammars. This step will
open a broad field on studying the descriptional
complexity of DCFL and its subclasses. One interesting type of
descriptional complexity will be the degree of non-regularity
of DCFL and some of its subclasses. Similarly, a degree
of regularity of a deterministic context-free language can
be measured.</p>
      <p>Finally, let us note that total separations of the
presented type have essential importance for computational
and comparative linguistic, and for expressing of
properties of programming languages. We can, in this way,
formally express, e.g., the (non-)existence of obligatory
adjuncts in a natural language (or in a sentence of it) or the
use of different types of parenthesis in a programming
language.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Bejcˇek</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panevová</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popelka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Stranˇák, P., Sevcˇíková, M.,
          <string-name>
            <surname>Šteˇpánek</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Žabokrtský</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Prague dependency treebank 2.5 - a revisited version of PDT 2.0</article-title>
          . In: Kay,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Boitet</surname>
          </string-name>
          , C. (eds.)
          <source>COLING</source>
          <year>2012</year>
          , 24th International Conference on Computational Linguistics,
          <source>Proceedings of the Conference: Technical Papers</source>
          ,
          <fpage>8</fpage>
          -
          <issue>15</issue>
          <year>December 2012</year>
          , Mumbai, India. pp.
          <fpage>231</fpage>
          -
          <lpage>246</lpage>
          . Indian Institute of Technology Bombay (
          <year>2012</year>
          ), https://www.aclweb. org/anthology/C12-1015/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Geller</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>On LR(k) grammars and languages</article-title>
          .
          <source>Theor. Comput. Sci. 4</source>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>276</lpage>
          (
          <year>1977</year>
          ). https://doi.org/10.1016/
          <fpage>0304</fpage>
          -
          <lpage>3975</lpage>
          (
          <issue>77</issue>
          )
          <fpage>90013</fpage>
          -
          <lpage>5</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Introduction to formal language theory</article-title>
          . Series in computer science,
          <source>Addison-Wesley Longman Publishing Co., Inc</source>
          . (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Jancˇar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mráz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plátek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vogel</surname>
          </string-name>
          , J.:
          <article-title>Restarting automata</article-title>
          . In: Reichel, H. (ed.)
          <source>Fundamentals of Computation Theory, 10th International Symposium</source>
          , FCT '95,
          <string-name>
            <surname>Dresden</surname>
          </string-name>
          , Germany,
          <source>August 22-25</source>
          ,
          <year>1995</year>
          ,
          <source>Proceedings. Lecture Notes in Computer Science</source>
          , vol.
          <volume>965</volume>
          , pp.
          <fpage>283</fpage>
          -
          <lpage>292</lpage>
          . Springer (
          <year>1995</year>
          ). https://doi.org/10.1007/3-540-60249-6_
          <fpage>60</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Lopatková</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plátek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sgall</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Towards a formal model for functional generative description: Analysis by reduction and restarting automata</article-title>
          .
          <source>Prague Bull. Math. Linguistics 87</source>
          ,
          <fpage>7</fpage>
          -
          <lpage>26</lpage>
          (
          <year>2007</year>
          ), http://ufal.mff.cuni.cz/ pbml/87/lopatkova-et-al.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Mráz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardubská</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plátek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Šíma</surname>
          </string-name>
          , J.:
          <article-title>Pumping deterministic monotone restarting automata and DCFL</article-title>
          . In: Holena,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Horváth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Kelemenová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Mráz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Pardubská</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Plátek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Sosík</surname>
          </string-name>
          , P. (eds.)
          <source>Proceedings of the 20th Conference Information Technologies - Applications and Theory (ITAT</source>
          <year>2020</year>
          ), Hotel Tyrapol, Oravská Lesná, Slovakia,
          <source>September 18-22</source>
          ,
          <year>2020</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2718</volume>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>58</lpage>
          . CEUR-WS.org (
          <year>2020</year>
          ), http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2718</volume>
          /paper13.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sgall</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goralcˇíková</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nebeský</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajicˇová</surname>
          </string-name>
          , E.:
          <article-title>A functional approach to syntax in generative description of language</article-title>
          . American Elsevier Publishing Company, Inc., New York (
          <year>1969</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Šíma</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plátek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>One analog neuron cannot recognize deterministic context-free languages</article-title>
          . In: Gedeon,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.W.</given-names>
            ,
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <source>(eds.) Neural Information Processing - 26th International Conference, ICONIP</source>
          <year>2019</year>
          ,
          <article-title>Sydney</article-title>
          ,
          <string-name>
            <surname>NSW</surname>
          </string-name>
          , Australia,
          <source>December 12-15</source>
          ,
          <year>2019</year>
          , Proceedings,
          <source>Part III. Lecture Notes in Computer Science</source>
          , vol.
          <volume>11955</volume>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>89</lpage>
          . Springer (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-
          <fpage>030</fpage>
          -36718-
          <issue>3</issue>
          _
          <fpage>7</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Stearns</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          :
          <article-title>A regularity test for pushdown machines</article-title>
          .
          <source>Inf. Control</source>
          .
          <volume>11</volume>
          (
          <issue>3</issue>
          ),
          <fpage>323</fpage>
          -
          <lpage>340</lpage>
          (
          <year>1967</year>
          ). https://doi.org/10.1016/S0019-
          <volume>9958</volume>
          (
          <issue>67</issue>
          )
          <fpage>90591</fpage>
          -
          <lpage>8</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Valiant</surname>
            ,
            <given-names>L.G.</given-names>
          </string-name>
          :
          <article-title>Regularity and related problems for deterministic pushdown automata</article-title>
          .
          <source>Journal of the ACM 22</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>