On Some Succinct Representations of Regular
                    Languages
                             Extended Abstract

           Bruno Guillon, Giovanni Pighizzini, and Luca Prigioniero


        Dipartimento di Informatica, Università degli Studi di Milano, Italy
               {guillonb, pighizzini, prigioniero}@di.unimi.it


       Abstract.    Non-self-embedding grammars, constant-height pushdown
       automata and 1-limited automata are restrictions of context-free gram-
       mars, pushdown automata and Turing machines, respectively. All of them
       characterize the class of regular languages. There is a double size expo-
       nential gap from each of these models to deterministic nite automata.
       Non-self-embedding grammars and constant-height pushdown automata
       are polynomially related in size. Moreover, there exists a polynomial size
       simulation by 1-limited automata. In contrast, the converse transforma-
       tion costs exponential.


1    Introduction
Regular languages are usually represented using regular expression or nite au-
tomata. It is well known that, in the worst case, deterministic automata can
require exponentially many states with respect to equivalent nondeterministic
automata. Hence, there is an exponential size gap from nondeterministic to de-
terministic automata. Further representations which can be more succinct than
nondeterministic automata have been discovered and investigated. Three of them
are considered in this work: non-self-embedding grammars, constant-height push-
down automata, and 1-limited automata. The size gap from each one of these
representations to equivalent deterministic automata is double exponential.

    To describe non-self-embedding grammars, we rst recall that the extra capa-
bility of context-free grammars with respect to regular ones is that of describing
recursive structures as, for instance, nested parentheses, arithmetic expressions,
typical programming language constructs. In terms of recognizing devices, this
capability is implemented through the pushdown store, which is used to ex-
tend nite automata in order to make the resulting model, namely pushdown
automata, equivalent to context-free grammars.
    To emphasize this capability, in one of his pioneering papers, Chomsky inves-
tigated the self-embedding property [4]: a context-free grammar is self-embedding
if it contains a variable A which, in some sentential form, is able to reproduce
                                                                              ?
itself surrounded by two nonempty strings α and β , in symbols A =⇒ αAβ .
Roughly speaking, this means that such self-embedded variable A is truly re-
cursive. He proved that, among all context-free grammars, only self-embedding
2                                B. Guillon, G. Pighizzini, and L. Prigioniero

ones can generate nonregular languages. Hence, non-self-embedding grammars
are no more powerful than nite automata.
    The proof given by Chomsky of this result is constructive, namely it pro-
vides a method for obtaining a nite automaton equivalent to a given non-self-
embedding grammar [3,4]. A dierent constructive proof of the same result was
given by Anselmo, Giammarresi, and Varricchio [1], by showing a decomposition
of non-self-embedding grammars in regular grammars and then iteratively apply-
ing regular substitutions to obtain equivalent nite automata. In the same paper,
the authors also proved that the size gap from non-self-embedding grammars to
equivalent automata is at least exponential.
    It is worthwhile to mention that, in 1971, Meyer and Fischer proved that for
any recursive function f and arbitrarily large integer n, there exists a context-free
grammar whose description has size n and which generates a regular language,
such that any equivalent nite automaton requires at least f (n) states [9]. This
means that it is not possible to obtain a recursive bound relating the size of
context-free grammars generating regular languages with the number of states
of equivalent deterministic nite automata. It is important to notice that the
result of Meyer and Fischer was obtained by considering grammars with a two-
letter terminal alphabet. The unary, i.e., one-letter, case was studied in 2002 by
Pighizzini, Shallit, and Wang, who obtained optimal recursive bounds [13].
    We recently proved that also in the case of non-self-embedding grammars,
the bounds are recursive, independently on the alphabet size [12]. In particular,
by inspecting and rening the construction presented in [1], we showed that each
non-self-embedding grammar of size s can be converted into equivalent nonde-
                                                O(s)       2O(s)
terministic and deterministic automata with 2          and 2       states, respectively.
We also obtained a family of languages that witness that these gaps cannot be
reduced. Furthermore, these gaps do not change if we allow the variables which
generate only unary strings (i.e., strings consisting of occurrences of only one ter-
minal) to be self-embedded. Such grammars, which are also equivalent to nite
automata, are called quasi-non-self-embedding grammars.

    Constant-height pushdown automata are standard nondeterministic push-
down automata where the amount of available pushdown store is xed. Hence,
the number of their possible congurations is nite, thus implying that they
are no more powerful than nite automata. Exponential and double exponential
gaps from constant-height pushdown automata to nondeterministic and deter-
ministic automata, respectively, have been proved in [5]. Furthermore, in [2] the
authors showed the interesting result that also the gap from nondeterministic
to deterministic constant-height pushdown automata is double exponential. As
non-self-embedding grammars, constant-height pushdown automata are restric-
tions of the corresponding general model, where true recursions are not possible.
By comparing these two models, we proved that they are polynomially related
in size [6].

    For each integer d > 0, a d-limited automaton is a one-tape nondeterminstic
Turing machine which is allowed to rewrite the content of each tape cell only in
the rst d visits. These models have been introduced by Hibbard in 1967, who
                  NSE Grammars, Constant-Height PDAs, Limited Automata                3

proved that for each d ≥ 2 they characterize context-free languages [7]. This
yields a hierarchy of acceptors, merely obtained by restricting one-tape Tur-
ing machines, corresponding to Chomsky's classication. Furthermore, as shown
in [14, Thm. 12.1], 1-limited automata are equivalent to nite automata. This
equivalence has been investigated from the descriptional complexity point of view
in [11], by proving exponential and double exponential gaps from 1-limited au-
tomata to nondeterministic and deterministic nite automata, respectively. Our
main result is a construction transforming each non-self-embedding grammar
into a 1-limited automaton of polynomial size. For the converse transformation,
we show that an exponential size is necessary. Indeed, we prove a stronger result
by exhibiting, for each n > 0, a language Ln accepted by a two-way deterministic
nite automaton with O(n) states, which requires exponentially many states to
be accepted even by an unrestricted pushdown automaton. From the cost of the
conversion of 1-limited automata into nondeterministic automata, it turns out
that for the conversion of 1-limited automata into non-self-embedding grammars
an exponential size is also sucient.
    We use standard abbreviations as cfg, pda, etc. The prex 2 before nfa
or dfa is used to indicate two-way automata. pdas with pushdown store height
bounded by     h are indicated as h-pdas. nse is an abbreviation for non-self-
embedding. Figure 1 summarizes some of the results discussed in this extended
abstract. More details can be found in [12] and [6].


               poly (Thm 2)                         poly (Thm 3)
       nse                       h-pda                                       1-la
               poly (Thm 2)                          exp (Cor 1)

                 exp (Thm 4)                exp                 exp [11, Thm 4]

                        2dfa                2nfa
                                   ?
                        [Sakoda-Sipser problem]

Fig. 1. Some bounds discussed in the paper. Dotted arrows denote trivial relationships,
while the dashed arrow indicates the famous Sakoda and Sipser's question. The expo-
nential cost of the simulation of h-pdas by 2nfas is discussed at the end of Section 2.


2      A Summary of the Results
For each model under consideration, we evaluate its size as the total number of
symbols used to write down its description. For instance, the size of a grammar
is linear in the sum of the lengths of its productions.

Theorem 1 ([12]). Given an nse grammar of size s, there exist an equiva-
lent   nfa and an equivalent dfa with a number of states exponential and double
exponential in   s, respectively. In the worst case these sizes cannot be reduced.
4                                    B. Guillon, G. Pighizzini, and L. Prigioniero

      To show that the bounds in Theorem 1 cannot be reduced, we can consider,
                                                         ∗
for any integer h > 0, the language Lh ⊆ {a, b}              dened as the set of strings
composed of k blocks w1 w2 · · · wk each of length h, for some k > 1, such that
the last block wk is the reverse of one of the rst k − 1 blocks, i.e.,

            Lh = {w1 w2 · · · wk−1 wk | k > 1, wi ∈ {a, b}h , i = 1, . . . , k,
                                                                              R
                                          and ∃j, 1 ≤ j < k, s.t. wj = wk }.

It can be proved that Lh is generated by a nse of size O(n), while, by a standard
                                                                        2h
distinguish ability argument, each dfa accepting it requires 2               many states.
      The statement of Theorem 1 remains true if the grammar is quasi -nse, i.e.,
it is allowed to contain self-embedded variables, provided that each terminal
string generated by them is unary, namely it consists only of occurrences of a
same symbol. In contrast, for quasi -nse grammars generating letter bounded
languages, the cost of the conversion into dfas reduces to a simple exponential
in the size cost.

Theorem 2 ([6]). h-pdas and nse grammars are polynomially related in size.
In the proof of Theorem 2, the transformation from h-pdas to nse grammars is
an adaption of a standard transformation from pdas to cfgs. For the converse,
a modication of a decomposition of nse grammars presented in [1] is used.

      Now, we consider the size relationships of nse grammars and h-pdas with
1-limited automata.
Theorem 3 ([6]). For every nse grammar G, there exist a 1-state letter-to-
letter nondeterministic transducer T and a 2nfa A of polynomial size, such
that a wordw is generated by G if and only if A accepts an image u of w by T .
As a consequence, G can be transformed into a 1-la of polynomial size.

Given an input w , the transducer T nondeterministically generates a compression
of a derivation tree of w . The 2nfa A veries the validity of such a guess. The
resulting   1-la is a composition of T        and   A. The converse transformation is
exponential. Actually we have a stronger result:

Theorem 4 ([6]). For each n > 0, let Ln be the language of the powers of any
                                                                 n
string of length    n over {0, 1}, i.e., Ln = uk | u ∈ {0, 1} , k ≥ 0 . Then:
     Ln is accepted by a 2dfa of size O(n);
     each context-free grammar in Chomsky normal form needs exponentially
      many variables in   n to generate Ln ;
     the size of any pda accepting Ln is at least exponential in n.
Corollary 1. The size cost of the conversion of 1-las into nse grammars and
h-pdas is exponential.
Proof. The lower bound derives from Theorem 4. For the upper bound, in [11]
it was proved that each 1-la can be transformed into a 1nfa of exponential size
from which, by standard construction, we can obtain a regular (and, so, nse)
grammar, without increasing the size asymptotically.                                        t
                                                                                            u
                   NSE Grammars, Constant-Height PDAs, Limited Automata                 5

   In [2], the question of the cost of the conversion of deterministic h-pdas into
                                                                           n   ∗
1nfas was raised. To this regard, we observe that the language (a2             )   is ac-
cepted by a deterministic h-pda of size polynomial in n for large enough h (see,
e.g., [10]) but, by a standard pumping argument, it requires at least       2n states
to be accepted by 1nfas. Actually, as a consequence of state lower bound pre-
               n
sented in [8], 2   states are also necessary to accept it on each 2nfa. Considering
Theorem 4, we can conclude that both simulations from two-way automata to
h-pdas and from h-pdas to two-way automata cost at least exponential.


References
 1. Anselmo, M., Giammarresi, D., Varricchio, S.: Finite automata and non-self-
    embedding grammars. In: CIAA 2002. LNCS, vol. 2608, pp. 4756 (2002)
 2. Bednárová, Z., Geert, V., Mereghetti, C., Palano, B.: Removing nondeterminism
    in constant height pushdown automata. Inf. Comput. 237, 257267 (2014)
 3. Chomsky, N.: On certain formal properties of grammars. Information and Control
    2(2), 137167 (1959)
 4. Chomsky, N.: A note on phrase structure grammars. Information and Control 2(4),
    393395 (1959)
 5. Geert, V., Mereghetti, C., Palano, B.: More concise representation of regular
    languages by automata and regular expressions. Inf. Comput. 208(4), 385394
    (2010)
 6. Guillon, B., Pighizzini, G., Prigioniero, L.: Non-self-embedding grammars,
    constant-height pushdown automata, and limited automata. In: CIAA 2018. LNCS
    (2018), to appear
 7. Hibbard, T.N.: A generalization of context-free determinism. Information and Con-
    trol 11(1/2), 196238 (1967)
 8. Mereghetti, C., Pighizzini, G.: Two-way automata simulations and unary lan-
    guages. Journal of Automata, Languages and Combinatorics 5(3), 287300 (2000)
 9. Meyer, A.R., Fischer, M.J.: Economy of description by automata, grammars, and
    formal systems. In: 12th Annual Symposium on Switching and Automata Theory,
    East Lansing, Michigan, USA, October 13-15, 1971. pp. 188191. IEEE Computer
    Society (1971)
10. Pighizzini, G.: Deterministic pushdown automata and unary languages. Int. J.
    Found. Comput. Sci. 20(4), 629645 (2009)
11. Pighizzini, G., Pisoni, A.: Limited automata and regular languages. Int. J. Found.
    Comput. Sci. 25(7), 897916 (2014)
12. Pighizzini, G., Prigioniero, L.: Non-self-embedding grammars and descriptional
    complexity. In: NCMA 2017. pp. 197209 (2017)
13. Pighizzini, G., Shallit, J., Wang, M.: Unary context-free grammars and pushdown
    automata, descriptional complexity and auxiliary space lower bounds. J. Comput.
    Syst. Sci. 65(2), 393414 (2002)
14. Wagner, K.W., Wechsung, G.: Computational Complexity. D. Reidel Publishing
    Company, Dordrecht (1986)