On Some Succinct Representations of Regular Languages Extended Abstract Bruno Guillon, Giovanni Pighizzini, and Luca Prigioniero Dipartimento di Informatica, Università degli Studi di Milano, Italy {guillonb, pighizzini, prigioniero}@di.unimi.it Abstract. Non-self-embedding grammars, constant-height pushdown automata and 1-limited automata are restrictions of context-free gram- mars, pushdown automata and Turing machines, respectively. All of them characterize the class of regular languages. There is a double size expo- nential gap from each of these models to deterministic nite automata. Non-self-embedding grammars and constant-height pushdown automata are polynomially related in size. Moreover, there exists a polynomial size simulation by 1-limited automata. In contrast, the converse transforma- tion costs exponential. 1 Introduction Regular languages are usually represented using regular expression or nite au- tomata. It is well known that, in the worst case, deterministic automata can require exponentially many states with respect to equivalent nondeterministic automata. Hence, there is an exponential size gap from nondeterministic to de- terministic automata. Further representations which can be more succinct than nondeterministic automata have been discovered and investigated. Three of them are considered in this work: non-self-embedding grammars, constant-height push- down automata, and 1-limited automata. The size gap from each one of these representations to equivalent deterministic automata is double exponential. To describe non-self-embedding grammars, we rst recall that the extra capa- bility of context-free grammars with respect to regular ones is that of describing recursive structures as, for instance, nested parentheses, arithmetic expressions, typical programming language constructs. In terms of recognizing devices, this capability is implemented through the pushdown store, which is used to ex- tend nite automata in order to make the resulting model, namely pushdown automata, equivalent to context-free grammars. To emphasize this capability, in one of his pioneering papers, Chomsky inves- tigated the self-embedding property [4]: a context-free grammar is self-embedding if it contains a variable A which, in some sentential form, is able to reproduce ? itself surrounded by two nonempty strings α and β , in symbols A =⇒ αAβ . Roughly speaking, this means that such self-embedded variable A is truly re- cursive. He proved that, among all context-free grammars, only self-embedding 2 B. Guillon, G. Pighizzini, and L. Prigioniero ones can generate nonregular languages. Hence, non-self-embedding grammars are no more powerful than nite automata. The proof given by Chomsky of this result is constructive, namely it pro- vides a method for obtaining a nite automaton equivalent to a given non-self- embedding grammar [3,4]. A dierent constructive proof of the same result was given by Anselmo, Giammarresi, and Varricchio [1], by showing a decomposition of non-self-embedding grammars in regular grammars and then iteratively apply- ing regular substitutions to obtain equivalent nite automata. In the same paper, the authors also proved that the size gap from non-self-embedding grammars to equivalent automata is at least exponential. It is worthwhile to mention that, in 1971, Meyer and Fischer proved that for any recursive function f and arbitrarily large integer n, there exists a context-free grammar whose description has size n and which generates a regular language, such that any equivalent nite automaton requires at least f (n) states [9]. This means that it is not possible to obtain a recursive bound relating the size of context-free grammars generating regular languages with the number of states of equivalent deterministic nite automata. It is important to notice that the result of Meyer and Fischer was obtained by considering grammars with a two- letter terminal alphabet. The unary, i.e., one-letter, case was studied in 2002 by Pighizzini, Shallit, and Wang, who obtained optimal recursive bounds [13]. We recently proved that also in the case of non-self-embedding grammars, the bounds are recursive, independently on the alphabet size [12]. In particular, by inspecting and rening the construction presented in [1], we showed that each non-self-embedding grammar of size s can be converted into equivalent nonde- O(s) 2O(s) terministic and deterministic automata with 2 and 2 states, respectively. We also obtained a family of languages that witness that these gaps cannot be reduced. Furthermore, these gaps do not change if we allow the variables which generate only unary strings (i.e., strings consisting of occurrences of only one ter- minal) to be self-embedded. Such grammars, which are also equivalent to nite automata, are called quasi-non-self-embedding grammars. Constant-height pushdown automata are standard nondeterministic push- down automata where the amount of available pushdown store is xed. Hence, the number of their possible congurations is nite, thus implying that they are no more powerful than nite automata. Exponential and double exponential gaps from constant-height pushdown automata to nondeterministic and deter- ministic automata, respectively, have been proved in [5]. Furthermore, in [2] the authors showed the interesting result that also the gap from nondeterministic to deterministic constant-height pushdown automata is double exponential. As non-self-embedding grammars, constant-height pushdown automata are restric- tions of the corresponding general model, where true recursions are not possible. By comparing these two models, we proved that they are polynomially related in size [6]. For each integer d > 0, a d-limited automaton is a one-tape nondeterminstic Turing machine which is allowed to rewrite the content of each tape cell only in the rst d visits. These models have been introduced by Hibbard in 1967, who NSE Grammars, Constant-Height PDAs, Limited Automata 3 proved that for each d ≥ 2 they characterize context-free languages [7]. This yields a hierarchy of acceptors, merely obtained by restricting one-tape Tur- ing machines, corresponding to Chomsky's classication. Furthermore, as shown in [14, Thm. 12.1], 1-limited automata are equivalent to nite automata. This equivalence has been investigated from the descriptional complexity point of view in [11], by proving exponential and double exponential gaps from 1-limited au- tomata to nondeterministic and deterministic nite automata, respectively. Our main result is a construction transforming each non-self-embedding grammar into a 1-limited automaton of polynomial size. For the converse transformation, we show that an exponential size is necessary. Indeed, we prove a stronger result by exhibiting, for each n > 0, a language Ln accepted by a two-way deterministic nite automaton with O(n) states, which requires exponentially many states to be accepted even by an unrestricted pushdown automaton. From the cost of the conversion of 1-limited automata into nondeterministic automata, it turns out that for the conversion of 1-limited automata into non-self-embedding grammars an exponential size is also sucient. We use standard abbreviations as cfg, pda, etc. The prex 2 before nfa or dfa is used to indicate two-way automata. pdas with pushdown store height bounded by h are indicated as h-pdas. nse is an abbreviation for non-self- embedding. Figure 1 summarizes some of the results discussed in this extended abstract. More details can be found in [12] and [6]. poly (Thm 2) poly (Thm 3) nse h-pda 1-la poly (Thm 2) exp (Cor 1) exp (Thm 4) exp exp [11, Thm 4] 2dfa 2nfa ? [Sakoda-Sipser problem] Fig. 1. Some bounds discussed in the paper. Dotted arrows denote trivial relationships, while the dashed arrow indicates the famous Sakoda and Sipser's question. The expo- nential cost of the simulation of h-pdas by 2nfas is discussed at the end of Section 2. 2 A Summary of the Results For each model under consideration, we evaluate its size as the total number of symbols used to write down its description. For instance, the size of a grammar is linear in the sum of the lengths of its productions. Theorem 1 ([12]). Given an nse grammar of size s, there exist an equiva- lent nfa and an equivalent dfa with a number of states exponential and double exponential in s, respectively. In the worst case these sizes cannot be reduced. 4 B. Guillon, G. Pighizzini, and L. Prigioniero To show that the bounds in Theorem 1 cannot be reduced, we can consider, ∗ for any integer h > 0, the language Lh ⊆ {a, b} dened as the set of strings composed of k blocks w1 w2 · · · wk each of length h, for some k > 1, such that the last block wk is the reverse of one of the rst k − 1 blocks, i.e., Lh = {w1 w2 · · · wk−1 wk | k > 1, wi ∈ {a, b}h , i = 1, . . . , k, R and ∃j, 1 ≤ j < k, s.t. wj = wk }. It can be proved that Lh is generated by a nse of size O(n), while, by a standard 2h distinguish ability argument, each dfa accepting it requires 2 many states. The statement of Theorem 1 remains true if the grammar is quasi -nse, i.e., it is allowed to contain self-embedded variables, provided that each terminal string generated by them is unary, namely it consists only of occurrences of a same symbol. In contrast, for quasi -nse grammars generating letter bounded languages, the cost of the conversion into dfas reduces to a simple exponential in the size cost. Theorem 2 ([6]). h-pdas and nse grammars are polynomially related in size. In the proof of Theorem 2, the transformation from h-pdas to nse grammars is an adaption of a standard transformation from pdas to cfgs. For the converse, a modication of a decomposition of nse grammars presented in [1] is used. Now, we consider the size relationships of nse grammars and h-pdas with 1-limited automata. Theorem 3 ([6]). For every nse grammar G, there exist a 1-state letter-to- letter nondeterministic transducer T and a 2nfa A of polynomial size, such that a wordw is generated by G if and only if A accepts an image u of w by T . As a consequence, G can be transformed into a 1-la of polynomial size. Given an input w , the transducer T nondeterministically generates a compression of a derivation tree of w . The 2nfa A veries the validity of such a guess. The resulting 1-la is a composition of T and A. The converse transformation is exponential. Actually we have a stronger result: Theorem 4 ([6]). For each n > 0, let Ln be the language of the powers of any n string of length n over {0, 1}, i.e., Ln = uk | u ∈ {0, 1} , k ≥ 0 . Then:  Ln is accepted by a 2dfa of size O(n);  each context-free grammar in Chomsky normal form needs exponentially many variables in n to generate Ln ;  the size of any pda accepting Ln is at least exponential in n. Corollary 1. The size cost of the conversion of 1-las into nse grammars and h-pdas is exponential. Proof. The lower bound derives from Theorem 4. For the upper bound, in [11] it was proved that each 1-la can be transformed into a 1nfa of exponential size from which, by standard construction, we can obtain a regular (and, so, nse) grammar, without increasing the size asymptotically. t u NSE Grammars, Constant-Height PDAs, Limited Automata 5 In [2], the question of the cost of the conversion of deterministic h-pdas into n ∗ 1nfas was raised. To this regard, we observe that the language (a2 ) is ac- cepted by a deterministic h-pda of size polynomial in n for large enough h (see, e.g., [10]) but, by a standard pumping argument, it requires at least 2n states to be accepted by 1nfas. Actually, as a consequence of state lower bound pre- n sented in [8], 2 states are also necessary to accept it on each 2nfa. Considering Theorem 4, we can conclude that both simulations from two-way automata to h-pdas and from h-pdas to two-way automata cost at least exponential. References 1. Anselmo, M., Giammarresi, D., Varricchio, S.: Finite automata and non-self- embedding grammars. In: CIAA 2002. LNCS, vol. 2608, pp. 4756 (2002) 2. Bednárová, Z., Geert, V., Mereghetti, C., Palano, B.: Removing nondeterminism in constant height pushdown automata. Inf. Comput. 237, 257267 (2014) 3. Chomsky, N.: On certain formal properties of grammars. Information and Control 2(2), 137167 (1959) 4. Chomsky, N.: A note on phrase structure grammars. Information and Control 2(4), 393395 (1959) 5. Geert, V., Mereghetti, C., Palano, B.: More concise representation of regular languages by automata and regular expressions. Inf. Comput. 208(4), 385394 (2010) 6. Guillon, B., Pighizzini, G., Prigioniero, L.: Non-self-embedding grammars, constant-height pushdown automata, and limited automata. In: CIAA 2018. LNCS (2018), to appear 7. Hibbard, T.N.: A generalization of context-free determinism. Information and Con- trol 11(1/2), 196238 (1967) 8. Mereghetti, C., Pighizzini, G.: Two-way automata simulations and unary lan- guages. Journal of Automata, Languages and Combinatorics 5(3), 287300 (2000) 9. Meyer, A.R., Fischer, M.J.: Economy of description by automata, grammars, and formal systems. In: 12th Annual Symposium on Switching and Automata Theory, East Lansing, Michigan, USA, October 13-15, 1971. pp. 188191. IEEE Computer Society (1971) 10. Pighizzini, G.: Deterministic pushdown automata and unary languages. Int. J. Found. Comput. Sci. 20(4), 629645 (2009) 11. Pighizzini, G., Pisoni, A.: Limited automata and regular languages. Int. J. Found. Comput. Sci. 25(7), 897916 (2014) 12. Pighizzini, G., Prigioniero, L.: Non-self-embedding grammars and descriptional complexity. In: NCMA 2017. pp. 197209 (2017) 13. Pighizzini, G., Shallit, J., Wang, M.: Unary context-free grammars and pushdown automata, descriptional complexity and auxiliary space lower bounds. J. Comput. Syst. Sci. 65(2), 393414 (2002) 14. Wagner, K.W., Wechsung, G.: Computational Complexity. D. Reidel Publishing Company, Dordrecht (1986)