Iterated Transduction on Unary Languages? (short paper) Martin Kutrib1 , Andreas Malcher1 , Carlo Mereghetti2 , and Beatrice Palano3 1 Institut für Informatik, Universität Giessen, Arndtstr. 2, 35392 Giessen, Germany {kutrib,andreas.malcher}@informatik.uni-giessen.de 2 Dipartimento di Fisica “Aldo Pontremoli”, Università degli Studi di Milano via Celoria 16, 20133 Milano, Italy, carlo.mereghetti@unimi.it 3 Dipartimento di Informatica “G. degli Antoni”, Università degli Studi di Milano via Celoria 18, 20133 Milano, Italy, palano@unimi.it Abstract. An iterated uniform finite-state transducer executes the same length-preserving transduction in iterative sweeps. The first sweep occurs on the input string, while any subsequent sweep processes the output of the previous one. We focus on unary inputs. We show that any unary regular language can be accepted by a deterministic iterated uniform finite-state transducer with at most max{2%, p} + 1 states, % and p being the greatest primes in the factorization of the, respectively, pre-periodic and periodic part of the language. This state cost cannot be improved by using nondeterminism, and is definitely lower than the state costs of equivalent classical models of finite-state automata. Keywords: Iterated transducers; Unary languages; State complexity 1 Introduction The notion of an iterated uniform finite-state transducer (iufst) has been in- troduced in [20, 21, 25]. It consists of a length-preserving finite-state transducer that works in iterative sweeps from left to right on its input tape. In the first sweep the input string is processed, while any further sweep operates on the output of the previous sweep. The model is uniform in that every sweep always starts from the same initial state on the leftmost tape symbol, and operates the same transduction rules at each computation step. An input string is accepted whenever the transducer halts in an accepting state at the end of a sweep. A theoretical investigation of iufsts is motivated by the fact that iterated or cascade transductions show up in numerous fields of computer science, e.g., in natural language processing [17], in compiler design [1], in the celebrated Krohn- Rhodes decomposition theorem [18]. Again, cascades of deterministic pushdown transducers as language accepting devices have been studied in [15]. In [11, 26], iterated finite-state transducers as language generating devices have been settled. ? Extended version presented at the 47th Int. Conf. SOFSEM, January 25–29, 2021, Bolzano-Bozen, Italy, [24]. Copyright © 2021 for this paper by its authors. Use per- mitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 M. Kutrib, A. Malcher, C. Mereghetti, B. Palano Notice that only these latter contributions introduce a notion of “uniformity” on iterated transduction, in that always the same transducer is iteratively applied. Deterministic and nondeterministic iufsts (the nondeterministic model de- noted by niufst) have been deeply studyied in [20, 21, 23, 25]. For a constant number of sweeps, iufsts and niufsts characterize the class of regular lan- guages. For a non-constant number of sweeps, non-regular languages can be accepted as long as an at least logarithmic number of sweeps is provided, and infinite proper language hierarchies depending on sweep complexity are shown. Recently, in [22], iufsts and niufsts have been enhanced with the possibil- ity of two-way motion, implying a sweep alternation from left to right and from right to left. Indeed, iufsts and niufsts have been investigated within the realm of descriptional complexity, where device size is under consideration (see, e.g., [2–10, 19]). In descriptional complexity, techniques from several areas are em- ployed [12, 13, 28], yielding results of both theoretical and practical interest [16]. In this paper, we construct size-efficient iterated transducers for general unary regular languages. We give our construction step by step, starting from finite unary languages. We show that finite unary languages with words of length up to ` can be accepted by an iufst with 2% states, % being the greatest prime in the factorization of `. Next, we switch to unary periodic languages, and prove that any n-periodic unary language can be accepted by an iufst with p states, p being the greatest prime in the factorization of n. By combining these two results, we get that any unary regular language is accepted by an iufst with at most max{2%, p} + 1 states. We then show that these state costs cannot be improved by using nondeterminism, and that they are definitely lower than the state costs of equivalent classical models of finite-state automata. Due to the page limit, proofs and further results (some of which may be found in [24]) have been omitted. 2 Definitions and Preliminaries By the Fundamental Qs Theorem of Arithmetic, any integer n > 1 univocally factor- izes as n = i=1 pα i i , for primes p1 < · · · < ps and integers αi > 0. For any n > 1, we let Zn = {0, 1, . . . , n − 1} be the ring of the remainders modulo n. Let Σ ∗ be the set of words on a finite alphabet Σ, including the empty word. The length of a word w is denoted by |w|. A language on Σ is any set L ⊆ Σ ∗ . A nondeterministic iterated uniform finite-state transducer (niufst) is a 8- tuple T = hQ, Σ, ∆, q0 , B, C, δ, F i, with Q the set of internal states, Σ the set of input symbols, ∆ the set of output symbols, q0 ∈ Q the initial state, B ∈ ∆ \ Σ and C ∈ ∆ \ Σ the left and right endmarkers, respectively, F ⊆ Q the set of accepting states, and δ : Q×(Σ ∪∆) → 2Q×∆ the partial transition function. The niufst T halts if the transition function is undefined or T enters an accepting state at the end of a sweep. The transduction is applied in multiple sweeps, and in any but the initial sweep it processes an output of the previous sweep. So, the transition function depends on symbols in Σ ∪ ∆. Let T (w) be the set of possible outputs yielded by T in a complete sweep on the input string w ∈ (Σ ∪ ∆)∗ . Iterated Transduction on Unary Regular Languages 3 In a computation on input w ∈ Σ ∗ , the niufst T yields a sequence of words w1 , . . . , wi , wi+1 , . . . ∈ (Σ ∪ ∆)∗ , with w1 ∈ T (BwC) and wi+1 ∈ T (wi ) for i ≥ 1. An iterated uniform finite-state transducer is deterministic (iufst) whenever |δ(p, x)| ≤ 1, for all p ∈ Q and x ∈ (Σ ∪ ∆). In this case, we simply write δ(p, x) = (q, y) instead of δ(p, x) = {(q, y)} assuming that the transition function is a mapping δ : Q × (Σ ∪ ∆) → Q × ∆. A computation is halting if there exists an r ≥ 1 such that T halts on wr , thus performing r sweeps. The input word w ∈ Σ ∗ is accepted by T if at least one com- putation on w halts at the end of a sweep in an accepting state. Otherwise it is rejected. Indeed, the output of the last sweep is not used. The language accepted by T is the set L(T ) ⊆ Σ ∗ defined as L(T ) = { w ∈ Σ ∗ | w is accepted by T }. For nondeterministic computations and some complexity bound, several lan- guage acceptance modes are usually considered in the literature. Here, we deal with the number of sweeps as complexity measure, and adopt the so-called ac- cept mode of acceptance [27]. A language is accepted in the accept mode if all accepting computations obey the complexity bound. More precisely, given a function s : N → N, an iterated uniform finite-state transducer T has sweep complexity s(n) if for all w ∈ L(T ) all accepting computations on w halt after at most s(|w|) sweeps. In this case, we add the prefix s(n)- to the notation of the device. It is easy to see that 1-iufsts (resp., 1-niufsts) are deterministic (resp., nondeterministic) finite-state automata (dfas and nfas, respectively). A language L ⊆ Σ ∗ is unary whenever built on a single-letter alphabet. In this case, we let L ⊆ 0∗ . A unary language L ⊆ 0∗ is n-periodic if there exists R ⊆ Zn such that L = { 0c·n+R | c ≥ 0 and R ∈ R }. We will always be assuming that n is the minimal value defining L. This is usually referred to as L being properly n-periodic. To emphasize periodicity and R ⊆ Zn , we express L in the form Ln,R . By pumping arguments, we get that n states are necessary and sufficient for dfas and nfas to accept Ln,R . The minimal dfa for Ln,R consists of a single cordless cycle of n states, with an initial state and final states settled according to R. On the other hand, for n factorizing as n = pα α2 αs 1 · p2 · · · · · ps , 1 we have that any two-way dfa and nfa, or isolated Ps cut-pint probabilistic finite automaton (pfa) for Ln,R must have at least i=1 pα i states [29, 30]. i Any unary regular language is the disjoint union of a finite language and an ultimately periodic language. So, it is defined by three parameters `, n, R ⊆ Zn : L`,n,R = L` ∪ { 0`+c·n+R | c ≥ 0 and R ∈ R }, The set L` , called the pre-periodic part of L`,n,R , is a finite unary language with strings of length not exceeding `. The set { 0`+c·n+R | c ≥ 0 and R ∈ R } is the periodic part of L`,n,R . As usual, we assume that the parameters ` and n are the smallest possible defining L`,n,R . Note that L`,n,R is accepted by a dfa whose state digraph features an initial path of ` states joined to a cordless cycle of n states. Clearly, the above recalled state lower bound for two-way dfas and nfas, or isolated cut-point pfas for Ln,R carries over to L`,n,R as well. By simulation results in [14, 31], if L`,n,R is accepted √ by a b-state nfa or two-way nfa, then we can assume ` = O(b2 ) and n = eΘ( b·log b) . 4 M. Kutrib, A. Malcher, C. Mereghetti, B. Palano 3 Iterated Transduction and Unary Regular Languages We present our construction of state-efficient iufsts for unary regular languages by first focusing on unary periodic languages, then considering finite unary lan- guages, and finally getting to general unary regular languages. 3.1 Unary Periodic Languages For reader’s ease of mind, we start by considering unary periodic languages of the form Ln = { 0c·n | c ≥ 0 }, for n ≥ 1. Qs αi Theorem 1. Let n ≥ 2 factorize as n = i=1 pP i , with αi > 0. The language Ln s can be accepted by a ps -state r-iufst with r = i=1 αi sweeps. The minimality – in terms of number of states – of the iufst for Ln designed in Theorem 1 is provided by a pumping argument in the following theorem, which also shows that nondeterminism cannot help in reducing state size: Qs Theorem 2. Let n ≥ 2 factorize as n = i=1 pα i , with αi > 0. Any iufst i and niufst accepting Ln must use at least ps states, regardless of the number of performed sweeps. Let us now move on to a slightly different version of Ln . Precisely, for any n ≥ 1 and a fixed remainder R ∈ Zn \ {0}, we let Ln,R = { 0c·n+R | c ≥ 0 }. The next theorem shows that this modification of Ln does not increase state and sweep complexity of acceptance on iterated transducers. Qs Theorem 3. Let n ≥ 2 factorize as n = i=1 pα i , with i Pα i > 0. The lan- s guage Ln,R can be accepted by a ps -state r-iufst with r = i=1 αi sweeps. Finally, we come to tackle the acceptance of a general unary n-periodic lan- guage Ln,R , for a fixed set R ⊆ Zn : Ln,R = { 0c·n+R | c ≥ 0 and R ∈ R }. Qs Theorem 4. Let n ≥ 2 factorize as n = i=1 pα i , with i Pαs i > 0. The lan- guage Ln,R can be accepted by a ps -state r-iufst with r = i=1 αi sweeps. We notice that from Theorem 2 one may obtain that any iufst and niufst accepting Ln,R must use at least ps states, regardless the number of performed sweeps. In addition, as recalled in Section 2, we remark thatP n states are neces- s i sary and sufficient for dfas and nfas to accept Ln,R , while i=1 pαi states are necessary for two-way dfas and nfas, and one-way isolated cut point pfas. 3.2 Unary Finite Languages For any positive integer `, let L` be any unary language whose longest word has length `. In what follows, we assume ` ≥ 2. The case ` = 1 can be trivially managed by a 2-state dfa seen as a transducer. Qr Theorem 5. Let ` ≥ 2 factorize as ` = i=1 %βi i ,P with βi > 0. The language L` r can be accepted by a (2%r )-state t-iufst with t = i=1 βi sweeps. Any classical finite-state automata for L` needs at least ` states. By a pump- ing argument as in Theorem 2, iterated transduction needs at least %r states. Iterated Transduction on Unary Regular Languages 5 3.3 General Unary Regular Languages Finally, let us put things together. As addressed in Section 2, a general (infinite) unary regular language consists of the disjoint union of a (possibly empty) fi- nite pre-periodic language and a ultimately periodic language. Hence, it can be defined by three parameters `, n, and R ⊆ Zn as L`,n,R = L` ∪ { 0`+c·n+R | c ≥ 0 and R ∈ R }, where, with a slight abuse with respect to notation in Section 3.2, we intend L` as a finite unary language that is accepted by an `-state dfa. So, differently from Section 3.2, from now on L` does not necessarily contain the unary word of length `. In the following theorem, we consider `, n ≥ 2. Otherwise, we have languages that can be trivially dealt with by our constructions in this paper. Qs Qr βi Theorem 6. Let `, n ≥ 2 factorize as n = i=1 pα i , ` = i i=1 %i , αi , βi > 0. The unary language L`,n,R can be accepted by an x-state t-iufst, Ps where Pr x = max {2%r , ps } + ξ, ξ = 0 if %r < ps and 1 otherwise, and t = i=1 αi + i=1 βi . By adapting the pumping argument in Theorem 2, we get that not less than max{%r , ps } states can be used to accept L`,n,R on iterated transducers. On the other hand, as recalled in Section 2, ` + n P states are necessary and sufficient for s a dfa to accept L`,n,R , and not less than i=1 pα i states on two-way dfas and i nfas, and on isolated cut-point pfas are needed. Acknowledgements. The authors wish to thank the anonymous referees. References 1. Aho, A.V., Lam, M.S-L., Sethi, R., Ullman, J.D.: Compilers: principles, techniques, and tools, 2 Ed., Addison-Wesley (2006) 2. Bednárová, Z., Geffert, V., Mereghetti, C., Palano, B.: Removing nondeterminism in constant height pushdown automata. In: DCFS 2012. LNCS 7386, pp. 76–88. Springer (2012) 3. Bednárová, Z., Geffert, V., Mereghetti, C., Palano, B.: Boolean language operations on nondeterministic automata with a pushdown of constant height. In: CSR 2013. LNCS 7913, pp. 100–111. Springer (2013) 4. Bednárová, Z., Geffert, V., Mereghetti, C., Palano, B.: Removing nondeterminism in constant height pushdown automata. Inf. Comp. 237, 257–267 (2014) 5. Bednárová, Z., Geffert, V., Mereghetti, C., Palano, B.: Boolean language operations on nondeterministic automata with a pushdown of constant height. J. Comp. Sys. Sci. 90, 99–114 (2017) 6. Bertoni, A., Mereghetti, C., Palano, B.: Trace monoids with idempotent generators and measure-only quantum automata. Nat. Comp. 9, 383–395 (2010) 7. Bianchi, M.P., Mereghetti, C., Palano, B.: On the power of one-way automata with quantum and classical states. In: CIAA 2014. LNCS 8587, pp. 84–97. Springer (2014) 8. Bianchi, M.P., Mereghetti, C., Palano, B.: Size lower bounds for quantum au- tomata. Th. Comp. Sci. 551, 102–115 (2014) 6 M. Kutrib, A. Malcher, C. Mereghetti, B. Palano 9. Bianchi, M.P., Mereghetti, C., Palano, B.: Quantum finite automata: Advances on Bertoni’s ideas. Th. Comp. Sci. 664, 39–53 (2017) 10. Bianchi, M.P., Mereghetti, C., Palano, B., Pighizzini, G.: On the size of unary probabilistic and nondeterministic automata. Fund. Inf. 112, 119–135 (2011) 11. Bordihn, H., Fernau, H., Holzer, M., Manca, V., Martı́n-Vide, C.: Iterated sequen- tial transducers as language generating devices. Th. Comp. Sci. 369, 67–81 (2006) 12. Choffrut, C., Malcher, A., Mereghetti, C., Palano, B.: On the expressive power of FO[+]. In: LATA 2010. LNCS 6031, pp. 190–201. Springer (2010) 13. Choffrut, C., Malcher, A., Mereghetti, C., Palano, B.: First-order logics: some characterizations and closure properties. Acta Inf. 49, 225–248 (2012) 14. Chrobak, M.: Finite automata and unary languages. Th. Comp. Sci. 47, 149–158 (1986) – Corrigendum, ibid 302, 497–498, (2003) 15. Citrini, C., Crespi-Reghizzi, S., Mandrioli, D.: On deterministic multi-pass analysis. SIAM J. Comput. 15, 668–693 (1986) 16. Feletti, C., Mereghetti, C., Palano, B.: Uniform circle formation for swarms of opaque robots with lights. In: SSS 2018. LNCS 11201, pp. 317–332. Springer (2018) 17. Friburger, N., Maurel, D.: Finite-state transducer cascades to extract named enti- ties in texts. Th. Comp. Sci. 313, 93–104 (2004) 18. Ginzburg, A.: Algebraic theory of automata. Academic Press (1968) 19. Jakobi, S., Meckel, K., Mereghetti, C., Palano, B.: Queue automata of constant length. In: DCFS 2013. LNCS 8031, pp. 124–135. Springer (2013) 20. Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Descriptional complexity of iterated uniform finite-state transducers. In: DCFS 2019. LNCS 1612, pp. 223–234. Springer (2019) 21. Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Iterated uniform finite-state transducers. In: ICTCS 2019. CEUR WORKSHOP PROCEEDINGS, vol. 2504, pp. 52–57. CEUR-WS.org (2019) 22. Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Iterated uniform finite-state transducers: Descriptional complexity of nondeterminism and two-way motion. In: DCFS 2020. LNCS 12442, pp. 117–129. Springer (2020) 23. Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Deterministic and nondeter- ministic iterated uniform finite-state transducers: Computational and descriptional power. In: CiE 2020. LNCS 12098, pp. 87–99. Springer (2020) 24. Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Iterated uniform finite-state transducers on unary languages In: SOFSEM 2021. LNCS 12607, pp. 218–232. Springer (2021) 25. Kutrib, M., Malcher, A., Mereghetti, C., Palano, B.: Descriptional complexity of iterated uniform finite-state transducers. Inf. Comp. (In press) 26. Manca, V.: On the generative power of iterated transductions. In: Words, Semi- groups, and Transductions. pp. 315–327. World Scientific (2001) 27. Mereghetti, C.: Testing the descriptional power of small Turing machines on non- regular language acceptance. Int. J. Found. Comput. Sci. 19, 827–843 (2008) 28. Mereghetti, C., Palano, B.: The complexity of minimum difference cover. J. Discr. Alg. 4, 239–254 (2006) 29. Mereghetti, C., Palano, B., Pighizzini, G.: Note on the succinctness of determinis- tic, nondeterministic, probabilistic and quantum finite automata. Th. Inf. Appl. 35, 477–490 (2001) 30. Mereghetti, C., Pighizzini, G.: Two–way automata simulations and unary lan- guages. J. Autom., Lang. Comb. 5, 287–300 (2000) 31. Mereghetti, C., Pighizzini, G.: Optimal simulations between unary automata. SIAM J. Comput. 30, 1976–1992 (2001)