Generalized #-Rewriting Systems of Finite Index Zbyněk Křivka and Alexander Meduna Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Božetěchova 1, Brno 612 66, Czech Republic, Europe krivka@fit.vutbr.cz, meduna@fit.vutbr.cz Abstract. This paper discusses a generalized version of #-rewriting systems with context rules. It demonstrates that this context-based gen- eralization does not affect the generative power of #-rewriting systems of finite index. A new characterization of the infinite hierarchy of language families generated by programmed grammars of finite index is obtained. Keywords: #-rewriting systems, context-based generalization, infinite hierarchy, finite index. 1 Introduction In the formal language theory, there exist language-defining devices having fea- tures of both grammars and automata (see [1], [3], and [6]). Recently, this theory has introduced another device of this kind—#-rewriting systems (see [4]). In- deed, on the one hand, like grammars, they are generative devices. On the other hand, like automata, they use finitely many states. Recall that these systems of finite index characterize the well-known infinite hierarchy of language families resulting from programmed grammars of finite index (see Theorems 3.1.2i and 3.1.7 in [2]). The original version of #-rewriting systems is based upon rules of the form p i# → q γ, where p, q are states, i is a positive integer, and γ is a non-empty string. By using this rule, the system rewrites ith # with γ and, simultaneously, changes the current state p to q. In the present paper, we discuss a generalized version of #-rewriting systems that uses rules of the form p iα#β → q αγβ, where α and β are strings and the other symbols have the same meaning as above. This generalized rule is applicable to # if this # occurs in the α-β context; in other aspects, the application is analogical to the original version. As its main result, this paper demonstrates that the generalization under discussion does not affect the generative power of #-rewriting systems of finite index, so we obtain an alternative characterization of the infinite hierarchy of language families generated by programmed grammars of finite index (see [4], and Theorems 3.1.2i and 3.1.7 in [2]). This result is of some interest when compared, for instance, to a similar gen- eralization in terms of the classical Chomsky Hierarchy, in which grammars with the generalized rules are much stronger than ordinary context-free grammars. 198 Zbyněk Křivka and Alexander Meduna On the other hand, notice that the language family generated by #-rewriting systems of finite index is incomparable with the family of context-free languages. More specifically, a context-free Dyck language is not generated by any #- rewriting systems of finite index and vice versa Example 1 shows that the family of languages generated by #-rewriting systems of finite index contains at least one non-context-free language. 2 Preliminaries This paper assumes that the reader is familiar with the formal language theory (see [5], [7]). For an alphabet V , V ∗ represents the free monoid generated by V under the operation of concatenation. The identity of V ∗ is denoted by ε. Set V + = V ∗ − {ε}; algebraically, V + is thus the free semigroup generated by V under the operation of concatenation. For w ∈ V ∗ , |w| denotes the length of w, and for W ⊆ V , occur(w, W ) denotes the number of occurrences of symbols from W in w. For every i ≥ 0, suffix (w, i) is w’s suffix of length i if |w| ≥ i, and suffix (w, i) = w if i ≥ |w| + 1. Let suffixes (w) = {suffix (w, j) | 0 ≤ j ≤ |w|}. For every i ≥ 0, prefix (w, i) is w’s prefix of length i if |w| ≥ i, and prefix (w, i) = w if i ≥ |w| + 1. Let prefixes (w) = {prefix (w, j) | 0 ≤ j ≤ |w|}. Let Ψ denote the set of all non-negative integers and let m ∈ Ψ . Set I = Ψ − {0}. Let K ⊆ Ψ be a finite set. Define max(K) as the smallest integer k such that for all h ∈ K, k ≥ h. Define min(K) as the greatest integer k such that for all h ∈ K, k ≤ h. For a subset S of partially ordered set (Ψ, ≤), a supremum of S is an element p ∈ Ψ such that x ≤ p for all x ∈ S, and for any q ∈ Ψ such that x ≤ q for all x ∈ S it holds that p ≤ q. The infimum of the subset S is an element p ∈ Ψ such that p ≤ x for all x ∈ S, and for any q ∈ Ψ such that q ≤ x for all x ∈ S it holds that q ≤ p. If the supremum of a set S exists, it is denoted as sup(S); infimum is denoted by inf(S). A context-free grammar is a quadruple, G = (V, T, P, S), where V is a total alphabet, T ⊆ V is an alphabet of terminals, S ∈ (V − T ) is the start symbol, and P is a finite set of rules of the form q: A → v, where A ∈ (V −T ), v ∈ V ∗ and q is a label of this rule. If q: A → v ∈ P , x, y ∈ V ∗ , G makes a derivation step from xAy to xvy according to q: A → v, symbolically written as xAy ⇒ xvy [q] or, simply, xAy ⇒ xvy. In the standard manner, we define ⇒m , where m ≥ 0, ⇒+ , and ⇒∗ . To express that G makes x ⇒m y, where x, y ∈ V ∗ , by using a sequence of rules q1 , q2 , . . . , qm , we symbolically write x ⇒m y [q1 q2 . . . qm ]. The language of G, L(G), is defined as L(G) = {w ∈ T ∗ | S ⇒∗ w}. A language, L, is context-free if and only if L = L(G), where G is a context-free grammar. A programmed grammar (see page 28 in [2]) is a quadruple, G = (V, T, P, S), where V is a total alphabet, T ⊆ V is an alphabet of terminals, S ∈ (V − T ) is the start symbol, and P is a finite set of rules of the form q: A → v, g(q), where q: A → v is a context free rule labeled by q and g(q) is a set of rule labels Generalized #-Rewriting Systems of Finite Index 199 associated with this rule. After a derivation step, symbolically denoted by ⇒, according a rule of this form in an ordinary context-free way, in the next step a rule labeled by a label from g(q) has to be applied. In the standard manner, we define ⇒m , where m ≥ 0, ⇒+ , and ⇒∗ . The language of G, L(G), is defined as L(G) = {w ∈ T ∗ | S ⇒∗ w}. Let G be a grammar, and let T and S be its terminal alphabet and start symbol, respectively. For a derivation D: S = w1 ⇒ w2 ⇒ . . . ⇒ wr = w ∈ T ∗ , r ≥ 1, according to G, we set Ind(D, G) = max({occur(wi , V − T ) | 1 ≤ i ≤ r}), and for w ∈ T ∗ , we define Ind(w, G) = min({Ind(D, G) | D is a derivation for w in G}). The index of grammar G (see page 151 in [2]) is defined as Ind(G) = sup {Ind(w, G) | w ∈ L(G)}. For a language L in the family L(X) of languages generated by grammars of some type X, we define IndX (L) = inf {Ind(G) | L(G) = L, G is of type X}. For a family L(X), we set Ln (X) = {L | L ∈ L(X) and IndX (L) ≤ n}, n ≥ 1. 3 Definitions A generalized #-rewriting system (G#RS) is a quadruple H = (Q, Σ, s, R), where Q is a finite set of states, Σ is an alphabet containing # called a bounder, Q ∩ Σ = ∅, s ∈ Q is a start state and R ⊆ Q × I × Σ + × Q × Σ ∗ is a finite relation whose members are called rules. A rule is usually written as r: p iα#β → q αγβ ∈ R hereafter, where r is its unique label, i ∈ I, q, p ∈ Q, and α, β, γ ∈ Σ ∗ . A configuration of H is a pair from Q × Σ ∗ . Let χ denote the set of all configurations of H. Let puα#βv, quαγβv ∈ χ be two configurations, p, q ∈ Q, u, v, α, β, γ ∈ Σ ∗ , i ∈ I and occur(uα, #) = i − 1. Then, H makes a computa- tional step from puα#βv to quαγβv by using r: p iα#β → q αγβ, symbolically written puα#βv i⇒ quαγβv [r] in H or puα#βv ⇒ quαγβv [r] in H when po- sition is not relevant or simply puα#βv ⇒ quαγβv when the applied rule is immaterial. In the standard manner, we extend ⇒ to ⇒m and j ⇒ to j ⇒m , for m ≥ 0; then, based on ⇒m and j ⇒m , we define ⇒+ , ⇒∗ , j ⇒+ , and j ⇒∗ in the standard way. The language generated by G#RS H, L(H), is defined as L(H) = {w | s# ⇒∗ qw, q ∈ Q, w ∈ (Σ − {#})∗ }. As special case of G#RS, if every r: p iα#β → q αγβ ∈ R satisfies that α = β = ε then H is called context-free #-rewriting system (CF#RS). Let k be a positive integer. H is of index k if for every configuration x ∈ χ, s# ⇒∗ qy = x implies occur(y, #) ≤ k. Notice that H of index k cannot derive a string containing more than k #s. For G#RS H, maxL (H) and maxR (H) denote the maximum length of left- hand and right-hand side of rules, respectively. Precisely, let H = (Q, Σ, s, R) 200 Zbyněk Křivka and Alexander Meduna be a G#RS, maxL (H) = max({|α| | p iα → q β ∈ R}) and maxR (H) = max({|β| | p iα → q β ∈ R}). Let k be a positive integer. Lk (G#RS), Lk (CF #RS), and Lk (P, CF ) denote the families of languages generated by generalized #-rewriting systems of index k, context-free #-rewriting systems of index k, and derived by programmed grammars of index k, respectively. L(G#RS) and L(CF #RS) denote the family of languages generated by generalized #-rewriting systems and context-free #- rewriting systems, respectively. Example 1. CF#RS H1 = ({s, p, q, f }, {a, b, c, #}, s, R1 ), where R1 contains 1: s 1# → p ## 2: p 1# → q a#b 3: q 2# → p c# 4: p 1# → f ab 5: f 1# → f c For instance, H1 computes aabbcc as s# ⇒ p## [1] ⇒ qa#b# [2] ⇒ pa#bc# [3] ⇒ f aabbc# [4] ⇒ f aabbcc [5]. Example 2. G#RS H2 = ({s, p, q}, {a, b, c, #}, s, R2 ), where R2 contains 1: s 1# → s a## 2: s 2a## → p a#b#c 3: p 1a# → q aa# 4: q 2b#c → p bb#cc 5: p 1a# → p a 6: p 1b#c → p bc For instance, H2 computes aabbcc as s# ⇒ sa## [1] ⇒ pa#b#c [2] ⇒ qaa#b#c [3] ⇒ paa#bb#cc [4] ⇒ paabb#cc [5] ⇒ paabbcc [6]. Obviously, H1 and H2 are of index 2. Both systems describe the same lan- guage L(H1 ) = L(H2 ) = {an bn cn | n ≥ 1} which belongs into L(CF #RS) − L(CF ). 4 Results The first part of the section establishes an infinite hierarchy of language fami- lies resulting from the generalized #-rewriting systems defined in the previous section. Throughout this section, we only describe the construction part of the proof, leaving the complete version of this proof to the reader. Lemma 1. For every k ≥ 1, Lk (CF #RS) = Lk (G#RS). Generalized #-Rewriting Systems of Finite Index 201 Proof. We only need to prove that Lk (G#RS) ⊆ Lk (CF #RS). Construction. Let k ≥ 1 be a positive integer. Let H = (Q, T ∪ {#}, s, R) be a generalized #-rewriting system of index k, where Σ = T ∪ {#}, # ∈ / T . Let µ = max({maxL (H), maxR (H)}) and let $ be a new symbol, $ ∈ / Q∪Σ. We construct the context-free #-rewriting system of index k, H 0 = (Q0 , T ∪ {#}, s0 , R0 ), where components Q0 , s0 , and R0 are constructed as follows: 1. s0 := hs, #i; 2. Q0 := {hp, y0 #y1 # . . . yh−1 #yh i | p ∈ Q, 1 ≤ h ≤ k, yj = yj0 ∇j yj00 , yj0 , yj00 ∈ T ∗ , ∇j ∈ {ε, $}, |yj0 | ≤ 2µ, |yj00 | ≤ 2µ, 0 ≤ j ≤ h}; 3. R0 := {hp, x0 # . . . #x0j ∇j x00j # . . . #xh i 1# → hp, x0 # . . . #yj0 $yj00 # . . . #xh i # | hp, x0 # . . . #x0j ∇j x00j # . . . #xh i, hp, x0 # . . . #yj0 $yj00 # . . . #xh i ∈ Q0 , ∇j ∈ {ε, $}, yj0 ∈ prefixes (x0j ), yj00 ∈ suffixes (x00j ), xj ∈ T ∗ , 1 ≤ h ≤ k, 0 ≤ j ≤ h, where x00 = y00 = x00h = yh00 = ε}; 4. For every rule r: p iα#β → q αγβ ∈ R, add the following set to R0 : {hp, η 0 α#βη 00 i i# → hq, η 0 αγβη 00 i γ | occur(η 0 α, #) = i − 1, hp, η 0 α#βη 00 i, hq, η 0 αγβη 00 i ∈ Q0 }. Basic Idea. By several computational steps, H 0 simulates a single step in H. Inside of every state of the form hp, ηi occurring in a configuration of H 0 , it is recorded (1) p—the current state of H; (2) η—the context string of H’s current configuration that represents context of each # of length at most µ on both sides of #. The simulation is done in two steps by rules introduced in the third and forth step of the above construction as follows. (a) Each substring of terminals between two #s can be non-deterministically shortened by the rules constructed in 3 and the position of the shortening is marked by $ to reflect this in the subsequent context checks. This shortening is necessary only to make enough space in the context string for the rule application in the next step. (b) By each of H’s generalized rules is rewritten only one # with respect to its surrounding left and right context. In this step, the rules of the context- free form constructed in 4 are applied and, thereby, simulate H’s behavior thanks to context checks, which are managed by the second component of states in H 0 . The simulation is completed when the string of terminals is obtained. 202 Zbyněk Křivka and Alexander Meduna Let h be a homomorphism from a configuration of context-free #-rewriting system to corresponding configuration of generalized #-rewriting system defined as h(hp, ηiz) = pz. Claim 1. If s# ⇒m pz0 #z1 # . . . #zh in H, then hs, #i# ⇒r hp, x0 #x1 # . . . #xh i z0 #z1 . . . #zh [r10 r20 . . . rr0 ] in H 0 where xi = x0i ∇i x00i , ∇i ∈ {ε, $}, x0i ∈ prefixes (zi ), x00i ∈ suffixes (zi ), |xi | ≤ 4µ + 1, xi , zi ∈ T ∗ , 0 ≤ i ≤ h, 1 ≤ h ≤ k, for m ≥ 0. Proof. This claim is established by induction on m. Basis: Let m = 0. For s# ⇒0 s# in H there exists s0 # ⇒0 s0 # in H 0 , where s0 = hs, #i. Induction Hypothesis: Suppose that Claim 1 holds for all computational steps of length m or less for some m ≥ 0. Induction Step: Consider s# ⇒m+1 qz 0 , where z 0 ∈ Σ ∗ . Express s# ⇒m+1 qz 0 as s# ⇒m pz [r1 r2 . . . rm ], where z = z0 #z1 # . . . α#β . . . #zh , # between α and β is the ith bounder and r1 , . . . , rm , rm+1 ∈ lab(R), and pz ⇒ qz 0 [rm+1 ]. For rm+1 : p iα#β → q αγβ is z 0 of the form: z 0 = z0 #z1 # . . . αγβ . . . #zh , for z0 , . . . , z h ∈ T ∗ . Based on the induction hypothesis, there exists hs, #i# ⇒r hp, x0 #x1 # . . . α#β . . . #xh i z0 # . . . α#β . . . zh−1 #zh [r10 r20 . . . rr0 ] ⇒∗ hp, y0 #y1 # . . . α#β . . . #yh i z0 # . . . α#β . . . zh−1 #zh [ρ] ⇒ hq, y0 #y1 # . . . αγβ . . . #yh i z0 # . . . αγβ . . . zh−1 0 #zh [rr+1 ], where ρ is a sequence of rules constructed in step 3, |ρ| ≥ 0, r ≥ 1, rj ∈ lab(R0 ), 1 ≤ j ≤ r + 1, xj , yj ∈ T ∗ {ε, $}T ∗ , zj ∈ T ∗ , 0 ≤ j ≤ h, 0 occur(y0 #y1 # . . . α, #) = occur(z0 #z1 # . . . α, #) = i−1, α = x̄i−ᾱ #xi−ᾱ+1 . . . xi−2 #xi−1 , ᾱ = occur(α, #) + 1, x̄i−ᾱ ∈ suffixes (xi−ᾱ ), β = xi #xi+1 . . . xi+β̄−1 0 #x̄i+β̄ , β̄ = occur(β, #), x̄i+β̄ ∈ prefixes (xi+β̄ ), and rr+1 : hp, y0 #y1 # . . . α#β . . . 0 #yh i i# → hq, y0 #y1 # . . . αγβ . . . #xh i γ ∈ R is introduced by step 4. Therefore, h(hp, y0 #y1 # . . . α#β . . . #yh i z0 # . . . α#β . . . zh−1 #zh ) = pz and h(hq, y0 #y1 # . . . αγβ . . . #yh i z0 # . . . αγβ . . . zh−1 #zh ) = qz 0 , so the claim holds. Claim 2. If s# ⇒z pw in H, then hs, #i# ⇒∗ hp, ηiw in H for some z ≥ 0, w ∈ T ∗. Proof. Consider Claim 1 for h = 0. At this point, if s# ⇒z pz0 , then hs, #i# ⇒∗ hp, x0 iz0 and so z0 = w. t u Theorem 1. For every k ≥ 1, Lk (G#RS) ⊂ Lk+1 (G#RS). Proof. Lk (G#RS) = Lk (CF #RS) follows from Lemma 1. Recall that Lk (P, CF ) = Lk (CF #RS) (see [4]) and Lk (P, CF ) ⊂ Lk+1 (P, CF ) for every k ≥ 1 (see Theorems 3.1.2i and 3.1.7 in [2]). Thus, Theorem 1 holds. t u Generalized #-Rewriting Systems of Finite Index 203 5 Conclusion The present paper has discussed generalized language-defining device that repre- sents a combination of both automata and grammars. This device characterizes well-known infinite hierarchy of formal language families in a very natural way. Consequently, it is obviously closely related to some classical results about formal languages, on which it sheds light in an alternative way. Therefore, this paper suggests its further investigation in the future. Specifically, this investigation should pay a special attention to the following open problem areas: Unrestricted Rules. Observe that Theorem 1 does not hold if the rules are of the form p iα → q β, where p, q ∈ Q, i ∈ I, α, β ∈ Σ ∗ , occur(α, #) ≥ 1 and computation step is defined as follows: Let puα0 #α00 v, quβv be two configurations, p, q ∈ Q, u, v ∈ Σ ∗ , i ∈ I and occur(uα0 , #) = i − 1. Then, #-rewriting system with unrestricted rules makes a computational step from puα0 #α00 v to quβv by using the unrestricted rule r: p iα0 #α00 → q β. Let us demonstrate this observation on the following example. Example 3. In [2], Theorem 3.1.7 proves that Lk ∈ Lk (CF #RS) and Lk ∈ / Lk−1 (CF #RS), where Lk = {b(ai b)2k | i ≥ 1}. Consider G#RS with unrestricted rules H = ({s, s0 , p, q, r, r0 , f }, {a, b, #}, s, R), where R contains 1: s 1# → s0 ##a## 2: s0 2# → s0 #a 3: s0 1# → p # 4: p 2#a → q a# 5: q 3# → p #a 6: p 2## → r ## 7: r 1# → r0 b 8: r0 3# → p ## 9: p 1## → f b 10: f 1## → f b For instance, H computes baabaabaab = b(a2 b)3 as s# ⇒ s0 ##a## [1] ⇒ s0 ##aa## [2] ⇒ p##aa## [3] ⇒ q#a#a## [4] ⇒ p#a#a#a#[5] ⇒ q#aa##a# [4] ⇒ p#aa##aa# [5] ⇒ r#aa##aa# [6] ⇒ r0 baa##aa# [7] ⇒ pbaa##aa## [8] ⇒ qbaa#a#a## [4] ⇒ pbaa#a#a#a# [5] ⇒ qbaa#aa##a# [4] ⇒ pbaa#aa##aa# [5] ⇒ rbaa#aa##aa# [6] ⇒ r0 baabaa##aa# [7] ⇒ pbaabaa##aa## [8] ⇒ f baabaabaa## [9] ⇒ f baabaabaab [10]. It is obvious that H is of index 4 and L(H) = {b(ai b)j | i, j ≥ 1} ∈ L4 (G#RS with unrestricted rules). Thus, this example shows that for every k ≥ 1, there is a language L such that L ∈ / Lk (CF #RS) and L is generated by a generalized #-rewriting system with unrestricted rules. 204 Zbyněk Křivka and Alexander Meduna We believe that there is no infinite hierarchy based on finite index in case of generalized #-rewriting systems with unrestricted rules. More specifically, every generalized #-rewriting system with unrestricted rules can be transformed into equivalent generalized #-rewriting system with unrestricted rules of index 1. However, we have not proved this conjecture so far. Unlimited Index. Consider #-rewriting systems that are of unlimited index. What is the language family defined by them. What is the relation between L(CF #RS) and L(G#RS)? Determinism. This paper has discussed a general version of #-rewriting systems, which work non-deterministically. Of course, these systems can be restricted so they work deterministically. What is the language family defined by these deterministic systems? Formally, a generalized #-rewriting system M is deterministic if for every left-hand side of M ’s rule r, lhs(r), holds that card({lhs(r) | r ∈ R}) ≤ 1. Acknowledgement. This work was supported by the Czech Science Foundation 201/07/0005 grant. References 1. H. Borodihn, H. Fernau, Accepting grammars and systems: an overview. In: Proc. of Development in Language Theory Conf., Magdeburg, 1995, pp. 199-208. 2. J. Dassow, G. Păun, Regulated Rewriting in Formal Language Theory. Springer, New York, 1989, 308 p., ISBN 0-38751-414-7. 3. T. Kasai, A Hierarchy Between Context-Free and Context-Sensitive Languages. In: Journal of Computer and System Sciences 4, 1970, pp. 492-508. 4. Z. Křivka, A. Meduna, R. Schönecker, Generation of Languages by Rewriting Sys- tems that Resemble Automata. In: International Journal of Foundations of Com- puter Science Vol. 17, No. 5, 2006, pp. 1223-1229. 5. A. Meduna, Automata and Languages: Theory and Applications. Springer, London, 2000, 916 p., ISBN 1-85233-074-0. 6. E. Moriya, D. Hofbauer, M. Huber, F. Otto, On state-alternating context-free gram- mars. In: Theoretical Computer Science 337, 2005, p. 183-216. 7. G. Rozendberg, A. Salomaa (eds.), Handbook of Formal Languages: Word, Lan- guage, Grammar, Volume 1. Springer, Berlin, 1997, 873 p., ISBN 3-540-60420-0.