Generalized #-Rewriting Systems of Finite
                       Index

                       Zbyněk Křivka and Alexander Meduna

     Department of Information Systems, Faculty of Information Technology, Brno
    University of Technology, Božetěchova 1, Brno 612 66, Czech Republic, Europe
                     krivka@fit.vutbr.cz, meduna@fit.vutbr.cz


        Abstract. This paper discusses a generalized version of #-rewriting
        systems with context rules. It demonstrates that this context-based gen-
        eralization does not affect the generative power of #-rewriting systems of
        finite index. A new characterization of the infinite hierarchy of language
        families generated by programmed grammars of finite index is obtained.

        Keywords: #-rewriting systems, context-based generalization, infinite
        hierarchy, finite index.


1     Introduction
In the formal language theory, there exist language-defining devices having fea-
tures of both grammars and automata (see [1], [3], and [6]). Recently, this theory
has introduced another device of this kind—#-rewriting systems (see [4]). In-
deed, on the one hand, like grammars, they are generative devices. On the other
hand, like automata, they use finitely many states. Recall that these systems of
finite index characterize the well-known infinite hierarchy of language families
resulting from programmed grammars of finite index (see Theorems 3.1.2i and
3.1.7 in [2]).
    The original version of #-rewriting systems is based upon rules of the form
p i# → q γ, where p, q are states, i is a positive integer, and γ is a non-empty
string. By using this rule, the system rewrites ith # with γ and, simultaneously,
changes the current state p to q. In the present paper, we discuss a generalized
version of #-rewriting systems that uses rules of the form p iα#β → q αγβ,
where α and β are strings and the other symbols have the same meaning as
above. This generalized rule is applicable to # if this # occurs in the α-β context;
in other aspects, the application is analogical to the original version.
    As its main result, this paper demonstrates that the generalization under
discussion does not affect the generative power of #-rewriting systems of finite
index, so we obtain an alternative characterization of the infinite hierarchy of
language families generated by programmed grammars of finite index (see [4],
and Theorems 3.1.2i and 3.1.7 in [2]).
    This result is of some interest when compared, for instance, to a similar gen-
eralization in terms of the classical Chomsky Hierarchy, in which grammars with
the generalized rules are much stronger than ordinary context-free grammars.
198      Zbyněk Křivka and Alexander Meduna

    On the other hand, notice that the language family generated by #-rewriting
systems of finite index is incomparable with the family of context-free languages.
More specifically, a context-free Dyck language is not generated by any #-
rewriting systems of finite index and vice versa Example 1 shows that the family
of languages generated by #-rewriting systems of finite index contains at least
one non-context-free language.


2     Preliminaries
This paper assumes that the reader is familiar with the formal language theory
(see [5], [7]). For an alphabet V , V ∗ represents the free monoid generated by V
under the operation of concatenation. The identity of V ∗ is denoted by ε. Set
V + = V ∗ − {ε}; algebraically, V + is thus the free semigroup generated by V
under the operation of concatenation. For w ∈ V ∗ , |w| denotes the length of
w, and for W ⊆ V , occur(w, W ) denotes the number of occurrences of symbols
from W in w. For every i ≥ 0, suffix (w, i) is w’s suffix of length i if |w| ≥ i, and
suffix (w, i) = w if i ≥ |w| + 1. Let suffixes (w) = {suffix (w, j) | 0 ≤ j ≤ |w|}. For
every i ≥ 0, prefix (w, i) is w’s prefix of length i if |w| ≥ i, and prefix (w, i) = w
if i ≥ |w| + 1. Let prefixes (w) = {prefix (w, j) | 0 ≤ j ≤ |w|}.
     Let Ψ denote the set of all non-negative integers and let m ∈ Ψ . Set I = Ψ −
{0}. Let K ⊆ Ψ be a finite set. Define max(K) as the smallest integer k such
that for all h ∈ K, k ≥ h. Define min(K) as the greatest integer k such that for
all h ∈ K, k ≤ h.
     For a subset S of partially ordered set (Ψ, ≤), a supremum of S is an element
p ∈ Ψ such that x ≤ p for all x ∈ S, and for any q ∈ Ψ such that x ≤ q for all
x ∈ S it holds that p ≤ q.
     The infimum of the subset S is an element p ∈ Ψ such that p ≤ x for all
x ∈ S, and for any q ∈ Ψ such that q ≤ x for all x ∈ S it holds that q ≤ p.
     If the supremum of a set S exists, it is denoted as sup(S); infimum is denoted
by inf(S).
    A context-free grammar is a quadruple, G = (V, T, P, S), where V is a total
alphabet, T ⊆ V is an alphabet of terminals, S ∈ (V − T ) is the start symbol,
and P is a finite set of rules of the form q: A → v, where A ∈ (V −T ), v ∈ V ∗ and
q is a label of this rule. If q: A → v ∈ P , x, y ∈ V ∗ , G makes a derivation step
from xAy to xvy according to q: A → v, symbolically written as xAy ⇒ xvy [q]
or, simply, xAy ⇒ xvy. In the standard manner, we define ⇒m , where m ≥ 0,
⇒+ , and ⇒∗ . To express that G makes x ⇒m y, where x, y ∈ V ∗ , by using a
sequence of rules q1 , q2 , . . . , qm , we symbolically write x ⇒m y [q1 q2 . . . qm ]. The
language of G, L(G), is defined as L(G) = {w ∈ T ∗ | S ⇒∗ w}. A language, L,
is context-free if and only if L = L(G), where G is a context-free grammar.
    A programmed grammar (see page 28 in [2]) is a quadruple, G = (V, T, P, S),
where V is a total alphabet, T ⊆ V is an alphabet of terminals, S ∈ (V − T )
is the start symbol, and P is a finite set of rules of the form q: A → v, g(q),
where q: A → v is a context free rule labeled by q and g(q) is a set of rule labels
                            Generalized #-Rewriting Systems of Finite Index       199

associated with this rule. After a derivation step, symbolically denoted by ⇒,
according a rule of this form in an ordinary context-free way, in the next step a
rule labeled by a label from g(q) has to be applied. In the standard manner, we
define ⇒m , where m ≥ 0, ⇒+ , and ⇒∗ . The language of G, L(G), is defined as
L(G) = {w ∈ T ∗ | S ⇒∗ w}.
     Let G be a grammar, and let T and S be its terminal alphabet and start
symbol, respectively. For a derivation D: S = w1 ⇒ w2 ⇒ . . . ⇒ wr = w ∈
T ∗ , r ≥ 1, according to G, we set Ind(D, G) = max({occur(wi , V − T ) | 1 ≤
i ≤ r}), and for w ∈ T ∗ , we define Ind(w, G) = min({Ind(D, G) | D is a
derivation for w in G}). The index of grammar G (see page 151 in [2]) is defined
as Ind(G) = sup {Ind(w, G) | w ∈ L(G)}. For a language L in the family L(X)
of languages generated by grammars of some type X, we define IndX (L) =
inf {Ind(G) | L(G) = L, G is of type X}. For a family L(X), we set Ln (X) =
{L | L ∈ L(X) and IndX (L) ≤ n}, n ≥ 1.


3    Definitions

A generalized #-rewriting system (G#RS) is a quadruple H = (Q, Σ, s, R),
where Q is a finite set of states, Σ is an alphabet containing # called a bounder,
Q ∩ Σ = ∅, s ∈ Q is a start state and R ⊆ Q × I × Σ + × Q × Σ ∗ is a finite
relation whose members are called rules. A rule is usually written as r: p iα#β →
q αγβ ∈ R hereafter, where r is its unique label, i ∈ I, q, p ∈ Q, and α, β, γ ∈ Σ ∗ .
    A configuration of H is a pair from Q × Σ ∗ . Let χ denote the set of all
configurations of H. Let puα#βv, quαγβv ∈ χ be two configurations, p, q ∈ Q,
u, v, α, β, γ ∈ Σ ∗ , i ∈ I and occur(uα, #) = i − 1. Then, H makes a computa-
tional step from puα#βv to quαγβv by using r: p iα#β → q αγβ, symbolically
written puα#βv i⇒ quαγβv [r] in H or puα#βv ⇒ quαγβv [r] in H when po-
sition is not relevant or simply puα#βv ⇒ quαγβv when the applied rule is
immaterial.
    In the standard manner, we extend ⇒ to ⇒m and j ⇒ to j ⇒m , for m ≥ 0;
then, based on ⇒m and j ⇒m , we define ⇒+ , ⇒∗ , j ⇒+ , and j ⇒∗ in the standard
way.
    The language generated by G#RS H, L(H), is defined as

               L(H) = {w | s# ⇒∗ qw, q ∈ Q, w ∈ (Σ − {#})∗ }.

   As special case of G#RS, if every r: p iα#β → q αγβ ∈ R satisfies that
α = β = ε then H is called context-free #-rewriting system (CF#RS).
    Let k be a positive integer. H is of index k if for every configuration x ∈ χ,
s# ⇒∗ qy = x implies occur(y, #) ≤ k. Notice that H of index k cannot derive
a string containing more than k #s.
   For G#RS H, maxL (H) and maxR (H) denote the maximum length of left-
hand and right-hand side of rules, respectively. Precisely, let H = (Q, Σ, s, R)
200     Zbyněk Křivka and Alexander Meduna

be a G#RS, maxL (H) = max({|α| | p iα → q β ∈ R}) and maxR (H) =
max({|β| | p iα → q β ∈ R}).

    Let k be a positive integer. Lk (G#RS), Lk (CF #RS), and Lk (P, CF ) denote
the families of languages generated by generalized #-rewriting systems of index
k, context-free #-rewriting systems of index k, and derived by programmed
grammars of index k, respectively. L(G#RS) and L(CF #RS) denote the family
of languages generated by generalized #-rewriting systems and context-free #-
rewriting systems, respectively.

Example 1. CF#RS H1 = ({s, p, q, f }, {a, b, c, #}, s, R1 ), where R1 contains

1: s 1# → p ##
2: p 1# → q a#b
3: q 2# → p c#
4: p 1# → f ab
5: f 1# → f c

   For instance, H1 computes aabbcc as s# ⇒ p## [1] ⇒ qa#b# [2] ⇒
pa#bc# [3] ⇒ f aabbc# [4] ⇒ f aabbcc [5].

Example 2. G#RS H2 = ({s, p, q}, {a, b, c, #}, s, R2 ), where R2 contains

1: s 1# → s a##
2: s 2a## → p a#b#c
3: p 1a# → q aa#
4: q 2b#c → p bb#cc
5: p 1a# → p a
6: p 1b#c → p bc

   For instance, H2 computes aabbcc as s# ⇒ sa## [1] ⇒ pa#b#c [2] ⇒
qaa#b#c [3] ⇒ paa#bb#cc [4] ⇒ paabb#cc [5] ⇒ paabbcc [6].
   Obviously, H1 and H2 are of index 2. Both systems describe the same lan-
guage L(H1 ) = L(H2 ) = {an bn cn | n ≥ 1} which belongs into L(CF #RS) −
L(CF ).


4     Results

The first part of the section establishes an infinite hierarchy of language fami-
lies resulting from the generalized #-rewriting systems defined in the previous
section.
    Throughout this section, we only describe the construction part of the proof,
leaving the complete version of this proof to the reader.

Lemma 1. For every k ≥ 1, Lk (CF #RS) = Lk (G#RS).
                                  Generalized #-Rewriting Systems of Finite Index                 201

Proof.
We only need to prove that Lk (G#RS) ⊆ Lk (CF #RS).
Construction. Let k ≥ 1 be a positive integer. Let H = (Q, T ∪ {#}, s, R) be a
generalized #-rewriting system of index k, where Σ = T ∪ {#}, # ∈    / T . Let µ =
max({maxL (H), maxR (H)}) and let $ be a new symbol, $ ∈    / Q∪Σ. We construct
the context-free #-rewriting system of index k, H 0 = (Q0 , T ∪ {#}, s0 , R0 ), where
components Q0 , s0 , and R0 are constructed as follows:

1. s0 := hs, #i;
2. Q0 := {hp, y0 #y1 # . . . yh−1 #yh i
             | p ∈ Q, 1 ≤ h ≤ k,
               yj = yj0 ∇j yj00 , yj0 , yj00 ∈ T ∗ , ∇j ∈ {ε, $},
               |yj0 | ≤ 2µ, |yj00 | ≤ 2µ, 0 ≤ j ≤ h};
3. R0 := {hp, x0 # . . . #x0j ∇j x00j # . . . #xh i 1# → hp, x0 # . . . #yj0 $yj00 # . . . #xh i #
            | hp, x0 # . . . #x0j ∇j x00j # . . . #xh i, hp, x0 # . . . #yj0 $yj00 # . . . #xh i ∈ Q0 ,
              ∇j ∈ {ε, $}, yj0 ∈ prefixes (x0j ), yj00 ∈ suffixes (x00j ),
              xj ∈ T ∗ , 1 ≤ h ≤ k, 0 ≤ j ≤ h, where x00 = y00 = x00h = yh00 = ε};
4. For every rule r: p iα#β → q αγβ ∈ R, add the following set to R0 :
       {hp, η 0 α#βη 00 i i# → hq, η 0 αγβη 00 i γ
                  | occur(η 0 α, #) = i − 1,
                    hp, η 0 α#βη 00 i, hq, η 0 αγβη 00 i ∈ Q0 }.

Basic Idea. By several computational steps, H 0 simulates a single step in H.
Inside of every state of the form hp, ηi occurring in a configuration of H 0 , it is
recorded

 (1) p—the current state of H;
 (2) η—the context string of H’s current configuration that represents context
     of each # of length at most µ on both sides of #.

   The simulation is done in two steps by rules introduced in the third and forth
step of the above construction as follows.

 (a) Each substring of terminals between two #s can be non-deterministically
     shortened by the rules constructed in 3 and the position of the shortening is
     marked by $ to reflect this in the subsequent context checks. This shortening
     is necessary only to make enough space in the context string for the rule
     application in the next step.
 (b) By each of H’s generalized rules is rewritten only one # with respect to its
     surrounding left and right context. In this step, the rules of the context-
     free form constructed in 4 are applied and, thereby, simulate H’s behavior
     thanks to context checks, which are managed by the second component of
     states in H 0 .

The simulation is completed when the string of terminals is obtained.
202       Zbyněk Křivka and Alexander Meduna

   Let h be a homomorphism from a configuration of context-free #-rewriting
system to corresponding configuration of generalized #-rewriting system defined
as h(hp, ηiz) = pz.

Claim 1. If s# ⇒m pz0 #z1 # . . . #zh in H, then hs, #i# ⇒r hp, x0 #x1 # . . .
#xh i z0 #z1 . . . #zh [r10 r20 . . . rr0 ] in H 0 where xi = x0i ∇i x00i , ∇i ∈ {ε, $}, x0i ∈
prefixes (zi ), x00i ∈ suffixes (zi ), |xi | ≤ 4µ + 1, xi , zi ∈ T ∗ , 0 ≤ i ≤ h, 1 ≤ h ≤ k,
for m ≥ 0.
Proof. This claim is established by induction on m.
Basis: Let m = 0. For s# ⇒0 s# in H there exists s0 # ⇒0 s0 # in H 0 , where
s0 = hs, #i.
Induction Hypothesis: Suppose that Claim 1 holds for all computational steps of
length m or less for some m ≥ 0.
Induction Step: Consider s# ⇒m+1 qz 0 , where z 0 ∈ Σ ∗ . Express s# ⇒m+1 qz 0
as s# ⇒m pz [r1 r2 . . . rm ], where z = z0 #z1 # . . . α#β . . . #zh , # between α
and β is the ith bounder and r1 , . . . , rm , rm+1 ∈ lab(R), and pz ⇒ qz 0 [rm+1 ].
For rm+1 : p iα#β → q αγβ is z 0 of the form: z 0 = z0 #z1 # . . . αγβ . . . #zh , for
z0 , . . . , z h ∈ T ∗ .
      Based on the induction hypothesis, there exists hs, #i# ⇒r hp, x0 #x1 # . . .
α#β . . . #xh i z0 # . . . α#β . . . zh−1 #zh [r10 r20 . . . rr0 ] ⇒∗ hp, y0 #y1 # . . . α#β . . . #yh i
z0 # . . . α#β . . . zh−1 #zh [ρ] ⇒ hq, y0 #y1 # . . . αγβ . . . #yh i z0 # . . . αγβ . . . zh−1
            0
#zh [rr+1        ], where ρ is a sequence of rules constructed in step 3, |ρ| ≥ 0, r ≥ 1,
rj ∈ lab(R0 ), 1 ≤ j ≤ r + 1, xj , yj ∈ T ∗ {ε, $}T ∗ , zj ∈ T ∗ , 0 ≤ j ≤ h,
 0

occur(y0 #y1 # . . . α, #) = occur(z0 #z1 # . . . α, #) = i−1, α = x̄i−ᾱ #xi−ᾱ+1 . . .
xi−2 #xi−1 , ᾱ = occur(α, #) + 1, x̄i−ᾱ ∈ suffixes (xi−ᾱ ), β = xi #xi+1 . . . xi+β̄−1
                                                                    0
#x̄i+β̄ , β̄ = occur(β, #), x̄i+β̄ ∈ prefixes (xi+β̄ ), and rr+1        : hp, y0 #y1 # . . . α#β . . .
                                                          0
#yh i i# → hq, y0 #y1 # . . . αγβ . . . #xh i γ ∈ R is introduced by step 4.
      Therefore, h(hp, y0 #y1 # . . . α#β . . . #yh i z0 # . . . α#β . . . zh−1 #zh ) = pz and
h(hq, y0 #y1 # . . . αγβ . . . #yh i z0 # . . . αγβ . . . zh−1 #zh ) = qz 0 , so the claim holds.

Claim 2. If s# ⇒z pw in H, then hs, #i# ⇒∗ hp, ηiw in H for some z ≥ 0,
w ∈ T ∗.
Proof. Consider Claim 1 for h = 0. At this point, if s# ⇒z pz0 , then hs, #i# ⇒∗
hp, x0 iz0 and so z0 = w.
                                                                               t
                                                                               u

Theorem 1. For every k ≥ 1, Lk (G#RS) ⊂ Lk+1 (G#RS).

Proof.
Lk (G#RS) = Lk (CF #RS) follows from Lemma 1. Recall that Lk (P, CF ) =
Lk (CF #RS) (see [4]) and Lk (P, CF ) ⊂ Lk+1 (P, CF ) for every k ≥ 1 (see
Theorems 3.1.2i and 3.1.7 in [2]). Thus, Theorem 1 holds.               t
                                                                        u
                            Generalized #-Rewriting Systems of Finite Index        203

5    Conclusion
The present paper has discussed generalized language-defining device that repre-
sents a combination of both automata and grammars. This device characterizes
well-known infinite hierarchy of formal language families in a very natural way.
Consequently, it is obviously closely related to some classical results about formal
languages, on which it sheds light in an alternative way. Therefore, this paper
suggests its further investigation in the future. Specifically, this investigation
should pay a special attention to the following open problem areas:
Unrestricted Rules. Observe that Theorem 1 does not hold if the rules are of
the form p iα → q β, where p, q ∈ Q, i ∈ I, α, β ∈ Σ ∗ , occur(α, #) ≥ 1 and
computation step is defined as follows:
    Let puα0 #α00 v, quβv be two configurations, p, q ∈ Q, u, v ∈ Σ ∗ , i ∈ I and
occur(uα0 , #) = i − 1. Then, #-rewriting system with unrestricted rules makes
a computational step from puα0 #α00 v to quβv by using the unrestricted rule
r: p iα0 #α00 → q β.
    Let us demonstrate this observation on the following example.

Example 3. In [2], Theorem 3.1.7 proves that Lk ∈ Lk (CF #RS) and Lk ∈               /
Lk−1 (CF #RS), where Lk = {b(ai b)2k | i ≥ 1}.
   Consider G#RS with unrestricted rules H = ({s, s0 , p, q, r, r0 , f }, {a, b, #}, s,
R), where R contains
 1: s 1# → s0 ##a##
 2: s0 2# → s0 #a
 3: s0 1# → p #
 4: p 2#a → q a#
 5: q 3# → p #a
 6: p 2## → r ##
 7: r 1# → r0 b
 8: r0 3# → p ##
 9: p 1## → f b
10: f 1## → f b
For instance, H computes baabaabaab = b(a2 b)3 as
s# ⇒ s0 ##a## [1] ⇒ s0 ##aa## [2] ⇒ p##aa## [3]
    ⇒ q#a#a## [4] ⇒ p#a#a#a#[5]
    ⇒ q#aa##a# [4] ⇒ p#aa##aa# [5]
    ⇒ r#aa##aa# [6] ⇒ r0 baa##aa# [7] ⇒ pbaa##aa## [8]
    ⇒ qbaa#a#a## [4] ⇒ pbaa#a#a#a# [5]
    ⇒ qbaa#aa##a# [4] ⇒ pbaa#aa##aa# [5]
    ⇒ rbaa#aa##aa# [6] ⇒ r0 baabaa##aa# [7] ⇒ pbaabaa##aa## [8]
    ⇒ f baabaabaa## [9] ⇒ f baabaabaab [10].
    It is obvious that H is of index 4 and L(H) = {b(ai b)j | i, j ≥ 1} ∈ L4 (G#RS
with unrestricted rules). Thus, this example shows that for every k ≥ 1, there is
a language L such that L ∈    / Lk (CF #RS) and L is generated by a generalized
#-rewriting system with unrestricted rules.
204     Zbyněk Křivka and Alexander Meduna

   We believe that there is no infinite hierarchy based on finite index in case of
generalized #-rewriting systems with unrestricted rules. More specifically, every
generalized #-rewriting system with unrestricted rules can be transformed into
equivalent generalized #-rewriting system with unrestricted rules of index 1.
However, we have not proved this conjecture so far.

Unlimited Index. Consider #-rewriting systems that are of unlimited index.
What is the language family defined by them. What is the relation between
L(CF #RS) and L(G#RS)?

Determinism. This paper has discussed a general version of #-rewriting systems,
which work non-deterministically. Of course, these systems can be restricted
so they work deterministically. What is the language family defined by these
deterministic systems?
    Formally, a generalized #-rewriting system M is deterministic if for every
left-hand side of M ’s rule r, lhs(r), holds that card({lhs(r) | r ∈ R}) ≤ 1.

Acknowledgement. This work was supported by the Czech Science Foundation
201/07/0005 grant.


References
 1. H. Borodihn, H. Fernau, Accepting grammars and systems: an overview. In: Proc.
    of Development in Language Theory Conf., Magdeburg, 1995, pp. 199-208.
 2. J. Dassow, G. Păun, Regulated Rewriting in Formal Language Theory. Springer,
    New York, 1989, 308 p., ISBN 0-38751-414-7.
 3. T. Kasai, A Hierarchy Between Context-Free and Context-Sensitive Languages. In:
    Journal of Computer and System Sciences 4, 1970, pp. 492-508.
 4. Z. Křivka, A. Meduna, R. Schönecker, Generation of Languages by Rewriting Sys-
    tems that Resemble Automata. In: International Journal of Foundations of Com-
    puter Science Vol. 17, No. 5, 2006, pp. 1223-1229.
 5. A. Meduna, Automata and Languages: Theory and Applications. Springer, London,
    2000, 916 p., ISBN 1-85233-074-0.
 6. E. Moriya, D. Hofbauer, M. Huber, F. Otto, On state-alternating context-free gram-
    mars. In: Theoretical Computer Science 337, 2005, p. 183-216.
 7. G. Rozendberg, A. Salomaa (eds.), Handbook of Formal Languages: Word, Lan-
    guage, Grammar, Volume 1. Springer, Berlin, 1997, 873 p., ISBN 3-540-60420-0.