A Hierarchy of Languages with Catenation and Shuffle

Nils Erik Flick and Manfred Kudlek
Fachbereich Informatik, MIN-Fakultät, Universität Hamburg, DE
email: {flick,kudlek}@informatik.uni-hamburg.de

Abstract. We present basic structures, definitions, normal forms, and a hierarchy of languages based on catenation, shuffle and their iterations, defined by algebraic closure or least fix point solution of systems of equations.

Keywords: shuffle, catenation, languages, hierarchy, algebraic characterization

1 Introduction

In this paper we establish a hierarchy of languages expressing possibilities of iterated sequential and parallel composition of basic events, based on extending the construction principles behind the well-known regular and context-free languages with another operation known as the shuffle [8, 9]. Related investigations, in particular on shuffle languages, are given in [1, 5, 6]; there only certain combinations of catenation, shuffle and their iterations have been considered. Such combinations of both operators are especially useful for modelling some areas of concurrency, in particular the behaviour of client/server systems [2], and also for semantics of Petri nets, such as interleaving semantics.

In Section 2 we introduce or recall the basic definitions and structures needed for further investigation, such as monoids, semirings, bi-monoids and bi-semirings, furthermore systems of equations, their least fix point solutions and normal forms for them, as well as algebraic closures of finite sets under certain language operators. In Section 3 we investigate the complete hierarchy of language classes defined as algebraic closures of union, catenation, shuffle and their iterations applied to the class of finite languages, as well as classes defined by least fix point solutions of systems of equations, and their relation to the Chomsky hierarchy. Section 4 offers an outlook on further research in the area, such as closure of language classes under certain operators, or decidability problems.

2 Definitions and Basic Structures

Formal language theory normally deals with subsets of Σ*, the set of all words over a finite alphabet Σ, using catenation as the basic binary operator, denoted by · in the sequel. This gives a basic monoid with · and λ, the empty word. Other binary operators have also been considered, e.g. shuffle, denoted by ⧢, which we define below under "basic structures". In contrast to catenation, ⧢ is also commutative; like catenation, it can be used to define another monoid with the finite subsets of Σ* as domain. Another possibility is to combine both operators. Extending · to languages and combining it with ⧢ gives rise to a basic bi-monoid with {λ} as common neutral element and the class of finite subsets of Σ* as domain. The corresponding iterations, iterated catenation and iterated shuffle, are denoted by * and ⊛, respectively.

2.1 Basic Structures

In what follows we present, partially recalling, the definitions of such structures more precisely. For w ∈ Σ* let ||w|| denote the length of w; in particular, ||λ|| = 0. If A ⊆ Σ* then |A| denotes the cardinality of A, which can also be infinite. For A ⊆ Σ* let ||A|| = max{||w|| | w ∈ A} denote the norm of A.

Based on the monoid M = (Σ*; λ, ·), where Σ is a (finite) alphabet, · is the binary operation of catenation, and λ is the neutral element, S = (2^{Σ*}; ∅, {λ}, ∪, ·) is an ω-complete semiring, since · is associative, ∪ is commutative and associative, · distributes over ∪, and

A · ⋃_j B_j = ⋃_j (A · B_j) ,   (⋃_j B_j) · A = ⋃_j (B_j · A)

for all A, B_j ∈ 2^{Σ*}, where the union may also be infinite.
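The language-level catenation operator and the norm ||A|| are easy to make concrete. The following small Python sketch is our own illustration (names and example sets are not from the paper); it shows catenation extended to finite sets and checks one instance of the distributivity of · over ∪.

# Sketch only: finite languages as Python sets of strings, λ = "".

def cat(A, B):
    """Catenation A · B extended to sets of words."""
    return {u + v for u in A for v in B}

def norm(A):
    """||A|| = maximal length of a word in A (finite, non-empty A)."""
    return max(len(w) for w in A)

A = {"a", "ab"}
B1, B2 = {"c"}, {"cc"}
# catenation distributes over (finite) union:
assert cat(A, B1 | B2) == cat(A, B1) | cat(A, B2)
print(norm(cat(A, B2)))   # ||A · B2|| = 4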
Elements (words) w ∈ Σ*, or singletons {w}, can be seen as basic elements (atoms). At a somewhat higher level, finite sets can also serve as such. For that let 2^{Σ*}_1 = {α ∈ 2^{Σ*} | |α| = 1} be the class of singletons, and consider FIN = 2^{Σ*}_f = {α ∈ 2^{Σ*} | |α| < ∞}, the class of finite sets of words. The latter can be generated from the former by the usual extension to sets. For a general treatment of semirings and related structures see [12].

A similar construction holds for the shuffle operator ⧢ : Σ* × Σ* → 2^{Σ*}, which can be defined recursively as follows: for all a, b ∈ Σ and v, w ∈ Σ*,

λ ⧢ w = w ⧢ λ = {w} ,   aw ⧢ bv = {a} · (w ⧢ bv) ∪ {b} · (aw ⧢ v) ,

and extended to sets from now on: ⧢ : 2^{Σ*} × 2^{Σ*} → 2^{Σ*}, A ⧢ B = ⋃_{w∈A, v∈B} w ⧢ v.

Here the basic monoid is M = (2^{Σ*}_f; {λ}, ⧢). Then S = (2^{Σ*}; ∅, {λ}, ∪, ⧢) is an ω-complete semiring as well, since ⧢ is associative, ∪ is commutative and associative, ⧢ distributes over ∪, and

A ⧢ ⋃_j B_j = ⋃_j (A ⧢ B_j) ,   (⋃_j B_j) ⧢ A = ⋃_j (B_j ⧢ A)

for all A, B_j ∈ 2^{Σ*}.

A bi-monoid is a structure D = (D; 1, ⊙, ⊗) where ⊙ and ⊗ are associative binary operations on D, and 1 is the common neutral element.

A bi-semiring is a structure B = (B; 0, 1, ⊕, ⊙, ⊗) where (B; 0, 1, ⊕, ⊙) and (B; 0, 1, ⊕, ⊗) are semirings, i.e. ⊕ is a commutative and associative binary operation on B, ⊙ and ⊗ are associative binary operations on B, 0 is the neutral element of ⊕, and 1 is the common neutral element of ⊙ and ⊗. Furthermore, ⊙ and ⊗ distribute over ⊕, i.e. x ⊙ (y ⊕ z) = (x ⊙ y) ⊕ (x ⊙ z) and x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z) for all x, y, z ∈ B.

A bi-semiring B is ω-complete if

x ⊙ ⊕_j y_j = ⊕_j (x ⊙ y_j) ,   (⊕_j y_j) ⊙ x = ⊕_j (y_j ⊙ x) ,
x ⊗ ⊕_j y_j = ⊕_j (x ⊗ y_j) ,   (⊕_j y_j) ⊗ x = ⊕_j (y_j ⊗ x)

for x, y_j ∈ B and arbitrary (finite or infinite) 'sums' with ⊕.

Let Σ be an alphabet. Then D = (2^{Σ*}_f; {λ}, ·, ⧢) is the basic bi-monoid for formal languages using both operations · and ⧢, similar to M for formal languages with · only. Then B = (2^{Σ*}; ∅, {λ}, ∪, ·, ⧢) is a bi-semiring since · and ⧢ distribute over ∪. This bi-semiring is also ω-complete.

2.2 Systems of Equations

It is well known that the class CF of context-free languages can be characterized by least fix point solutions of systems of equations using structures based on catenation ·. Similar characterizations can be defined for languages based on structures with ⧢, or with both · and ⧢, respectively.

Let V be a finite set of variables, standing for subsets X ⊆ Σ*, and C a finite set of constants α ∈ 2^{Σ*}_1 or α ∈ 2^{Σ*}_f, thus elements of the basic structure. Thus V = {X_1, ..., X_m} and C = {α_1, ..., α_n}. A monomial is a finite expression m(X̄) on V ∪ C using the binary operations ·, or ⧢, or both · and ⧢, where X̄ denotes the tuple of (ordered) variables, e.g. (X_1 · α_1) ⧢ (X_2 · X_3). A polynomial p(X̄) is a finite union of monomials. A system of equations is a system X_i = p_i(X̄) (1 ≤ i ≤ m), or in compact form X̄ = p̄(X̄).

A system of equations is called algebraic if the monomials occurring in the system of equations are arbitrary, linear if all monomials have one of the forms (A∘X)∘B, A∘(X∘B), or A, with X ∈ V, A, B expressions of constants only, and ∘ ∈ {·, ⧢}, and rational if all monomials have the form X∘A or A.

If the underlying semiring or bi-semiring is ω-complete, such a system has a solution as least fix point. This can be constructed by iteration, starting with X̄^(0) = ∅̄ and iterating X̄^(j+1) = p̄(X̄^(j)). Clearly, X̄^(j) ⊆ X̄^(j+1) for 0 ≤ j, where ⊆ is meant componentwise for all 1 ≤ i ≤ m; thus the sequence is monotone. This is shown by induction, the basis ∅̄ ⊆ X̄^(1) being trivial; with induction hypothesis X̄^(j) ⊆ X̄^(j+1) we get X̄^(j+1) = p̄(X̄^(j)) ⊆ p̄(X̄^(j+1)) = X̄^(j+2). Since X̄^(j) ⊆ Σ̄*, a common upper bound, there exists lim_{j→∞} X̄^(j) = Ȳ, which is the least fix point solution, i.e. Ȳ = p̄(Ȳ).
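Both the recursive definition of ⧢ and the fix point iteration are easy to make concrete. The following Python sketch is ours and purely illustrative (the function names and the sample system are not from the paper); it computes the shuffle of two words, extends it to finite sets, and computes the first approximations X^(0) ⊆ X^(1) ⊆ ... of a small rational system of the shape used in Example 1 below.

# Sketch only: finite languages as Python sets of strings, λ = "".

def shuffle_words(u, v):
    """w1 ⧢ w2 as defined recursively above."""
    if u == "": return {v}
    if v == "": return {u}
    return ({u[0] + w for w in shuffle_words(u[1:], v)} |
            {v[0] + w for w in shuffle_words(u, v[1:])})

def shuffle(A, B):
    """Extension of ⧢ to sets: A ⧢ B = union over all w ⧢ v."""
    return {w for u in A for v in B for w in shuffle_words(u, v)}

def cat(A, B):
    return {u + v for u in A for v in B}

# One equation X = (X · {a}) ∪ (X ⧢ {b}) ∪ {c},
# iterated as X^(j+1) = p(X^(j)) starting from X^(0) = ∅.
def p(X):
    return cat(X, {"a"}) | shuffle(X, {"b"}) | {"c"}

X = set()
for j in range(3):
    X = p(X)
print(sorted(X))   # the approximations grow monotonically towards the least fix point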
Each component of the least fix point solution defines a formal language of the system's kind (algebraic, linear, rational), e.g. the component belonging to the first variable X_1. Corresponding to the underlying semirings, different language classes can be defined. These are

ALG(·) = CF, LIN(·) = LIN, RAT(·) = REG,
ALG(⧢) = LIN(⧢) = RAT(⧢) = SHUF,
ALG(·, ⧢), LIN(·, ⧢), RAT(·, ⧢).

Example 1. An example of a rational system of equations is given by X = (X · α) ∪ (X ⧢ β) ∪ γ with singletons {α, β, γ} = C over disjoint alphabets. The least fix point solution is the set of words denoted by the following terms in prefix notation, where h : {·, ⧢}* → C* is the homomorphism defined by h(·) = α, h(⧢) = β, and R denotes the reverse:

⋃_{u ∈ {·,⧢}*} u γ h(u)^R = (γ · α*) ⧢ β* .

In case of one single commutative operator the classes of algebraic, linear, and rational languages coincide [12], as e.g. for ⧢.

Whereas in a system of equations one gets least fix point solutions for all variables, grammars just produce the solution of a distinguished variable, the initial variable. The equations are written as basic derivation steps; e.g. X = α · (Y ⧢ β) ∪ (Y ⧢ Z) · X ∪ γ gives the productions X → α · (Y ⧢ β), X → (Y ⧢ Z) · X, X → γ.

2.3 Algebraic Closures

Another characterization of languages is achieved by applying some language operators to a basic class of languages. The operators can be applied either in arbitrary, prescribed, or somehow mixed order. In this way one gets algebraic closures under the language operators, i.e. the least class of languages closed under such operators. Here we consider the operators ∪, ·, and ⧢, as well as their iterations * and ⊛, applied to members of the basic class FIN of finite languages. Using the operator ∪ one could also take 2^{Σ*}_1, the class of singletons, as basic class, but for convenience we start with FIN.

(∪, ·, ⧢)(FIN) means that the operators ∪, ·, and ⧢ are applied in arbitrary order and arbitrarily often to finite sets, whereas (∪, ·)(⊛)(*)(FIN) means that at first * is applied arbitrarily often, then ⊛ arbitrarily often, and finally ∪ and · arbitrarily often and in arbitrary order. Note that non-application of the corresponding operators is always understood, i.e. each algebraic closure also contains FIN.

Note that for any set A the following facts hold: (A⊛)* = (A*)⊛ = (A⊛)⊛ = A⊛, A* ⊆ A⊛, and (A*)* = A*.

Important classes are (∪, ⧢, ⊛)(FIN) = SHUF, (∪, ·, *, ⊛)(FIN) = ER, (∪, ⧢, *, ⊛)(FIN) = ES, and (∪, ·, ⧢, *, ⊛)(FIN) = SE, where ES stands for extended shuffle expressions, analogous to ER for extended regular expressions.

2.4 Normal Forms

For any system of equations an equivalent one, with possibly more variables, can be constructed such that the solution of the new system coincides with the solution of the old system on the variables of the old system, and the monomials of the new system are of a simple form. This will be shown for systems with the operators ·, ⧢. Since ∪ is idempotent, i.e. e ∪ e = e for any expression e, we may assume wlog that each monomial occurs only once in a polynomial.

In case of an algebraic system consider a monomial. If it is of the form Y∘Z, Y, or α, where Y, Z ∈ V, ∘ ∈ {·, ⧢}, α ∈ C, leave it unchanged.
If it is of the form Y∘α or α∘Y, add a new variable Z, replace the monomial by Y∘Z or Z∘Y, respectively, and add a new equation Z = α to the system. If it has the form e_1∘e_2, where e_1, e_2 are expressions such that the monomial is not of a form above, add new variables Z_1, Z_2, replace e_1∘e_2 by Z_1∘Z_2 and add the new equations Z_1 = e_1, Z_2 = e_2 to the system. Repeat the procedure until all (also the new) monomials are of the forms above. Note that expressions A consisting only of constants are reduced as well. Thus only the forms Y∘Z, Y, or α remain.

In case of a linear system of equations leave monomials of the form Y, Y∘α, α∘Y, and α unchanged; otherwise proceed as for algebraic systems. Thus there remain only monomials of the form Y, α∘Y, Y∘α, or α.

In case of a rational system of equations leave monomials of the form Y, Y∘α, or α unchanged; otherwise proceed as for algebraic systems. Only expressions of constants are processed. Thus one gets the normal form Y, Y∘α, or α.

Another reduction is the elimination of monomials of the form Y. If X occurs as a monomial in its own polynomial then it can be removed. To show that, let X = p_X(X̄) = q_X(X̄) ∪ X be the component for X in the system of equations X̄ = p̄(X̄). Consider also the system X̄' = p̄'(X̄'), which is identical to the first one except that p_X(X̄) is replaced by q_X(X̄'). Then X̄^(j) = X̄'^(j) for all j ≥ 0 in the fix point approximation. This is shown by induction: X̄^(0) = ∅̄ = X̄'^(0). With the induction hypothesis X̄^(j) = X̄'^(j) it follows that X̄^(j+1) = p̄(X̄^(j)) = p̄(X̄'^(j)), where p_X(X̄'^(j)) = q_X(X̄'^(j)) ∪ X'^(j) = q_X(X̄'^(j)), since X'^(j) ⊆ X'^(j+1) = q_X(X̄'^(j)), the X-component of p̄'(X̄'^(j)), by monotonicity. Therefore p̄(X̄'^(j)) = p̄'(X̄'^(j)) = X̄'^(j+1), yielding X̄^(j+1) = X̄'^(j+1). Hence both systems have the same least fix point solution. Therefore:

Lemma 1. For any system of equations there exists an equivalent one, with respect to the least fix point, with the following normal forms of the monomials:
algebraic: Y · Z, Y ⧢ Z, α
linear: Y · α, Y ⧢ α, α · Y, α ⧢ Y, α
rational: Y · α, Y ⧢ α, α
where Y, Z ∈ V, α ∈ 2^{Σ*}_1.

3 Hierarchies

In this section we present two language hierarchies, a lower and an upper one.

3.1 The Lower Hierarchy

The first hierarchy we establish is one of families of languages (in the sense of [8]) which are obtained as the closure of the family of finite languages under some of the operations ∪, ·, ⧢, *, ⊛, extended in the obvious way to families of languages. It is shown in Figure 1 (with (FIN) understood). By Lemma 11 in Subsection 3.2, all of these are subsets of RAT(·, ⧢). Two classes coincide since REG is closed under ⧢ [3], which we recall here:

Proposition 1. (∪, ·, ⧢, *)(FIN) ⊆ (∪, ·, *)(FIN), i.e. REG is closed under ⧢.

Proof. Let R_1, R_2 ∈ REG. Then R_1, R_2 are accepted by deterministic finite automata A_1 = (Q_1, Σ_1, δ_1, q_01, Q_f1) and A_2 = (Q_2, Σ_2, δ_2, q_02, Q_f2). Construct an NFA A = (Q, Σ, δ, q_0, Q_f) by Q = Q_1 × Q_2, Σ = Σ_1 ∪ Σ_2, q_0 = (q_01, q_02), Q_f = Q_f1 × Q_f2, and ((q, s), x, (q', s)) ∈ δ if δ_1(q, x) = q', as well as ((q, s), y, (q, s')) ∈ δ if δ_2(s, y) = s'. Then A accepts the language R_1 ⧢ R_2. □

From this follows

Theorem 1. (∪, ·, ⧢, *)(FIN) = (∪, ·, *)(FIN) = REG.
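The product construction in the proof of Proposition 1 is straightforward to implement. The following Python sketch is our own illustration (the data representation is an assumption, not from the paper); it builds the product automaton for R_1 ⧢ R_2 from two DFAs given as transition dictionaries and decides membership by running it nondeterministically.

# Sketch: a DFA is (states, delta, q0, finals) with delta[(q, x)] = q'.

def shuffle_nfa_accepts(w, dfa1, dfa2):
    """Decide w ∈ R1 ⧢ R2 via the product NFA of Proposition 1."""
    (_, d1, q01, f1) = dfa1
    (_, d2, q02, f2) = dfa2
    current = {(q01, q02)}          # set of reachable product states
    for x in w:
        nxt = set()
        for (q, s) in current:
            if (q, x) in d1:        # either the first component moves on x ...
                nxt.add((d1[(q, x)], s))
            if (s, x) in d2:        # ... or the second component does
                nxt.add((q, d2[(s, x)]))
        current = nxt
    return any(q in f1 and s in f2 for (q, s) in current)

# Example: R1 = (ab)*, R2 = c*, so R1 ⧢ R2 contains acb but not ba.
dfa_ab = ({0, 1}, {(0, "a"): 1, (1, "b"): 0}, 0, {0})
dfa_c  = ({0},    {(0, "c"): 0},              0, {0})
print(shuffle_nfa_accepts("acb", dfa_ab, dfa_c))   # True
print(shuffle_nfa_accepts("ba",  dfa_ab, dfa_c))   # False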
All of the inclusions in the diagram of Figure 1 are proper; to show this, it is sufficient to prove the following lemmata (by counterexamples):

– (·, *)(FIN) ∩ (·, ⊛)(FIN) ⊄ ES (Lemma 2)
– (∪, *)(FIN) ∩ (∪, ⊛)(FIN) ⊄ (·, ⧢, *, ⊛)(FIN) (Lemma 4)
– (⧢, ⊛)(FIN) ⊄ ER (Lemma 8)
– (⧢, *)(FIN) ⊄ (∪, *, ⊛)(FIN) (Lemma 6)
– (⧢, *)(FIN) ⊄ (·, *, ⊛)(FIN) (Lemma 5)
– (*)(FIN) ⊄ (∪, ·, ⧢, ⊛)(FIN) (Lemma 10)
– (⊛)(FIN) ⊄ (∪, ·, ⧢, *)(FIN) (Lemma 9)

Figure 1. The lower hierarchy: the closures of FIN under the subsets of {∪, ·, ⧢, *, ⊛}, with (FIN) understood at each node (diagram not reproduced here).

Lemma 2. {a}* · {b}* = {a}⊛ · {b}⊛ ∉ ES.

Proof. ∀L ∈ ES ∃m ∈ ℕ : ((k > 1, ℓ > 1, k + ℓ > m, a^k b^ℓ ∈ L) ⇒ ∃ ubav ∈ L). But this is not true for {a}* · {b}*. Proof by structural induction: a language in ES is finite or of the form A ∪ B, A ⧢ B, A* or A⊛ with A, B ∈ ES. For any finite language L let m = ||L||. If A does not contain infinitely many words of the shape a^m b^n but A* or A⊛ does, then this is due to A containing a^k b^ℓ words up to a certain length, or a^k and b^ℓ words. In any case, wrongly ordered words result. If A and B have the property, then A ∪ B obviously has it, and for A ⧢ B a contradiction is also reached if one of them, wlog A, contains a word u a v and B contains a word u_2 b v_2. □

Lemma 3. Let ψ : Σ* → ℕ^Σ be the Parikh mapping that takes a word w to the vector ψ(w) ∈ ℕ^Σ whose components are the multiplicities of the symbols from Σ, extended to languages. For any language L ∈ (·, ⧢, *, ⊛)(FIN) we have that

(∃w ∈ L ∃ξ ∈ ℕ^Σ ∀k ∈ ℕ : ψ(w) + k·ξ ∈ ψ(L)) ⇒ (∀w ∈ L ∃ξ' ≥ ξ ∀k ∈ ℕ : ψ(w) + k·ξ' ∈ ψ(L)) .

Proof. By structural induction over an (·, ⧢, *, ⊛)-term for L. If A and B have the property with vectors ξ_A(w), resp. ξ_B(w), depending on w, then in A · B the same vectors can be used (ξ_A(u) is good for any word uv ∈ A · B, and by assumption some ξ' ≥ ξ_A(u) is good for any other word u_2 v ∈ A · B, and vice versa). For A ⧢ B the same holds. For A* and A⊛, any new word can be suffixed with formerly possible iterations. □

Lemma 4. {a}* ∪ {b}* = {a}⊛ ∪ {b}⊛ ∉ (·, ⧢, *, ⊛)(FIN).

Proof. Apply Lemma 3 to w = a and ξ the vector with component 1 for b and 0 for a: the right-hand side fails. □

Lemma 5. (⧢, *)(FIN) ⊄ (·, *, ⊛)(FIN).

Proof. Consider L = {ab}* ⧢ {cd, ef} ∈ (⧢, *)(FIN). Assume L ∈ (·, *, ⊛)(FIN). Let L = A⊛ with A ≠ ∅, A ≠ {λ}. Now cd ∈ L. Either cd ∈ A^i for some i, yielding cdcd ∈ L, a contradiction, or c ∈ A^i, d ∈ A^j for some i, j, yielding dc ∈ L, also a contradiction. L = A* gives the same contradiction. Let L = A · B with A ≠ {λ}, B ≠ {λ}. If there exists xcy ∈ A then there exists neither ucv ∈ B nor u'ev' ∈ B nor u''fv'' ∈ B, since otherwise xcy ucv ∈ L, xcy u'ev' ∈ L, or xcy u''fv'' ∈ L, all contradictions. Therefore either xcydz ∈ A or udv ∈ B is possible, but then there would be no word xeyfz ∈ L, a contradiction. □

Lemma 6. (⧢, *)(FIN) ⊄ (∪, *, ⊛)(FIN).

Proof. Any language of L_2 = (∪, *, ⊛)(FIN) is a finite union ⋃_i L_i of languages which are either finite or of the form L_i = A_i* or L_i = A_i⊛ for languages A_i. If w ∈ A, then ww ∈ A* and ww ∈ A⊛. Suppose L_1 = {a} ⧢ {b}* ∈ L_2. Then L_1 is infinite, a ∈ L_1, and every word of L_1 contains exactly one a. A word of L_1 longer than the longest word in any of the finite languages must lie in one of the infinite languages A_i* or A_i⊛; in either case some word w containing the a lies in A_i, hence ww ∈ L_1, and ww contains two occurrences of a, which is wrong. □
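As an illustration of Lemma 3 and of the way it is used in Lemma 4, the following Python sketch (ours, purely illustrative; the length bound N is an arbitrary assumption) computes Parikh vectors of a length-bounded approximation of {a}* ∪ {b}*: the premise of Lemma 3 holds with w = b and ξ = (0, 1), while the conclusion fails for w = a.

from collections import Counter

ALPHABET = ("a", "b")

def parikh(w):
    """Parikh vector of w over the fixed alphabet (a, b)."""
    c = Counter(w)
    return tuple(c[x] for x in ALPHABET)

# ψ(L) for L = {a}* ∪ {b}*, restricted to words of length <= N.
N = 20
psi_L = ({parikh("a" * n) for n in range(N + 1)} |
         {parikh("b" * n) for n in range(N + 1)})

# Premise of Lemma 3, witnessed by w = b and ξ = (0, 1):
print(all((0, 1 + k) in psi_L for k in range(N - 1)))    # True
# The conclusion fails for w = a: every vector of ψ(L) has a zero component,
# while ψ(a) + k·ξ' is positive in both components for any ξ' ≥ ξ and k ≥ 1.
print(all(min(v) == 0 for v in psi_L))                   # True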
Lemma 7. Every language L ∈ ER can be written as a union of the following form, with I a finite set and all K(i) ∈ ℕ, for a finite number of sets A_ik ∈ ER which are all either finite or of the form C_ik* or C_ik⊛ for some C_ik ∈ ER:

L = ⋃_{i∈I} A_i0 · A_i1 · ... · A_iK(i) .

This follows from the distributivity of · over ∪: A · (B ∪ C) = A · B ∪ A · C. Such a representation is of course not unique. Note that below any iterated catenation or iterated shuffle the term C_ik may be arbitrarily complex.

Proof. Proceed by structural induction. If L is already finite, or the iterated catenation or iterated shuffle of some language in ER, then I is a singleton set and the union contains one trivial product. If L = M ∪ N, with I_M and I_N wlog disjoint index sets of the respective unions, then L = ⋃_{i ∈ I_M ∪ I_N} A_i0 · ... · A_iK(i). If L = M · N, then L = ⋃_{(i,j) ∈ I_M × I_N} (A_i0 · ... · A_iK(i)) · (A_j0 · ... · A_jK(j)), with |I_M| · |I_N| such products. □

Lemma 8. (⧢, ⊛)(FIN) ⊄ ER.

Proof. Consider L = {abc}⊛ ⧢ {bc} ∉ ER. The following property holds:

(1) ∀w ∈ L : ||w||_a + 1 = ||w||_b = ||w||_c .

Using a representation from Lemma 7, L can be written as a finite union indexed by I such that for each i ∈ I there is a maximum number m(i) of letters contributed by the finite languages:

m(i) = Σ_{k ∈ {0,...,K(i)}, |A_ik| < ∞} ||A_ik|| .

Let m = max{m(i) | i ∈ I}. Furthermore, L has the property that w ∈ A_ik ⇒ ||w||_a = ||w||_b = ||w||_c for all infinite sets A_ik (otherwise, property (1) would be destroyed by one of the words w_1, obtained by selecting an arbitrary element of each finite set and λ from all infinite sets, and w_2, obtained in the same way save for A_ik, where we choose a non-balanced word, as can be seen via the Parikh mapping). A word w = a^n b b^n c c^n with n > m, which is in L for any n > m, must be generated by a catenation of sets A_k such that at least one b is taken from a finite set, say A_kb, and likewise one c is taken from a finite set A_kc. By virtue of the order preserving property of catenation, the word w is generated such that

w ∈ (A_0 · ... · A_{kb−1}) · A_kb · (A_{kb+1} · ... · A_{kc−1}) · A_kc · (A_{kc+1} · ... · A_K) ,

wlog with kb < kc. Since n > m, not all a can come from finite sets A_k. Hence there must exist some infinite set A_a to the left of A_kb such that a^ℓ ∈ A_a for some ℓ > 0, contradicting the balance of the words of the infinite sets. □

L_0 = {ab}⊛ ⧢ {c} ∉ ER is a simpler language with a similar property. This can be shown in a similar way, using ∀w ∈ L_0 : ||w||_a = ||w||_b and the words a^n c b^n ∈ L_0.

Lemma 9. (⊛)(FIN) ⊄ (⧢, ∪, ·, *)(FIN) = REG.

Proof. {ab}⊛ is not regular because {ab}⊛ ∩ ({a}* · {b}*) = {a^n b^n | n ≥ 0} is not. □

Lemma 10. (*)(FIN) ⊄ (⧢, ∪, ·, ⊛)(FIN).

Proof. Consider L = {ab}* ∈ (*)(FIN) and assume L ∈ (∪, ·, ⧢, ⊛)(FIN). Since ||L|| = ∞, the operation ⊛, the only one producing infinite sets, has to be used at least once, say in a subterm A = B⊛; let it be a topmost such occurrence in the structural tree representing L. Clearly B ≠ ∅, B ≠ {λ}, and there exists w ∈ B with w = uav (otherwise the non-empty words of B consist of b only, and the same argument applies with bb in place of aa). But then u u a a v v ∈ {w} ⧢ {w} ⊆ A. Neither · nor ⧢ applied above A will erase all words with such an infix, so some word u'aav' remains in L, while no word of {ab}* contains the infix aa. A contradiction. □

3.2 The Upper Hierarchy

In this part we investigate further important language classes, in particular those defined by systems of equations, and their relations to well-known classes. This is illustrated in Figure 2.

Figure 2. The upper hierarchy, relating CS, ALG(·, ⧢), CF = ALG(·), LIN(·, ⧢), LIN = LIN(·), RAT(·, ⧢), SE = (∪, ·, ⧢, *, ⊛)(FIN), ER = (∪, ·, *, ⊛)(FIN), ES = (∪, ⧢, *, ⊛)(FIN), REG = RAT(·) = (∪, ·, *)(FIN), and SHUF = RAT(⧢) = (∪, ⧢, ⊛)(FIN) (diagram not reproduced here).

Lemma 11. SE ⊆ RAT(·, ⧢).

Proof. Construction of a system of equations by structural induction; it suffices to start with singletons. For α ∈ 2^{Σ*}_1: take X_1 = α. For A ∪ B, with Ȳ and Z̄ the variables of the systems for A and B: add X_1 = Y_1, X_1 = Z_1. For A · B: add X_1 = Z_1 and Z = Y_1 · β if Z = β is an equation for A; similarly for A ⧢ B with · replaced by ⧢. For A*, with Ȳ the variables for A: add X_1 = Y_1 and Y = X_1 · β if Y = β is an equation for A; similarly for A⊛ with · replaced by ⧢. □
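To make the counterexamples behind Lemma 9 above and Lemma 13 below concrete, the following Python sketch (ours, for illustration only; the length bound is an arbitrary assumption) enumerates a length-bounded approximation of the iterated shuffle {ab}⊛ and intersects it with the regular set a*b*.

def shuffle_words(u, v):
    """All interleavings of u and v (the word-level shuffle)."""
    if not u: return {v}
    if not v: return {u}
    return ({u[0] + w for w in shuffle_words(u[1:], v)} |
            {v[0] + w for w in shuffle_words(u, v[1:])})

def iterated_shuffle(A, max_len):
    """Length-bounded approximation of A⊛ (all words of length <= max_len)."""
    result, frontier = {""}, {""}
    while frontier:
        new = set()
        for w in frontier:
            for a in A:
                for x in shuffle_words(w, a):
                    if len(x) <= max_len and x not in result:
                        new.add(x)
        result |= new
        frontier = new
    return result

L = iterated_shuffle({"ab"}, 8)
# Intersecting {ab}⊛ with a*b* leaves exactly the words a^n b^n (cf. Lemma 9):
ordered = sorted(w for w in L if w == "a" * w.count("a") + "b" * w.count("b"))
print(ordered)   # ['', 'aaaabbbb', 'aaabbb', 'aabb', 'ab']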
Lemma 12. RAT(·, ⧢) ⊄ SE.

Proof. We provide a counterexample.

Example 2. Another example of a rational system of equations is given by

X = Y · α ∪ {λ} ,  Y = Z ⧢ β ,  Z = U · γ ,  U = X ⧢ δ

with α = {a}, β = {b}, γ = {c}, δ = {d}. The least fix point solution X fulfills X ⊆ {w | ||w||_a = ||w||_b = ||w||_c = ||w||_d}, and some of its words of length 4k, k ∈ ℕ, can easily be calculated: among them are d^k b^k (ca)^k and (dcba)^k. Words not in the solution X include all words ending in d, and all words with aa or cc as an infix. We prove that no infinite subset of X containing infinitely many d^k b^k (ca)^k words is in SE.

1) X = A* and 2) X = A⊛ are ruled out: for 2), no word with an arbitrarily long run of consecutive c or a is in X; and for 1), eventually some d^i would have to be in A, which can then be added at the end, yielding a word not in X. So no language whose minimal SE term has depth 1 provides the solution X. All other possibilities reduce the question of finding such an SE language to the question of finding one with a minimal term depth reduced by 1: X = A ∪ B still means that one of A and B contains infinitely many such words and neither contains a word not in X, begging the question. X = A · B, with non-trivial A and B, means that all d^k b^k (ca)^k words must come from A and B wholesale, because otherwise the balance between the numbers of occurrences of a, b, c and d can necessarily be upset by pumping. X = A ⧢ B, with non-trivial A and B, means that if A contains any word ucv, then B contains neither u_2 a v_2 nor u_3 c v_3; neither can it contain any word with b or d, since no word in X ends with either of these. □

Lemma 13 (also [4]). SHUF ⊄ CF.

Proof. L = {abc}⊛ ∈ (⊛)(FIN) ⊆ SHUF. But since CF is closed under intersection with regular sets and L ∩ ({a}* · {b}* · {c}*) = {a^n b^n c^n | n ≥ 0} ∉ CF, L is not context-free. □

To prove the following lemma, iteration lemmata for the classes RAT(·, ⧢), LIN(·, ⧢) and ALG(·, ⧢), similar to those for REG, LIN and CF, are applied. For lack of space, they and the corresponding counterexamples will be presented in another article. For general iteration lemmata see [10].

Lemma 14. L_1 = {a^n b^n | n ≥ 0} ∈ LIN, L_1 ∉ RAT(·, ⧢); L_2 = {a^m b^m c^n d^n | m, n ≥ 0} ∈ CF, L_2 ∉ LIN(·, ⧢); L_3 = {a^n b^n c^n | n ≥ 0} ∈ CS, L_3 ∉ ALG(·, ⧢).

Putting together the last lemmata, as well as those known for the Chomsky hierarchy and from Figure 1, we get the complete diagram shown in Figure 2.

4 Outlook

In another paper we have investigated structural, closure and decidability properties of the language classes presented in this paper, as well as iteration lemmata for them, and their relation to semilinear sets. It would also be interesting to know, for each of the proper inclusions in the diagrams, whether it is decidable if a language of the higher family, given by a term of the corresponding type, is also a member of the lower one ([6], p. 104), e.g. decidability of whether the shuffle closure of a regular language is still regular. Other decidability problems such as equivalence should be investigated as well, as should the complexity of the language classes considered.

References

1. Câmpeanu, Cezar; Salomaa, Kai; Vágvölgyi, Sándor: Shuffle Quotient and Decomposition. Springer LNCS 2295, pp. 186–196, 2002.
2. Czaja, Ludwik; Kudlek, Manfred: Language Theoretic Properties of Client/Server Systems. Proceedings of CS&P 2011, pp. 79–84, 2011.
3.
Ginsburg, Seymour: The Mathematical Theory of Context-free Languages. McGraw-Hill, 1966. 4. Gischer, Jay: Shuffle Languages, Petri Nets, and Context-sensitive Grammars. CACM 24 (9), pp. 597–605, 1981. 5. Ito, Masami: Shuffle Decomposition of Regular Languages. Journal of Universal Computer Science, Vol. 8, No. 2, pp. 257–259, 2002 6. Ito, Masami: Algebraic Theory of Automata and Languages. World Scientific, 2004. 7. Jantzen, Matthias: The Power of Synchronizing Operations on Strings. Technical Report, FB Informatik, Univ. Hamburg, IfI-HH-B-67/80, 1980. 8. Jantzen, Matthias: Extending Regular Expressions with Iterated Shuffle. Technical Report, FB Informatik, Univ. Hamburg, IfI-HH-B-99/84, 1984. 9. Jantzen, Matthias: Extending Regular Expressions with Iterated Shuffle. TCS 38, pp. 223–247, 1985. 10. Kudlek, Manfred: On General Iteration Lemmata for Certain Classes of Word, Trace and Graph Languages. FI 37 (4), pp. 413–422, 1999. 11. Kudlek, Manfred: On Semilinear Sets over Commutative Semirings. FI 79 (3-4), pp. 447–452, 2007. 12. Kuich, Werner; Salomaa, Arto: Semirings, Automata, Languages. Springer, 1986.