String Assembling Systems and Watson-Crick Finite Automata∗

                                                       András Murvai, György Vaszil

                          Faculty of Informatics, University of Debrecen, Kassai út 26, 4028 Debrecen, Hungary
                                                   halimifoliumlycium@gmail.com
                                                   vaszil.gyorgy@inf.unideb.hu

Abstract: We explore the relationship of Watson-Crick au-                    The other model interesting for our investigations is
tomata and string assembling systems. In the general case,                called Watson-Crick finite automaton, and it was intro-
Watson-Crick automata are more powerful, so restricting                   duced in [2]. Such automata are similar to ordinary finite
the study to the stateless variant is of interest. We show                automata in the sense that they read strings written on their
that the class of languages of stateless Watson-Crick au-                 tape, and either accept or reject them, thus defining a for-
tomata are strictly included in the class of languages of                 mal language as the set of strings that are accepted. Sim-
automata with at least two states, then compare string as-                ilarly to sticker systems, Watson-Crick automata work on
sembling systems with the stateless variant. It turns out,                double stranded strings, thus, they have a double stranded
that there are languages that can be generated by string                  tape that contains two complementary strings which are
assembling systems, but not accepted by stateless Watson-                 read by two separate reading heads being able to move in-
Crick automata, but the question concerning the exact re-                 dependently of each other, one on the upper and one on the
lationship of the language classes remains open.                          lower strand of the tape. Concerning their computational
                                                                          power, these types of automata can describe more compli-
                                                                          cated language classes than ordinary finite automata. As
1     Introduction                                                        we will see examples of this later, even some non-context-
                                                                          free context-sensitive languages can be accepted.
The double stranded structure of DNA motivated the in-
troduction and study of double stranded strings with the
tools and techniques of formal language theory, that is, the              2     Preliminaries
definition of string operations which model the biochem-
ical processes that the corresponding strings can undergo,                For a finite alphabet of symbols V , let V ∗ denote the set
see the monograph [7] for more details. Two such models,                  of all strings over V , and let V + = V ∗ \ {λ } where λ
related to the topic of this paper are sticker systems and                denotes the empty string. We consider two languages
Watson-Crick automata.                                                    L1 , L2 ⊆ V ∗ equal if they differ in at most the empty string,
   Sticker systems were introduced in [3], they use dou-                  L1 \ {λ } = L2 \ {λ }. The length of a word w ∈ V ∗ is de-
ble stranded string “pieces” as building blocks to assemble               noted by |w|, the number of occurrences of a symbol x ∈ V
longer double stranded strings when their single stranded                 in w is denoted by |w|x .
ends stick together and form double strands according to                     A double stranded string over V is a pair         of strings
the Watson-Crick complementarity relation. Such systems                                                                         w1
                                                                          (w1 , w2 ) ∈ V ∗ ×V ∗ , it can also be written as          , while
describe formal languages, sets of double stranded strings                                                                 ∗  w2
which can be “stick” together starting from a set of initial                                                                V
                                                                          the set of pairs V ∗ ×V ∗ can be written as            .
pieces. Depending on how we specify the details of the                                                                      V∗
functioning of the system, simple languages (like regular                    We  denote
                                                                                         the 
                                                                                              lastletters and the
                                                                                                                 first letters of a string
languages which can be described by finite automata), or                          u             u                u
                                                                          pair        as end(        ) and bgn(      ):
more complicated languages (like recursively enumerable                           v             v                v
languages which can be described by Turing machines)
can be generated by sticker systems. As we will see later,                    1. If u = u0 x,
                                                                                            v0 =
                                                                                                    0 for some x, y ∈ V, u0 , v0 ∈ V ∗ ,
                                                                                                  vy     
                                                                                              ux         x
string assembling systems (one of the computing models                           then end( 0 ) =            . Similarly, the first letters
studied in this paper) are similar to this, from a certain                                    v y
                                                                                                        y
                                                                                               u
point of view they can be thought of as special cases of                         of the pair       for u = xu0 and v = yv0 are denoted
sticker systems.                                                                          0 v      
                                                                                          xu          x
    ∗ Research supported by the construction EFOP-3.6.3-VEKOP-16-                as bgn(         )=       where x, y ∈ V, u0 , v0 ∈ V ∗ .
                                                                                          yv0         y
2017-00002 supported by the European Union, co-financed by the Eu-
ropean Social Fund, and by grant K 120558 of the National Research,           2. Otherwise,if 
                                                                                               one of the
                                                                                                         strings is  empty, we
                                                                                                                  u0 x
                                                                                                                                 
Development and Innovation Office of Hungary (NKFIH), financed un-                           λ        λ                        x
der the K 16 funding scheme.                                                     have end( 0 ) =          , end(        ) =       ,
      Copyright c 2021 for this paper by its authors. Use permitted un-                v y  y              0 λ       λ
der Creative Commons License Attribution 4.0 International (CC BY                      λ         λ             xu          x
                                                                                 bgn(      )=      , and bgn(        )=         for
4.0).                                                                                 yv0        y              λ          λ
                                                                                              
      x, y ∈ V, u0 , v0 ∈ V ∗ .                                     if and only if, q0 ∈ δ (q,
                                                                                              u2
                                                                                                 ). We may also write
                                                                                              v2
  The complementarity relation is a symmetric relation                                                 
on the letters of the alphabet ρ ⊆ V ×V . We call the set of                               u1      u1 u2
                                                                                               ⇒
sequences of pairs of complementary symbols a Watson-                                      v1       v1 v2
Crick domain and denote it with W Kρ (V ), more formally,
                                                                    if we are not interested in the states of the automaton and
                         
                          x                                         in the rest of the input to be read.
         W Kρ (V ) = {      | x, y ∈ V, (x, y) ∈ ρ}∗ .                 If the reflexive and transitive closure of ⇒ is denoted by
                          y
                                                                    ⇒∗ , then the language accepted by M is
                     
                  x1 x2          xn
                                      ∈ W Kρ (V ) can also be
                                                                                                    
The string pair             ...                                                               w       w
             1  y     y2       yn                                   L(M) = {w ∈ V ∗ | q0        ⇒∗    qf ,
             w                                                                                w       w
written as 0 where w = x1 x2 . . . xn , w0 = y1 y2 . . . yn . We                                   
                                                                                                   w
            w                                                                               where     ∈ W Kρ (V ), q f ∈ F}.
also denote the complement of x ∈ V by x, that is, (x, x) ∈                                        w
ρ, and the complement of xis x,so x =x, and  (x, x)  ∈ ρ.
                                w1          w1              w1      2.2   String assembling systems
   The difference between            and         is that
                                w2          w2              w2
only gives  a different  notation  for (w  ,
                                          1 2w ), while   in  the   Now we recall the definition of string assembling systems
                                                                    from [5]. These also work with double stranded strings,
         
         w1
case of        ∈ W Kρ (V ), |w1 | = |w2 | and w2 is the com-        but in their case, the complementarity relation is the iden-
         w2
plement of w1 , w2 = w1 (also w2 = w1 ).                            tity relation. Similarly to sticker systems mentioned in the
                                                                    introduction, they use a set of double stranded string pieces
                                                                    as building blocks to generate a language. They are able
2.1   Watson-Crick automata                                         to “glue” two double stranded strings together if the last
                                                                    letters of the first string and the first letters of the second
Now we recall the definition of Watson-Crick automata               string coincide, and these letters overlap in the resulting
from [2], see also the monograph [7] for more details.              string.
   The construct M = hV, ρ, Q, q0 , F, δ i is a Watson-Crick           The 4-tuple S = hΣ, A, T, Ei is a string assembling sys-
automaton (WK automaton in short), where                            tem (SAS in short), where
  • V is the input alphabet,                                          • Σ is a finite alphabet,
  • ρ ⊆ V ×V is the complementarity relation,                         • A    +   +
                                                                         ⊂Σ ×Σ is the finite set of axioms of the form
                                                                          uv     u
  • Q is the nonempty finite set of states,                                  or      with u ∈ Σ+ , v ∈ Σ∗ ,
                                                                          u     uv
  • q0 ∈ Q is the initial state,                                      • T ⊂ Σ+ × Σ+ is the finite set of assembly units,

  • F ⊆ Q is the set of accepting states, and                         • E ⊂ Σ+ ×Σ+is the
                                                                                        finite
                                                                                             set of ending assembly units
                                                                                    uv         v
              ∗
              V                                                         of the form      or        with u ∈ Σ∗ , v ∈ Σ+ .
  • δ : Q×           → 2Q is a finite relation, the state tran-                      v        uv
              V∗
    sition relation.                                                   A derivation step of the SAS S as above is denoted by
                                                                    ⇒,
                                                                     and  defined as  follows.
                                                                                               We say that we added the unit
                                                                      xu2                  u1
 A configuration    of sucha WK  automaton    is denoted as             to the unit       , denoted as
                                      V∗
                                       
  w1      w3                w1                                        yv2                  v1
       q         where           ∈         is the part of the in-
  w2      w4                w2        V∗                                                                 
                                                                                            u1        u1 u2
put which is already
                 read,     ∗q∈ Q is the state of the WK au-                                  ⇒           ,
                  w3         V                                                              v1        v1 v2
tomaton, and            ∈         is the part of the input which
                  w4         V∗                                                                   
is not read yet.                                                                      x             u1
                                                                    if and only if,        = end(        ) for some x, y ∈ Σ, and
   The transition between two configurations is denoted by                          y             v1
⇒, and defined     as follows.                                        xu2
                                  ∗                                ∈ T ∪ E. Moreover, if |u1 u2 | ≥ |v1 v2 |, then u1 u2
            u1       u2      u3          V           u1 u2 u3         yv2
   Let           ,        ,        ∈           ,                ∈   can be written as u1 u2 = v1 v2 u0 for some u0 ∈ Σ∗ , or the
            v1       v2      v3          V∗          v1 v2 v3
               0
W Kρ (V ), q, q ∈ Q. Then                                           other way around, if |u1 u2 | ≤ |v1 v2 |, then u1 u2 v0 = v1 v2
                                                                    for some v0 ∈ Σ∗ .
                                 
               u1   u2 u3    u1 u2 0 u3                                A sequence of derivation steps is a derivation. A deriva-
                  q        ⇒         q                              tion is successful, if after choosing an axiom from A, we
               v1   v2 v3     v1 v2    v3
add assembling units from T , then close the derivation by              and L(M) = L(M 0 ). Based on this result, from now on we
adding a unit from E, in such a way that the upper and                  assume that ρ = ρid , the identity relation.
lower string of the string pair which is produced is identi-               We can also find similarities in the functioning of the
cal.                                                                    two models. Adding building units to the generated string
  If the reflexive and transitive closure of ⇒ is denoted by            by a SAS from A, T , or E, corresponds to reading sub-
⇒∗ , then the language generated by S is                                strings of the input by a WK automaton with a transi-
                                                                    tion from the initial state, a transition between arbitrary
                    +u   ∗ w                                            states, and a transition leading to and accepting state, re-
    L(S) = {w ∈ Σ |     ⇒     is a
                     v     w                                            spectively. While the states of WK automata are explicitly
                                      successful derivation}.           given, “states” of a SAS are implicit, they are “hidden” in
                                                                        the overlapping pairs of letters of the building units.
   Now we introduce some additional notation that will
be useful later. The string pair generated in k deriva-                 3.1   Known results on WK automata and SAS
tion steps, or in the case of WK automata,the          string pair
                                                    u(k)                As we have seen, SAS can basically add three types of
read in k transition steps, is denoted by                  , that is,   units to the generated strings, while WK automata can
                                                    v(k)
 
  u0
            
               u0 u1 . . . uk−1 uk
                                     
                                         u
                                                              
                                                               u(0)     have an arbitrary number of states, so it seems to be rea-
        ⇒∗                           = (k) . The pair                   sonable to review results related to the number of states of
  v0           v0 v1 . . . vk−1 vk       v(k)                  v(0)
denotes  an   axiom,      or  in the case  of  WK   automata,     the   WK automata. In [6] the authors show that from the point
                                                                        of view of state complexity, WK automata are more ef-
      
      λ
pair      , representing the fact that nothing has been read            ficient than ordinary finite automata, there is a sequence
      λ
from the input yet.                                                     Lk , k ≥ 1 of regular languages, such that the number
   Let ∆k denote the difference of the        length of upper and       of states of the automata sequence accepting Lk is un-
                                                                        bounded, while all languages of the sequence can be ac-
                                           
                                            u
lower element of the string pair                 added in the kth       cepted with WK automata having a fixed number of states.
                                            v
derivation step, or read in the kth state transition, that              Building on these results, in [1] it is shown that arbitrary
is,∆k = |u| − |v|, and let ∆(k) denote the difference of the            finite languages can be accepted with WK automata hav-
generated strings or the strings read on the upper and lower            ing two states, and that arbitrary unary regular languages
pert of the inputtape in k derivation or state transition             can be accepted by WK automata having three states.
                        u                                                  Concerning the power of SAS, already in the introduc-
steps, that is, for (k) , let ∆(k) = |u(k) | − |v(k) |.
                        v(k)                                            tory paper [5], the authors show that there are unary regu-
   Let us also denote the class of languages accepted or                lar languages that cannot be generated by SAS, but all fi-
generated by a computational model X by L (X), so let                   nite languages can be. From our point of view, especially
L (SAS) and L (W K) denote the classes of languages gen-                interesting is the result concerning the relationship of lan-
erated by SAS and accepted by WK automata, respec-                      guages accepted by stateless two head finite automata (that
tively.                                                                 is, two head automata with one state) and languages gen-
                                                                        erated by SAS. While the languages accepted by stateless
                                                                        WK automata strictly include the languages accepted by
3    The computational power of                                         stateless two-head automata, Theorem 1 which we are go-
     Watson-Crick automata and string                                   ing to prove in the next section, can be considered as the
     assembling systems                                                 extension of this result.

In the following, we would like to point out some of the                3.2   Comparing the power of WK automata and SAS
similarities of WK automata and SAS, and then investigate
whether these similarities help us to establish a relation-             Let us start by establishing the relationship between lan-
ship between their computational power.                                 guages of (unrestricted) WK automata and SAS.
   Consider the WK automata M = hV, ρ, Q, q0 , F, δ i and
                                                                        Proposition 1.
the SAS S = hΣ, A, T, Ei. They both work with double
stranded strings, and there is a relation between the upper                                L (SAS) ⊂ L (W K).
and lower strings in both cases. In the case of SAS, this re-
lation is the identity relation, in WK automata, on the other           Proof. According to Theorem 3.1 of [5], languages gener-
hand, the relation can be an arbitrary symmetric relation.              ated by SAS can be accepted by nondeterministic one-way
This might seem to be an important difference, but it is                two-head finite automata, and since one-way two-head fi-
not. It is known from [4], that the computational power                 nite automata can be simulated by WK automata, it is clear
of WK automata is not influenced by the complementar-                   that L (SAS) ⊆ L (W K).
ity relation. For any WK automata M, we can construct                      On the other hand, L (W K) 6= L (SAS), since according
an M 0 , such that it uses the relation ρ = {(x, x) | x ∈ V },          Theorem 3.6 in [5] the language L = {a} ∪ {a2n | n ≥ 2}
cannot be generated by any SAS, but can be accepted by                      strings simultaneously. All string pairs corresponding to
the following WK automaton.                                                 some unit in T have an appropriate transition in δ , so S
  Let M = h{a}, ρid , {q0 , q f , q f 0 }, q0 , {q f , q f 0 }, δ i where   can finish the string generation by adding some unit from
                                                                        T , which must contain all units that can end the generation,
                   a                           aaaa
        δ (q0 ,       ) = qf ,    δ (q0 ,                ) = qf0,           so
                  a                          aaaa                                                          
                                                                                                               $
                   aa                                                                             E = T ∪{        }.
       δ (q f 0 ,      ) = qf0.                                                                                $
                   aa
                                                                              Now we show that any $w ∈ L(S) can be accepted by
                                                                            M. For all $w, we have a derivation
   As we have seen, in the general case, the computational
power of WK automata is greater than the power of SAS.                            
                                                                               $   $u1   $u1 u2
                                                                                                         
                                                                                                           $u1 u2 . . . ut
                                                                                                                            
Now we continue by restricting the power of WK automata                          ⇒     ⇒          ⇒ ... ⇒                    ,
                                                                               $   $v1   $v1 v2             $v1 v2 . . . vt
by considering its variants with reduced number of states.
First, we look at the relationship between stateless WK                                                                     
                                                                                                                            xui
automata and SAS.                                                           where $w = $u1 u2 . . . ut = $v1 v2 . . . vt ,        ∈ T ∪E, 1 ≤
                                                                                                                             yvi
   In the following, we denote the classes of WK automata                   i ≤ t. Since we created the building units in such a way
with m states by W K|Q|=m and by L (W K|Q|=m ) the class                    that their initial pairs
of languages they accept.                                                                        of letters contain      all possible com-
                                                                                                  x             u(i−1)
                                                                            binations, even if        = end(                ) holds, the ability
Theorem 1. Let M be a nondeterministic stateless WK                                               y             v(i−1)
                                                                                                                                        
automaton. Then there exists an SAS S generating the                                       xui                                             ui
                                                                            to add unit          depends only on the string pair                .
language accepted by M with an initial marker, that is,                               yvi                                                vi
$L(M) = L(S) where $ 6∈ V .                                                            ui
                                                                            As δ (q,       ) = q, and M is stateless, only the read string
Proof. Consider M = hV, ρid , {q}, q, {q}, δ i, we construct                      vi
                                                                                   ui
the SAS S = hΣ, A, T, Ei as follows.                                        pair        determines the success of the reading a word.
                                                                                   vi
   • Σ = V ∪ {$} ($ 6∈ V ),                                                 The derivation above (besides the leading $ sign) is a suc-
                                                                          cessful reading of the word w
              $
   • A={         },
              $                                                                        
                                                                                      u1   u1 u2
                                                                                                          
                                                                                                            u1 u2 . . . ut
                                                                                                                            
                                                                                         ⇒         ⇒ ... ⇒                   ,
             
              xu
                             
                               x                                                      v1   v1 v2             v1 v2 . . . vt
   • T ={          | for all      ∈ Σ × Σ pairs of letters and
              yv         y                                                thus w ∈ L(M).
                         u
     transitions δ (q,       ) = q},
                         v
                                                                             Although formally the above theorem does not say
                  $                                                         anything about the containment relationship betweem
   • E = T ∪{         }.
                  $                                                         L (SAS) and L (W K|Q|=1 ), it still establishes some kind of
                                                                            connection between the two languages classes. Moreover,
   First we show that any w ∈ L(M) can be generated by S.
                                                                            our statement is “stronger” then the similar statement in
  accepts
   M         the
                 empty word, so S must generate $, thus                    [5] which states that for all languages L accepted by state-
  $           $
      ∈ A,        ∈ E.                                                      less two-head finite automata, a corresponding SAS gen-
  $           $
                                                                            erating L0 can be constructed in such a way that all w ∈ L
   Let su and sl denote the already read part of the upper
                                                                            corresponds to w0 ∈ L0 with h(w0 ) = w for a homomor-
and lower strands
                 ofthe input
                              of M.  For
                                          all possible pairs
                  $su         x        u                                    phism deleting four symbols from w0 (and not just one, as
of letters end(         )=       , if     can be read by M,                 in our case).
                  $sl  y             v
                          xu                                                   Let us now relax the requirement of statelessness, and
there has to be a unit         which can be added by S. We                  consider WK automata with two states
                          yv
have added all possible building units for the reading of

  u                                                                         Lemma 1. There is a WK automaton with two states
       to S, so if a string pair can be read by M, then the                 which accepts the language L = {w | w ∈ {a, b}∗ , |w|a =
  v
corresponding building unit can be added by S.                              2(1 + 2i), i ≥ 0}.
   M is always in an accepting state, so the only require-
ment for a string to be accepted is the property that it can                Proof. The language can be accepted by the following
be                                                                          WK automaton.
 read
     completely. In the last reading step, reading some
  u                                                                           Let M = h{a, b}, ρid , {q0 , q f }, q0 , {q f }, δ i with δ :
     , the two heads reach the end of the upper and lower
  v
                                            
                   a                          a                          head reaches the end of the word after 1 + 2k, that is, after
          δ (q0 ,     ) = q0 ,       δ (q f ,    ) = qf ,
                   λ
                                            λ
                                                                       an odd number of reading steps. Therefore, M enters q f
                   b                          b                          when started in q0 , so w is accepted.
          δ (q0 ,     ) = q0 ,      δ (q f ,     ) = qf ,
                  x                        x
                   λ                          λ                          Corollary 1. The language L = {w | w ∈ {a, b}∗ , |w|a =
          δ (q0 ,     ) = qf ,      δ (q f ,      ) = q0 ,               (1 + 2i)k, i ≥ 0} can be accepted for any k ≥ 2 by a WK
                   xy                         xy
                                                                         automaton similar to the above, with
where x, y ∈ {a, b}.                                                                                                    
  We first show that L(M) ⊆ L. For any w ∈ L(M) we                                       a                                a
                                                                                δ (q0 ,        ) = q0 ,         δ (q f ,       ) = qf ,
have a transition sequence                                                               λ
                                                                                                                        λ
                                                                                                                          
                                                                                         b                                b
                                                                      δ (q0 ,       ) = q0 ,          δ (q f ,       ) = qf ,
       w   u0   w1  u0 u1 0 w2                                                            x                           x
   q0    ⇒    q 0 ⇒        q         ⇒ ...                                                                                      
       w   v0   w1  v0 v1     w0                                            δ (q0 ,
                                                                                         λ
                                                                                                  ) = q f , δ (q f ,
                                                                                                                          λ
                                                                                                                                   ) = q0 ,
                              2                                                    x1 x2 ...xk                      x1 x2 ...xk
                              u0 u1 ...ut
                     ... ⇒                 qf
                              v0 v1 ...vt                                where x, x1 , x2 , ..., xk ∈ {a, b}.
where in the first and last configuration M is in states q0                Thus, L = {w | w ∈ {a, b}∗ , |w|a = 2(1 + 2i), i ≥ 0} can
and q f , respectively, and in     oneof the two states
                                                            in  the   be accepted by a WK automaton with two states, but it
                                      ui         a       b      λ        cannot be accepted by stateless WK automata, as we will
intermediate configurations,               ∈{        ,       ,       |
                                      vi        λ        x      xy       show in the following.
x, y ∈ {a, b}}, 0 ≤ i ≤ t, correspond       to the
                                                 reading steps.
                                                                         Lemma 2. If M is a stateless WK automaton, then L(M) =
                                   
                                    ui        b
    In any case, by reading               =       , ∆i = 0, and M        L(M)∗ .
                                    vi       x   
                                                ui         a
remains in the same state, by reading                =         , ∆i =    Proof. Let M = hV, ρid , {q}, q, {q}, δ i be a stateless WK
                                                vi         λ           automaton. It is clear that L(M) ⊆ L(M)∗ . To show the
                                                               ui                                                       ∞
1 and M remains in the same state, by reading                       =    converse inclusion, consider L(M)∗ =
                                                                                                                        S
                                                                                                                             L(M)i , and a
                                                             vi                                                      i=0
   λ
         , ∆i = −2, and M changes to the other state.                    word w ∈ L(M)∗ . There are three possible cases.
   xy
    It is clear that ∆(i) can only increase after a reading step           1. If w ∈ L(M)0 , then w = λ ∈ L(M),
                                                  
      a                                               λ
if          was read, and can only decrease if             was read.       2. if w ∈ L(M)1 , then w ∈ L(M),
      λ                                            xy
Therefore, in order for ∆(i) = 0, some
                                                   λ
                                                         need to be        3. if w ∈ L(M)k , k ≥ 2, then w can be written as w =
                                                   xy
                                                                            w1 w2 ...wk , where wi ∈ L(M), 1 ≤ i ≤ k.
                                                       a
read, together with reading twice as many                  . Thus, if    The  first
                                                       λ                         two cases are clear.            In the third case,
∆(i) = 0, the already read part of the upper string contains               w1     w2 ...wk
                                                                               q             is a possible configuration, since w1 ∈ L,
an even number of as.                                                      w1     w2 ...wk
    If w is accepted, then ∆(i) = 0 and M must be in state q f .         so the two reading heads   can be positioned
                                                                                                                         after
                                                                                                                              w1 .
                                                                                                    w1 ...wi−1      wi ...wk
In order for Mto be   in state q f , an odd number (1 + 2k, k ≥           If a configuration                   q             can occur,
                   λ                                                                             w1 ...wi−1       wi ...wk
0) of reading           had to happen, which means that 2(1+
                                                                                                             
                   xy                                                                 w1 ...wi     wi+1 ...wk
                                                                         then also              q               is a possible configura-
                             
                              a                                                       w  ...w
                                                                                       1 i       wi+1 ...wk
2k) number of reading               had to happen. Thus, M can                         wi
                              λ                                          tion, since        can be read (as wi ∈ L). Therefore, w can
only accept words containing 2(1 + 2k) number of as.                                   wi
                                                                         be read in such a way that both reading heads reach the
    Now we show    that M can  accept any w ∈ L. First it            end of the tape, and since there is only one state, M ac-
                    a          b
needs to read            and        (choosing the appropriate x)         cepts w.
                    λ          x
until it reaches the end of the upper string. We are in state
                                                                            Using the above lemma, we can show the following.
q0 , since we have started in q0 and remained in q0 all the
way. The reading head of the lower string is behind the                  Corollary 2. L = {w | w ∈ {a, b}∗ , |w|a = 2 + 4i, i ≥ 0}
upper head by the number of as, by 2(1 + 2k). In               order
                                                                        cannot be accepted by any stateless WK automaton.
                                                                λ
to catch up with the upper head, we need to read                     a
                                                               xy        Proof. If L = L(M) for a stateless WK automaton M, then
number of times. At each such reading step, M changes                    L = L∗ , according to the lemma above. Let w ∈ L be such,
its state. Since the number if as in w is 2(1 + 2k) and we               that |w|a = 2. Then w2 ∈ L∗ , but w2 ∈
                                                                                                              / L, so L 6= L∗ , there-
read two letters from the lower string, the lower reading                fore L 6= L(M), which is a contradiction.
   Now we show that the language that we found not to be                 Using that a ≡ b (mod m) if and only if m|(a − b), that
in L (W K|Q|=1 ) can be generated by SAS.                             is, if a − b ≡ 0 (mod m), in the remaining cases we show
                                                                      the congruence (2 − |u(k) |a ) − ∆(k) ≡ 0 (mod 4) holds.
Lemma 3. L = {w | w ∈ {a, b}∗ , |w|a = 2 + 4i, i ≥ 0} can                                                  
be generated by SAS.                                                                 xabb       xbab      xbba
                                                                         • If                ,         ,          , then |u(k+1) |a =
                                                                                       y          y          y
Proof. Let S = hΣ, A, T, Ei with
                                                                             |u(k) |a + 1 and ∆(k+1) = ∆(k) + 3 holds, so 2 −
  • Σ = {a, b},                                                              |u(k+1) |a − ∆(k+1) = 2 − (|u(k) |a + 1) − (∆(k) + 3) =
                                                                   2 − |u(k) |a − ∆(k) − 4 ≡ 0 (mod 4) also holds;
             aa         ab      ba       bba          bbb
  • A={              ,        ,      ,             ,       },                    
             aa          a       b         bb          b                          xaa
                                                                         • if            , then |u(k+1) |a = |u(k) |a + 2 and ∆(k+1) =
                                                                           y
               x1 b        xabb       xbab           xbba     xaa            ∆(k) +2 hold, so 2−|u(k+1) |a −∆(k+1) = 2−(|u(k) |a +
  • T ={               ,          ,              ,         ,      ,
             x2 x3          y          y             y       y             2) − (∆(k) + 2) = 2 − |u(k) |a − ∆(k) − 4 ≡ 0 (mod 4)
       x1 a                                                                  also holds;
                  } where x, x1 , x2 , x3 , x4 , y ∈ Σ,
      x2 x3 x4                                                                            
                                                                                    x1 a
             
              a        b                                                 • if               , then |u(k+1) |a = |u(k) |a + 1 and ∆(k+1) =
  • E ={           ,       }.                                                     x2 x3 x4
              a        b                                                     ∆(k) −1 hold, so 2−|u(k+1) |a −∆(k+1) = 2−(|u(k) |a +
                                                                             1) − (∆(k) − 1) = 2 − |u(k) |a − ∆(k) ≡ 0 (mod 4) also
   We have chosen the assembly units of the SAS S in such
                                                                             holds.
a way that the difference between the length of the upper
and lower string indicates how many as must be added to                  Thus, all words generated by S belong to the language L.
the upper string in order to obtain a correct number. The                Now we show that arbitrary words of L can be generated
following four cases are possible.                                    by S. The assembling units of S are chosen in such a way
                                                                      that as and bs can follow each other in arbitrary order in
  1. At least 3 as are missing: ∆(i) = 3 + 4n,
                                                                      the upper string, and the lower string of each unit can be
  2. at least 2 as are missing: ∆(i) = 2 + 4n,                        arbitrary (only the length of the strings are fixed), so it is
                                                                      possible to add to the lower strings the appropriate letters.
  3. at least 1 a is missing: ∆(i) = 1 + 4n,                             Axioms are chosen as follows. We create the least num-
  4. at least 0 a is missing: ∆(i) = 0 + 4n,                          ber of axioms which satisfy the above mentioned congru-
                                                                      ence and provide the arbitrary     letter order. The idea is to
where n ∈ Z and we are after the ith derivation steps.                                                   u
                                                                      create axioms of the form                with upper strings of a
   More formally we canexpress the above as follows.                                                   v
                          u(k)                                        fixed length at most n ∈ N, and the lower strings being at
Consider the string pair         obtained in k derivation
                          v(k)                                        most as long as the upper, thus |u| = i, i = 1 . . . n, 1 ≤
steps. Then the congruence                                            |v| ≤ |u|. Then we discard those which do not satisfy the
                                                                      congruence, or which are not necessary for the initiation
                2 − |u(k) |a ≡ ∆(k)    (mod 4)                        of the generation.                                       
                                                                                                                     a             b
must hold during the whole generating process: |u(k) |a is               This way, for i = 1 we have                        and       as
the number of as in the upper string. In case of |u(k) |a +                                                          a             b
                                                                      axioms, but we discard these, because they do not
2 = 4i + j, (i ≥ 0, 1 ≤ j ≤ 4), (4i + 4) − (4i + j) = (4i +
                                                                      satisfy
                                                                            the congruence.
                                                                                          For       i   = 2 we  have
                                                                                                                            the units
4) − |u(k) |a − 2 indicates the least number of as that we            
                                                                        aa        aa    ab       ab      ba        ba        bb
need to add in order for the number to be a multiple of 4.                   ,       ,       ,       ,         ,         ,      ,    and
                                                                         a        aa      a         b
                                                                                                 ab       b        ba
∆(k) = (4 − j) + 4n, n ∈ Z indicates this too, so                      
                                                                        bb                        aa        ab      ba
                                                                             .      We keep            ,         ,         , so we can
  (4i + 4) − |u(k) |a − 2 ≡ 2 − |u(k) |a ≡                              bb                        aa         a       b
                                                                      begin the generation of words starting with aa, ab,
                            ∆(k) = (4 − j) + 4n      (mod 4).         ba. We cannot generate words starting with bb, so
                                                                      for
                                                                       i  =   3 weonly consider
                                                                                                  the    unitsstarting
                                                                                                                           with bb
   The axioms fulfill this condition.                                   bba        bba      bba      bbb        bbb         bbb
   Let us assume, that the congruence
                                    holds  after k deriva-                   ,        ,         ,          ,         ,         .   We
                                                                          b  bb            bba       b          bb         bbb
                                      uk+1                                           
tion steps. After adding the unit            in the (k + 1)th                  bba      bbb
                                      vk+1                            keep           ,         , so we are able to start the gen-
step, the congruence still holds, since                                          bb       b
                                                                eration of words beginning with bb. We can start the
           x1 b      a     b                                          generation with all possible combinations of letters, so no
   • if           ,      ,   , then |u(k+1) |a = |u(k) |a so
           x2 x3     a     b                                          more axioms are needed.
      ∆(k) = ∆(k+1) , thus 2−|u(k+1) |a = 2 −|u(k) |a ≡ ∆(k) =           We proceed with units in T in a similar manner: we
      ∆(k+1) also hold.                                               create the least possible number of elements that satisfy
the congruence, but we also allow that lower strings are        References
longer than the upper strings. Let us suppose that the cur-
rent string pair satisfies the congruence. If we would like     [1] Elena Czeizler, Eugen Czeizler, Lila Kari, and Kai Salomaa.
to continue the upper string with b, then                           On the descriptional complexity of watson-crick automata.
                                                                    Theor. Comput. Sci., 410(35):3250–3260, 2009.
    • we need to add an appropriate letter also
                                             tothe lower      [2] Rudolf Freund, Gheorghe Păun, Grzegorz Rozenberg, and
                                              x1 b                  Arto Salomaa. Watson-crick finite automata. In Harvey Ru-
      string, so we need a unit of the form        .
                                             x2 x3                  bin and David Harlan Wood, editors, DNA Based Computers,
                                                                    Proceedings of a DIMACS Workshop, Philadelphia, Penn-
If we would like to continue the upper string with a, then          sylvania, USA, June 23-25, 1997, volume 48 of DIMACS
    • we need to add                                                Series in Discrete Mathematics and Theoretical Computer
                     two letters
                               to the lower string, so we          Science, pages 297–327. DIMACS/AMS, 1997.
                       x1 a
      have the unit            , or                             [3] Lila Kari, Gheorghe Păun, Grzegorz Rozenberg, Arto Salo-
                     x2 x3 x4
                                                                    maa, and Sheng Yu. DNA computing, sticker systems, and
    • we need to add further letters to the upper string. If        universality. Acta Informatica, 35(5):401–420, May 1998.
      to the upper string                                       [4] Dietrich Kuske and Peter Weigel. The role of the com-
                                                                    plementarity relation in watson-crick automata and sticker
         – we would like to add an a, then weneed     to          systems. In Cristian Calude, Elena Calude, and Michael J.
                                                   xabb             Dinneen, editors, Developments in Language Theory, 8th In-
           add three letters, so we have the units       ,
                                                 y              ternational Conference, DLT 2004, Auckland, New Zealand,
             xbab          xbba                                     December 13-17, 2004, Proceedings, volume 3340 of Lec-
                     and          ;                                 ture Notes in Computer Science, pages 272–283. Springer,
               y              y
                                                                    2004.
         – we would
                 like to add two as, then we have the
                 xaa                                            [5] Martin Kutrib and Matthias Wendlandt. String assembling
           unit        .                                            systems. RAIRO Theor. Informatics Appl., 46(4):593–613,
                   y
                                                                    2012.
The number of letters needed to be added to the upper or        [6] Andrei Păun and Mihaela Păun. State and transition com-
lower strings are determined by the congruence.                     plexity of watson-crick finite automata. In Gabriel Ciobanu
   With these units, we can generate arbitrary strings of as        and Gheorghe Paun, editors, Fundamentals of Computation
and bs on the upper string, while the lower can only be             Theory, 12th International Symposium, FCT ’99, Iasi, Ro-
the same length as the upper if the congruence is satisfied.        mania, August 30 - September 3, 1999, Proceedings, volume
                                                                    1684 of Lecture Notes in Computer Science, pages 409–420.
In thiscase
           thegeneration can be finished with the ending
                                                                   Springer, 1999.
        a       b
units       or     .                                            [7] Gheorghe Păun, Grzegorz Rozenberg, and Arto Salomaa.
        a       b
                                                                    DNA Computing - New Computing Paradigms. Texts in
    Based on the above, we have the next statement.                 Theoretical Computer Science. An EATCS Series. Springer,
                                                                    1998.
Theorem 2.

                 L (SAS) 6⊆ L (W K|Q|=1 ).

Proof. The statement is clear by combining Lemma 3 and
Corollary 2.


4     Conclusion
We have started to investigate the relationship between the
computational power of WK automata and string assem-
bling systems. We know that in the general case, WK au-
tomata are stronger than SAS, so the comparison with WK
automata having a restricted number of states is of interest.
   First we have shown that WK automata with at least
two states are strictly more powerful than stateless WK
automata, then constructed a language that can be gener-
ated by SAS, but cannot be accepted by stateless WK au-
tomata. The exact relationship of the two language classes,
that is the question whether there are languages accepted
by stateless WK automata which cannot be generated by
SAS, remains open.