<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Searching All Approximate Covers and Their Distance using Finite Automata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ondˇrej Guth</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boˇrivoj Melichar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miroslav Bal´ık</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cˇ esk´e vysok´e uˇcen´ı technick´e v Praze</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Praha</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>gutho</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>melichar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>balikm}@fel.cvut.cz</string-name>
        </contrib>
      </contrib-group>
      <issue>201</issue>
      <abstract>
        <p>Cover is a type of a regularity of strings. A restricted approximate cover w of string T is a factor of T such that every position of T lies within some approximate occurrence of w in T . In this paper, the problem of all restricted smallest distance approximate covers of a string is studied and a polynomial time and space algorithm for solving the problem is presented. It searches for all restricted approximate covers of a string with given limited approximation using Hamming distance and it computes the smallest distance for each found cover. The solution is based on a finite automata approach, that provides a straightforward way to design algorithms to many problems in stringology. Therefore it is shown that the set of problems solvable using finite automata includes the one studied in this paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Preliminaries</title>
      <p>An alphabet is a nonempty finite set of symbols,
denoted by A. A string over an alphabet is a finite
sequence of symbols of the alphabet. Empty string is an
empty sequence of symbols, denoted by ε. An effective
alphabet of a string T is a set of symbols that really
occur in T . Only effective alphabet is considered in
this paper. A language is a set of strings. A set of all
strings over alphabet A is denoted by A∗. The length
of a string w is denoted by |w|, the i–th symbol of w is
denoted by w[i]. An operation concatenation is defined
in this way: x, y ∈ A∗, concatenation of x and y is xy,
may be denoted by x.y. An operation superposition is
defined in this way: x = pu, y = us, superposition of
x and y is pus. Suppose u, w, x, T ∈ A∗. w is a prefix
1 Introduction of T if T = wu, w is a suffix of T if T = uw, and w
is a factor (also called a substring) of T if T = uwx.</p>
      <p>A set of all prefixes of T is denoted by Pref (T ), a set
Searching regularities of strings is used in a wide area of all suffixes of T is denoted by Suff (T ), and a set of
of applications like molecular biology and computer– all factors of T is denoted by Fact (T ).
assisted music analysis. One of typical regularities is A deterministic finite automaton (also called a
decover. terministic finite state machine, denoted by DFA) is a</p>
      <p>Finding exact covers is not sufficient in some appli- quintuple (Q, A, δ, q0, F ), where Q is a nonempty finite
cations, thus approximate covers have to be computed. set of states, A is an input alphabet, δ is a transition
In this paper, the Hamming distance is considered. function, δ : Q × A 7→ Q, q0 ∈ Q is an initial state and
F ⊆ Q is a set of final states.</p>
      <p>
        Exact covers were introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an algorithm A nondeterministic finite automaton without ε–
for computation of all exact covers in linear time was transitions is a quintuple (Q, A, δ, q0, F ), where Q is a
presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. An algorithm using finite automata nonempty finite set of states, A is an input alphabet,
approach to computation of all exact covers was intro- δ is a transition function, where δ : Q × A 7→ P (Q),
duced in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. q0 ∈ Q is an initial state and F ⊆ Q is a set of final
      </p>
      <p>
        The algorithm presented in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] searches for one re- states. It is denoted by NFA.
stricted smallest approximate cover (i.e. cover with the A state q is a successor of state p of a deterministic
smallest distance), using dynamic programming. An finite automaton (Q, A, δ, q0, F ) if q = δ(p, a) for some
algorithm using finite automata approach to compu- a ∈ A. A state qN is a successor of a state pN of a
tation all restricted approximate covers for Hamming,
Levenshtein, and Damerau distance was introduced in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>NFA (QN , A, δN , q0N , FN ) if q ∈ δN (pN , a).</p>
      <p>String w = a1a2 . . . a|w| is said to be accepted by a</p>
      <p>DFA (Q, A, δ, q0, F ) if there exists a sequence</p>
      <p>This paper is organized as follows. In Section 2, δ(q0, a1) = q1, δ(q1, a2) = q2, . . . , δ(q|w|−1, a|w|) ∈ F .
some notations and definitions used in this paper are String w = a1a2 . . . a|w| is said to be accepted by a
described. In Section 3, the algorithm for the problem NFA (Q, A, δ, q0, F ) if there exists a sequence
is presented. In Section 4, the complexities of the al- δ(q0, a1) = Q1, δ(q1, a2) = Q2, . . . , δ(q|w|−1, a|w|) ⊆ F
gorithm are proven. In Section 5, experimental results for some q1 ∈ Q1, . . . , q|w|−1 ∈ Q|w|−1. A language
are shown. accepted by a finite automaton M is denoted by L(M ).</p>
      <p>A left language of a state q of a nondeterminis- there exists string s ∈ Suff (T ) such that DH (w, s) ≤
tic finite automaton (Q, A, δ, q0, F ) is a set of strings k.
w = a1a2 . . . a|w|, where for each w exists a sequence A nondeterministic Hamming suffix automaton M
δ(q0, a1) = Q1, δ(q1, a2) = Q2, . . . , δ(q|w|−1, a|w|) = for a string T and distance k is such nondeterministic
Q|w|, q ∈ Q|w| for some q1 ∈ Q1, . . . , q|w|−1 ∈ Q|w|−1. finite automaton without ε–transitions, that L(M ) =
A left language of a state q of a DFA (Q, A, δ, q0, F ) {w : DH(w, s) ≤ k, s ∈ Suff (T )}. Such an automaton
is a set of strings w = a1a2 . . . a|w|, where for each w M = (Q, A, δ, q0, F ) may be constructed in this way:
exists a sequence
δ(q0, a1) = q1, δ(q1, a2) = q2, . . . , δ(q|w|−1, a|w|) = q. 1. Create a layer of |T | + 1 states:</p>
      <p>A maxfactor of a state q of a DFA (Q, A, δ, q0, F ) (a) each state qi0 corresponds to a position i in T
is the longest string of left language of q, denoted by (plus initial state q0, thus 0 &lt; i ≤ |T |),
maxfactor (q). A depth of a state q of a DFA is the (b) for each state qi0 (but the last q|0T |) define
tranlength of maxfactor (q), denoted by depth(q).</p>
      <p>A DFA MD = (Q, A, δ, q0, F ) is equivalent to a
NFA MN = (QN , A, δN , q0N , FN ) if L(MN ) = L(MD).</p>
      <p>Subset construction may be used:
sition δ(qi0, T [i]) = qi0+1,
(c) define the last state q|0T | final (note that until</p>
      <p>now such automaton accepts exactly T ).
2. Similarly, create a layer for each “number of
errors” l, 1 ≤ l ≤ k (only exception: we do not need
any state qil for l &gt; i).
3. For each state qil (but the last q|T | in each layer
and but the last layer) and for each symbol a ∈
A, a 6= T [i] (not occurring in T at position i),
define transition δ(qil, T [i]) = qil++11.
4. Create “long” transitions from q0: δ(q0, a) = {qi0 :
a = T [i], a ≤ i ≤ |T |}∪{qi1 : a 6= T [i], 1 ≤ i ≤ |T |}.
1. Set Q = {{q0}} will be defined, state q0 = {q0N }</p>
      <p>will be treated as unmarked.
2. If each state in Q is marked then continue with</p>
      <p>step 4.
3. Unmarked state q will be chosen from Q and the
following operations will be executed:
(a) δ(q, a) = S δN (pN , a) for pN ∈ q and for all</p>
      <p>a ∈ A,
(b) Q = Q ∪ δ(q, a) for all a ∈ A,
(c) state q ∈ Q will be marked,
(d) continue with step 2.
4. F = {q : q ∈ Q, pN ∩ FN 6= ∅, pN ∈ q}.</p>
      <sec id="sec-1-1">
        <title>For example of a transition diagram of a nondetermin</title>
        <p>istic Hamming suffix automaton see Fig. 1.</p>
        <p>A level of a state of a nondeterministic Hamming
suffix automaton corresponds to the number of errors,
a depth of a state of this automaton is equal to the
Using subset construction of MD equivalent to MN , corresponding position in T .
every state qD ∈ Q corresponds to some subset of QN .</p>
        <p>This subset is called a d–subset, denoted by d(qD). Definition 1 (Restricted approximate cover). Let
Each element of the d–subset corresponds to some T and w be strings. We say, that w is a restricted
apstate of QN . Where no confusion arises, depth of a proximate cover of T with Hamming distance k if w is
state corresponding to an element rj ∈ d(qD) of d– a factor of T and there exist strings s1, s2, . . . , sr (all
subset d(qD) is simply denoted by rj, as numeric rep- some substrings of T ) such that:
resentation of rj corresponds to the depth. In the
algorithms below, d–subset is supposed to be implemented 1. DH (w, si) ≤ k for all i where 1 ≤ i ≤ r,
as a list, preserving order of its elements. An element 2. T can be constructed by superpositions and
conof the d–subset is denoted by ri, where the subscript catenations of copies of the strings s1, s2, . . . , sr.
i means an index (order) of the element ri within the
d–subset. Note 1. An approximate cover is more general
regu</p>
        <p>A distance is the minimum number of editing op- larity than restricted approximate cover, because
(unerations that are necessary to convert a string x into restricted) approximate cover of T needs not be a
faca string y. The maximum allowed distance is denoted tor of T . In this paper, only restricted approximate
by k. cover is considered.</p>
        <p>The Hamming distance between strings x and y is
equal to the minimum number of editing operations Definition 2 (Restricted smallest distance
apreplace that are necessary to convert x into y. The proximate cover). Let T and w be strings. We say,
Hamming distance function is denoted by DH . that w is a restricted smallest distance approximate</p>
        <p>String w ∈ A∗ is an approximate prefix of a string cover of T with distance k if w is a restricted
approxT ∈ A∗ with the maximum Hamming distance k if imate cover of T with the distance k and there exists
there exists string p ∈ Pref (T ) such that DH (w, p) ≤ no l &lt; k such that w is a restricted approximate cover
k. String w is an approximate suffix of the string T if of T with the distance l.</p>
        <p>Problem 1 (All restricted smallest distance approximate computed. When q represents an approximate prefix,
covers of a string). Given string T over alphabet A, its successors are recursively generated and processed.
Hamming distance function DH and distance k, find Note that the set of final states of the deterministic
all restricted approximate covers of T and their small- approximate cover candidate automaton is not needed
est distances. A set of all restricted smallest distance (it would contain all states having d–subsets
containapproximate covers of string T under Hamming dis- ing element corresponding to some final state of the
tance k is denoted by coversH k (T ). nondeterministic Hamming suffix automaton).
Definition 3 (Approximate cover candidate
automaton). An approximate cover candidate
automaton (Q, A, δ, q0, F ) for string T ∈ A∗, Hamming
distance function DH and the maximum distance k
accepts set W = {w1, w2, . . . , wl} of factors of T , where
for each wi ∈ W holds:
Distance l of each cover w = maxfactor (q) may</p>
        <p>
          As any approximate cover of a string T under Ham- vary between 0 and k. Moreover, it cannot be less than
ming distance is an approximate prefix and an approx- level of the first or the last element of d(q), because
imate suffix of T (proven in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]), an automaton accept- each cover must be an approximate prefix and suffix.
ing only such strings can be used. Of course, it cannot be more than the maximum level
of elements of d(q). The Algorithm 1 removes all the
elements having the maximum level but the first and
the last element of d(q), and tries whether w covers T
without those removed positions.
        </p>
        <sec id="sec-1-1-1">
          <title>1. there exists p ∈ Pref (T ) such that DH (p, wi) ≤ k,</title>
          <p>and
2. there exists s ∈ Suff (T ) such that DH (s, wi) ≤ k.</p>
          <p>
            In [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], a construction of an automaton accepting
intersection of approximate prefixes and approximate
suffixes is used for construction of a deterministic
approximate cover candidate automaton. Although this
is a straightforward idea, specialized method (more
effective) is presented for Hamming distance in the
following section.
3
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Problem solution</title>
      <p>The principle of the solution is following: first, we
perform a subset construction of a deterministic cover
candidate automaton from a nondeterministic
Hamming suffix automaton for string T and k, as every
d(q) represents a set of positions of w = maxfactor (q)
within T . If we treat with d(q) as with a sorted list
(ordered by depths of its elements), each pair of
subsequent elements represents positions of subsequent
occurrences of w within T . When for such positions
i, j, i &lt; j holds j − i &gt; |w|, we know that w cannot
be a cover of T . The distance of w is the minimum l
such that it is possible to remove all elements r ∈ d(q)
having level (r) &gt; l and the previous condition holds.</p>
      <p>
        In fact, it is not necessary to save complete
deterministic automaton. Unlike in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we do not make
construction of the deterministic cover candidate
automaton and subsequent computation of covering. A
depth–first search algorithm is used to perform subset
construction and computation of covering and of the
distance of each cover: in Algorithm 2, for each state
and symbol, a successor q is generated, it is
determined whether it represents a cover and the distance is
      </p>
      <sec id="sec-2-1">
        <title>Algorithm 1 Smallest distance of a cover of T .</title>
        <p>Input: d–subset d(q) representing a cover w of T .
Output: The smallest distance l of w.
1: lmin ← max{level (r1), level (r|d(q)|)}
2: lmax ← maxr∈d(q){level (r)}
3: l ← lmax
4: repeat
5: for all r ∈ d(q) \ {r1, r|d(q)|} : level (r) = l do
6: remove r from d(q)
7: end for
8: l ← l − 1
9: until l ≥ lmin and for all i = 2, 3, . . . , |d(q)| : ri−ri−1 ≤
depth (q)
10: l ← l + 1.</p>
        <p>Example 1. Let us have a string T = aabccccb over
alphabet A = {a, b, c} and let us compute a set of all
restricted smallest distance approximate covers of T
under Hamming distance k = 2 using Algorithm 3.</p>
        <p>Because of the distance 2, we are interested in
covers of length at least 3 or having distance less than 2.
We construct a nondeterministic Hamming suffix
automaton MS (see Fig. 1), then an approximate cover
candidate automaton M is analysed (see Fig. 2).</p>
        <p>Looking at the d–subset {3, 4′′, 8′′}, it represents
an approximate prefix and suffix aab of length 3, but
for its positions holds 8 − 4 3, thus the factor aab is
not an approximate cover of T with Hamming distance
2. Looking at the other d–subset {3′′, 5′′, 6′, 7′, 8}, it
represents factor ccb, that covers T with Hamming
distance 2. It is checked whether it covers T with distance
1 (Alg. 1). As the first element of the d–subset has level
equal to 2, lmin is equal to 2. The resulting set of the
covers is coversaabccccbH (2 ) = {(ccb, 2), (aabccccb, 0)}.
b
c
1′2′34′5′6′7′8
1′2′3′45678′
c
b
c
2′′3′45′6′7′8′′
2′′3′4′′5′6′7′8
2′′3′′4′5678′
b
3′′5′′6′7′8
a 123′4′5′6′7′8′ ab 23′4′′5′′6′′7′′8′′ b
c
c
56′′ c 67′′ c
Proof. The automaton consists of layers of states q(i)
for each level i. The layer of states q0 contains |T | +
1 states. Each layer of states q(i) contains one state
less in comparison with layer of states q(i−1), thus it
contains |T | − i + 1 and layer of states q(k) contains
|T | − k + 1 states.</p>
        <p>The automaton contains |A| transitions from each
state, with some exceptions. There are k+1 final states
having no successor. In the layer of states q(k), each
state has only one successor. From the initial state,
there are |T | transitions defined to the states q(0)
having level (q(0)) = 0 and |T | · (|A| − 1) transitions to the
states q(1) having level (q(1)) = 1. Thus in MS there are
|Q|·|A|+ |T |·|A|− (k + 1)·|A|−(|T |−k + 1)·(|A|−1) =
|A| · (|Q| − 2) + |T | − k + 1 transitions.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Note 2. As restricted approximate covers of string T are exact factors of T , it is meaningful to consider effective alphabet A only, thus |A| ≤ |T | always holds.</title>
        <p>It is also meaningless to consider large k, because every (QS, A, δS, q0S, FS), there is no need for any additional
factor of T having length less or equal to k is always data structures. For the purpose of the construction of
approximate cover of T . Thus k ≤ |T | always holds. the deterministic cover candidate automaton M , only</p>
        <p>Usually, k ≪ |T | and |A| ≪ |T | (e.g. in DNA anal- the set of states and transitions from q0S need to be
ysis, A = {a, c, g, t}). Therefore k and |A| may be con- preserved, because the rest may be computed later in
sidered as small constants independent of |T |. O(1) time and space using knowledge of a depth and
a level of a state, k, and T . Thus the space complexity
Lemma 2. The deterministic approximate cover can- of this construction is O((k + |A|) · |T |).
didate automaton M for string T and the Hamming During the computation of the smallest distance
distance contains at most |T |22+|T | + 1 states. (Algorithm 1), only O(1) additional data is needed.
During the processing of states of M (Algorithm 2),
Proof. Each d–subset d(q) of M contains at least one r the needed space is limited by the number of elements
such that level (r) = 0, thus maxfactor (q) ∈ Fact (T ). of all d–subsets (Lemma 4) preserved in a memory and
The number of possible factors of length depth(q) is by the number of all approximate covers (the result,
at most |T | − depth(q) + 1, thus the maximum num- limited by the number of all factors of T – at most
ber of states of M having equal depth is also |T | − O(|T |2)).
depth(q) + 1. The automaton M also contains an
initial state. Therefore, the number of states of M is at
most (|T |−1+1)+(|T |−|T |+1) · |T | + 1.</p>
        <p>2</p>
        <sec id="sec-2-2-1">
          <title>Lemma 5. Using Algorithms 2, and 3 for construc</title>
          <p>tion of a deterministic cover candidate automaton M =
(Q, A, δ, q0, F ) from a nondeterministic Hamming
sufLemma 3. During the construction of the determin- fix automaton MS = (QS, A, δS, q0S, FS), all d–subsets
istic cover candidate automaton M for string T , Al- are sorted in ascending order by depths within MS.
gorithms 2, 3 need to hold at most |T | + 2 states at a
time.</p>
          <p>Proof. Having p, q ∈ Q\{q0} such that q is a successor
of p, suppose that d(p) is sorted in order by depths
Proof. Algorithm 2 works as a depth–first search al- within MS. It holds that for any pS, qS ∈ QS such
gorithm. For each state and symbol it generates at that qS is a successor of pS, depth(qS) &gt; depth(pS).
most one state – possible successor. Thus it holds at Therefore d(q) constructed from already sorted d(p) is
most |T | + 1 states of M (|T | states having d–subsets also sorted.
representing exact prefixes of T plus initial state) and For p = q0, it is supposed that δS(q0S , a) is
cona state generated for a final state, having empty d– structed as sorted in order by depths within MS.
subset.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Lemma 6. Time complexity of Algorithm 1 is O(k · |T |) for each state.</title>
        </sec>
        <sec id="sec-2-2-3">
          <title>Lemma 4. During the construction of the determin</title>
          <p>istic cover candidate automaton M for string T ,
Algorithms 2, 3 need to hold at most |T |2+|T | + 1 elements
2
of d–subsets at a time.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Proof. Algorithm 1 may remove some elements of a</title>
        <p>d–subset in each iteration, thus the iteration may take
O(|T |) time. The number of iterations may be at most
k.</p>
        <p>Proof. Alg. 2 needs at most |T | + 2 states in a memory Lemma 7. Time complexity of Algorithm 2 (from the
at a time (Lemma 3). The deterministic cover candi- initial state) is O((k + |A|) · |T |3).
date automaton M = (Q, A, δ, q0, F ) is constructed Proof. Algorithm 2 constructs for all states q and all
by subset construction from a nondeterministic Ham- a ∈ A the d–subsets of all possible successors of q.
ming suffix automaton MS = (QS, A, δS, q0S, FS ). In The number of states is O(|T |2) (Lemma 2) and the
MS, each state but q0S has at most one successor number of elements of each d–subset is O(|T |). For
for each symbol, q0S has |T | successors for each sym- each state, the computation of covering is performed
bol. For each state pS and its successor qS in MS (it takes O(|T |)), and for each cover (their number is
holds: depth(qS ) &gt; depth(pS). The longest possible O(T 2)), the computation of the smallest distance is
d–subset d(p) contains r|T | having depth(r|T |) = |T |, performed (it takes O(k · |T |) for each cover – Lemma
and r1 having depth(r1) = 1. As |δS(r1, a)| ≤ 1 and 6).
δS(r|T |, a) = ∅ for every a ∈ A, for state p and its
successor q in M holds: |d(q)| ≤ |d(p)| for p 6= q0 and Theorem 2. Time complexity of Alg. 3 is O((k+|A|)·
|d(q)| ≤ |T | for p = q0. |T |3).</p>
        <p>Theorem 1. Space complexity of Alg. 3 is O(|T |2). Proof. It clearly holds that construction of the
nondeterministic Hamming suffix automaton takes O((k +
Proof. It clearly holds that for construction of the |A|) · |T |). Construction of the deterministic cover
cannondeterministic Hamming suffix automaton MS = didate automaton takes O((k + |A|) · |T |3) (Lemma 7).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental results</title>
      <sec id="sec-3-1">
        <title>The algorithm was implemented in C++ using STL,</title>
        <p>the program was compiled using the GNU C++
compiler with O3 optimizations level. The dataset used to
test the algorithm is the nucleotide sequence of
Saccharomyces cerevisiae chromosome IV1. The string T
consists of the first |T | characters of the chromosome.</p>
        <p>The first set of tests was run on a AMD Athlon
64 3200+ (2200 MHz) system, with 2.5 GB of RAM,
under Fedora Linux operating system (see Figs. 3, 4).</p>
        <p>Athlon64, for k=11 and k=31
12
10
8
]
c
e
[se 6
m
i
T 4
2
00
1400
1200
1000
]cse 800
[
ieTm 600
400
300 400</p>
        <p>Text length
100
200
500
600
700
Fig. 3. Time consumption with respect to the text size
(solid line for k = 11, dotted one for k = 31)</p>
        <p>Athlon64, for |T|=1162 and |T|=1550</p>
        <p>400</p>
        <p>Text length
200
300
500
600
700
Fig. 5. Time consumption with respect to the text size
(solid line for k = 101, dotted one for k = 201)</p>
        <p>Athlon, for |T|=114 and |T|=153
2.5
2.0</p>
      </sec>
      <sec id="sec-3-2">
        <title>In this paper, we have shown that an algorithm de</title>
        <p>
          200 sign based on a determinisation of a suffix
automa00 200 400 60M0aximu8m00distanc1e000 1200 1400 1600 taopnpriosxai mppartoepcroiavteersfoorf aallstrreisntgripctreodblsemmaflolerstHdaimstma nincge
distance. The presented algorithm is straightforward,
Fig. 4. Time consumption with respect to the distance easy to understand and to implement and its
theoreti(solid line for |T | = 1162, dotted one for |T | = 1550) cal and experimental time requirements are
comparable to the existing approach ([
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]).
        </p>
        <p>The algorithm may be extended to work with other
distance functions, possibly using the idea presented</p>
        <p>
          The second set of tests was run on a AMD Athlon in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Theoretical and experimental analysis similar
(1400 MHz) system, with 1.2 GB of RAM, under Gen- to one presented here may be accomplished. The
algotoo Linux operating system (see Figs. 5, 6). rithm may be also extended to use parallelism.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Note 3. In comparison with experimental results pre</title>
        <p>
          sented in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], the algorithm presented in this paper
runs a bit faster for the same data, even on a slightly
slower computer (1.3 seconds in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for text length 100
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>This research was partially supported by the Ministry</title>
        <p>1 The Saccharomyces cerevisiae chromosome IV dataset of Education, Youth, and Sport of the Czech
Repubcould be downloaded from http://www.genome.jp/. lic under research program MSM 6840770014, by the</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Apostolico</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Iliopoulos</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          <article-title>Optimal superprimitivity testing for strings</article-title>
          .
          <source>Inf. Process. Lett. 39</source>
          ,
          <issue>1</issue>
          (
          <year>1991</year>
          ),
          <fpage>17</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Christodoulakis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iliopoulos</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sim</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          <article-title>Implementing approximate regularities</article-title>
          .
          <source>Mathematical and Computer Modelling</source>
          <volume>42</volume>
          (
          <year>October 2005</year>
          ),
          <fpage>855</fpage>
          -
          <lpage>866</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Guth</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <article-title>Searching approximate covers of strings using finite automata</article-title>
          .
          <source>In Proceedings of POSTER</source>
          (
          <year>2008</year>
          ), Faculty of Electrical Engineering, Czech Technical University in Prague.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>W. F.</given-names>
          </string-name>
          <article-title>Approximate periodicity in strings</article-title>
          .
          <source>Utilitas Mathematica</source>
          <volume>51</volume>
          (
          <year>1997</year>
          ),
          <fpage>125</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Vora´cˇek,
          <string-name>
            <given-names>M.</given-names>
            , and
            <surname>Melichar</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          <article-title>Searchig for regularities in generalized strings using finite automata</article-title>
          .
          <source>In Proceedings of the International Conference on Numerical Analysis and Applied Mathematics</source>
          (
          <year>2005</year>
          ), WILEY - VCH Verlag, pp.
          <fpage>809</fpage>
          -
          <lpage>812</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>