<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>F. Franek, C. G. Jennings, and W. F. Smyth. A simple fast hybrid pattern-
matching algorithm. Journal of Discrete Algorithms</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>On-line String Matching in Highly Similar DNA Sequences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nadia Ben Nsira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thierry Lecroq</string-name>
          <email>Thierry.Lecroq@univ-rouen.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mourad Elloumi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LITIS EA 4108, Normastic FR3638, University of Rouen</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LaTICE, University of Tunis El Manar</institution>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2007</year>
      </pub-date>
      <volume>5</volume>
      <issue>4</issue>
      <fpage>682</fpage>
      <lpage>695</lpage>
      <abstract>
        <p>We consider the problem of on-line exact string matching of a pattern in a set of highly similar sequences. This can be useful in cases where indexing the sequences is not feasible. We present a preliminary study by restricting the problem for a specific case where we adapt the classical Morris-Pratt algorithm to consider borders with errors. We give an original algorithm for computing borders at Hamming distance 1. We exhibit experimental results showing that our algorithm is much faster than searching for the pattern in each sequences with a very fast on-line exact string matching algorithm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>High-throughput sequencing or Next Generation Sequencing (NGS) technologies
allow to produce a great amount of DNA sequences with a high rate of similarity.
For instance, the 1000 genomes project1 aimed at sequencing a large amount of
individual whole human genomes. This generates massive amounts of sequences (3
billion letters A, C, G, T) which are identical more than 99% to the reference human
genome. The generated data form a collection of sequences where each di↵ers from
another by a few number of di↵erences such as substitutions or single nucleotide
variants (SNVs), indels, copy number variations (CNVs) or translocations to name
a few.</p>
      <p>With the large mass of available data, storing, indexing and support for fast
pattern matching have become important research topics.</p>
      <p>Pattern matching can be carried out in two ways: o↵-line by using an index or
on-line when indexing is not possible. Although the first kind of solutions seems
Copyright c by the paper’s authors. Copying permitted only for private and academic purposes.
to be more suitable, the index issue might not seem significant in some cases even
if it is compressed. The main problem we can face is to have insucient storage
space to build the index. Thus one may have to scan the whole sequence rather
than index it.</p>
      <p>In this paper, we focus on o↵ering a preliminary study that allows to find the
exact occurrences of a given pattern of length m in a set of highly similar sequences.
We propose a solution that follows a tight analysis of the Morris-Pratt algorithm.
We point out occurrences of the pattern by performing a left to right traversal
over the reference sequence and at the same time we take into account variations
contained in other sequences. Our approach makes a simplistic assumption that
sequences include variations only of type substitutions and that there exists at most
only one variation in a window of length m.</p>
      <p>The rest of the paper is organized as follows. Section 2 presents related works.
We set up notations and formalize the problem in Sect. 3. We give our new
algorithm in Sect. 4 while experimental results are exhibited in Sect. 5. Finally we
give our conclusions in Sect. 6.
2</p>
      <p>Related work
Storing genetic sequences of many individual of the same species is a major
challenge in biological research. Basic structures usually store redundant information
which lead to a memory requirement proportional to the total length of input data.
Recently, several works focusing on indexing similar sequences were implemented
to allow building data structures taking advantage from high similarity between
the considered data. These works aim to reduce the memory requirement from the
length of all input sequences to the length of a single sequence (reference sequence)
plus the number of variations.</p>
      <p>
        Huang et al. [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ] propose a solution which assumes that the input set of DNA
sequences can be divided into common segments and non-common segments. Their
solution assumes that every sequence di↵ers on m0 positions from the reference
and the designed data structure requires O(n log + m0 log m0) bits where n is the
length of the reference sequence and is the size of the alphabet. Though this data
structure greatly reduces the memory usage and allows fast pattern matching, the
adopted model is restricted to a specific type of similar sequences. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] a solution
based on the use of word level operations on bit vectors is presented. In similar way
as [
        <xref ref-type="bibr" rid="ref9">10</xref>
        ], the general scheme of this technique store the entire of a reference sequence
with only di↵erences between the remaining sequences. The authors build a sux
array together with an Aho-Corasick automaton [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to store identical segments and
the non-common segments are converted into a binary word using 2 bits per base.
Due to the use of the Aho-Corasick the memory usage depends on a log n factor. In
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] a compressed index is proposed based on the Lempel-Ziv compression scheme
[
        <xref ref-type="bibr" rid="ref11">12</xref>
        ]. Both [
        <xref ref-type="bibr" rid="ref2 ref6">6, 2</xref>
        ] propose 2 level indexes for highly repetitive sequences. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
the authors implement an index based on sux tree and traditional q-grams. The
concept of the sux tree of alignment was proposed by [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ]. It satisfies the same
properties as the classical generalized sux tree by adding a new one: common
suxes of two sequences are stored in an identical leaf. This result has been
extended to the sux array of an alignment [
        <xref ref-type="bibr" rid="ref14">15</xref>
        ].
      </p>
      <p>All these results are concerned with o↵-line string matching but to the best of
our knowledge there exists no result for on-line string matching in a set of highly
similar sequences.
3</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>In what follows, we consider a finite alphabet ⌃= {A, C, T, G} for DNA sequences. A
string or a sequence is a succession of zero or more symbols of the alphabet. The
empty string is denoted by ". The set of all non empty strings over ⌃ is denoted
by ⌃ +. All strings over the alphabet ⌃ are element of ⌃ ⇤ = ⌃ + [ { ✏}. The string
w of length m is represented by w[0 . . m 1] where w[i] 2 ⌃ and 0  i  m 1.
The length of w is denoted by |w|.</p>
      <p>A string x is a factor (substring) of y if there exist u and v such y = uxv,
where u, v, x, y 2 ⌃ ⇤ . Let 0  j  m 1 be the starting position of x in y, thus
x = y[j . . j + |x| 1]. A factor x is a prefix of y if y = xv, v 2 ⌃ ⇤ . Similarly a
factor x is a sux of y if y = ux, for u 2 ⌃ ⇤ . A factor u is a border of x, if it is
both a prefix and a sux of x, then there exist v, w 2 ⌃ ⇤ such x = vu = uw. The
reverse of the string x is denoted by x⇠ . The longest common prefix between two
strings u and v is denoted by lcp(u, v).</p>
      <p>
        The exact pattern matching problem consists in finding all the occurrences of a
pattern x in a string y. That is, all possible j such that y[j + i] = x[i] holds for
all 0  i  m 1. This problem can be extended in a very interesting way by
considering a set of sequences and find whether a given pattern occurs distributed
horizontally where di↵erent parts of the pattern can be located in consecutive
positions of di↵erent texts. More formally, given a set of sequences Y = {y0, . . . , yr 1}
of equal length n, point out all positions 0  j  n m + 1, such that for
0  i  m 1 we have x[i] = yg[j + i] for some g 2 [0; r 1]. This latter problem
is known as distributed pattern matching [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ].
      </p>
      <p>
        The problem we focus on in this paper is formally defined as follows. Let
y0, y1, . . . , yr 1 be r highly similar sequences with the same length n defined over
the alphabet ⌃. Let y0 be the reference sequence. The sequences y1, y2, . . . , yr 1
are represented by variations over y0. Thus, we consider the set Z = {(G, j, c)},
such that c = yg[j] 6= y0[j] for all 0  j  n 1, g 2 G where 1  g  r 1
and c 2 ⌃. Furthermore, for ( G, j, c), (G0, j0, c0) 2 Z we have |j j0| &gt; M for some
integer M . We wish to find all occurrences of an arbitrary pattern x of length
m  M in yg where 0  g  r 1. This problem can be viewed as an hybrid
between distributed pattern matching and approximate string matching with k
mismatches [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
4
      </p>
    </sec>
    <sec id="sec-3">
      <title>A new algorithm</title>
      <p>
        We o↵er an algorithm to solve the problem described above in the same fashion as
the Morris-Pratt (MP) algorithm [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ] using a sliding window mechanism to scan
the text. Hence we need to preprocess the query pattern before the search phase.
We adopt the same strategy of forward prefix scan presented by MP by extending
the problem to the search in highly similar data. For that we need to consider
borders at Hamming distance 0 (as in MP) and borders at Hamming distance 1.
      </p>
      <p>Given the pattern x of length m, we consider three cases when a prefix x[0 . . i]
for 0  i  m 1, is recognized when scanning the r sequences at position j:
Case 1 x[0 . . i] = y0[j . . j + i] and 6 9 (G, k, c) 2 Z such that j  k  j + i.</p>
      <p>This means that x[0 . . i] matches on y0 and there is no variation in all others
sequences in the current window then x[0 . . i] matches equally in all sequences.
Case 2 x[0 . . i] = y0[j . . j + i] and 9 (G, k, c) 2 Z such that j  k  j + i. This
means that x[0 . . i 1] matches all sequences except yg.</p>
      <p>Case 3 x[0 . . i] = yg[j . . j + i] and 9 (G, k, c) 2 Z such that j 
g 2 G . Then x[0 . . i 1] matches only sequence yg.
k  j + i and
4.1</p>
      <sec id="sec-3-1">
        <title>Preprocessing phase</title>
        <p>The preprocessing phase consists in precomputing arrays storing positions of
borders for each prefix of the pattern. Borders at Hamming distance 0 are computed
as in the MP algorithm and stored in an array called mpNext. For borders at
Hamming distance 1 we will use the two following arrays.</p>
        <p>The classical array prefx is the array of prefixes of the pattern x: prefx[i] is the
length of the longest prefix of x starting at position i, for 0  i  m 1.</p>
        <p>The array pref ⇠x stores for each position i, the length of the longest
common prefix starting at position i when reading the pattern from right to left
with each position i0 &lt; i where lcp(x[0 . . i]⇠ , x[0 . . i0]⇠ ) 6= 0. It is defined for
all 0  i  m 1 and i0 &lt; i by pref ⇠x [i] = {(i0, `) | i0 &lt; i, x[i] = x[i0] and 0 &lt;
` = |lcp(x[0 . . i]⇠ , x[0 . . i0]⇠ |)}. Then pref ⇠x [i + 1] can be easily computed from
pref ⇠x [i].</p>
        <p>For 0  i  m, B[i] contains the sux starting positions of borders of x[0 . . i]
with one change. More formally B[i] = {i0 | Ham(x[0 . . i i0], x[i0 . . i]) = 1}.
Proposition 4.1 For 1  i  m 1, if i0 2 B[i] then x[0 . . i
x[0 . . i0 + prefx[i0] 1]x[prefx[i0]]x[i0 + prefx[i0] + 1 . . i].
i0] is a border of</p>
        <p>Then B[i] = {i0 | 9 (i i0, `) 2 pref ⇠x [i] and i i0 = prefx[i0]+`}[{ i | prefx[i] = 0}.
Proposition 4.2 The number of elements in the array B is O(m).
4.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Searching phase</title>
        <p>The searching phase consists in scanning the sequences from beginning to end. A
general situation is the following: a prefix x[0 . . i] of the pattern matches at least
one sequence of Y at position j. Then position j + 1 is scanned and a decision has
to be made in accordance with the three cases.</p>
        <p>Case 1 If x[i + 1] = y0[j + i + 1] then if there is no variation at position j + i + 1
in the other sequences we remain in Case 1 otherwise we move to case 2. If
x[i + 1] 6= y0[j + i + 1] then if there is no variation in the other sequences at
2.8
2.6
2.4
2.2
1.6
1.4
1.2
)s 2
(
e
iTm 1.8
position j + i + 1 then a shift is performed as in the MP algorithm otherwise
if the variant symbol c is equal to x[i + 1] we move to Case 3 otherwise a shift
has to be performed using mpNext and B.</p>
        <p>Case 2 If x[i + 1] = y0[j + i + 1] then we remain in Case 2. If x[i + 1] 6= y0[j + i + 1]
then a shift has to be performed using B.</p>
        <p>Case 3 If x[i + 1] = y0[j + i + 1] then we remain in Case 3. If x[i + 1] 6= y0[j + i + 1]
then a shift has to be performed using B.</p>
        <p>When a shift is performed using B, the case can change according to the length of
the shift and the position of the variation.</p>
        <p>Theorem 4.3 The searching phase runs in O(n) time.
5</p>
        <p>
          Experiments
We use simulated data of length 150 MB and mutation rate 0.25% among them 30%
to 50% are at the same position in di↵erent sequences. We first generate a reference
text, then mutate it at random positions with random nucleotides. We compare
our solution with FJS [9] which is one of the most ecient exact string matching
algorithm in this setting (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). For the patterns, we randomly select them from
the reference text with varying length from 8 to 128. For each pattern length, we
repeat tests 100 times and compute the average searching time. Experiments were
conducted on a machine with 12 GB RAM and 4-core CPU with 2.27 GHz.
        </p>
        <p>As mentioned above our algorithm takes into account the reference sequence
with positions of variations. We consider variations every 500 positions. For FJS
algorithm we launch the execution texts one by one. We ran the FJS algorithm on
each sequence successively. Our goal is to demonstrate that from a certain number
of sequences, it is more ecient to use our solution than a classical exact string
matching solution. Results are shown in Figure 1 and show that our solution is
faster (in our settings) when considering more that 3 highly similar sequences.
6</p>
        <p>
          Discussion and conclusions
We have presented a new algorithm for searching in a set of similar sequences. The
design of the algorithm follows a tight analysis of the Morris and Pratt algorithm.
Recall that we use a suitable representation of data such that we take into account
a reference sequence with only positions of variations of the other sequences. The
searching time of our algorithm depends on the size of the reference text. However,
our solution works with a particular model, since we are limited to a certain gap
between consecutive variations. We will to improve our algorithm to overcome this
point. On the other hand, we aim to use variants of the Boyer-Moore algorithm [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
for greater eciency. The next steps include to take into account any kind of
variation between the reference sequence and the other sequences and to be able
to perform approximate pattern matching.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Aho</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Corasick</surname>
          </string-name>
          .
          <article-title>Ecient string matching: an aid to bibliographic search</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>18</volume>
          (
          <issue>6</issue>
          ):
          <fpage>333</fpage>
          -
          <lpage>340</lpage>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alatabbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Iliopoulos</surname>
          </string-name>
          .
          <article-title>On the repetitive collection indexing problem</article-title>
          .
          <source>In IEEE International Conference on Bioinformatics and Biomedicine Workshops</source>
          , pages
          <fpage>682</fpage>
          -
          <lpage>687</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alatabbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Iliopoulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Mouchard</surname>
          </string-name>
          .
          <article-title>Querying highly similar structured sequences via binary encoding and word level operations</article-title>
          .
          <source>In Artificial Intelligence Applications and Innovations</source>
          , pages
          <fpage>584</fpage>
          -
          <lpage>592</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Amir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewenstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Porat</surname>
          </string-name>
          .
          <article-title>Faster algorithms for string matching with k mismatches</article-title>
          .
          <source>Journal of Algorithms</source>
          ,
          <volume>50</volume>
          (
          <issue>2</issue>
          ):
          <fpage>257</fpage>
          -
          <lpage>275</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Boyer</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Moore</surname>
          </string-name>
          .
          <article-title>A fast string searching algorithm</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>20</volume>
          (
          <issue>10</issue>
          ):
          <fpage>762</fpage>
          -
          <lpage>772</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Tung</surname>
          </string-name>
          .
          <article-title>Indexing DNA sequences using q-grams</article-title>
          .
          <source>In Database Systems for Advanced Applications</source>
          , pages
          <fpage>4</fpage>
          -
          <lpage>16</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sadakane</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.-K.</given-names>
            <surname>Sung</surname>
          </string-name>
          .
          <article-title>Fast relative lempelziv self-index for similar sequences</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <year>2014</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Faro</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Lecroq</surname>
          </string-name>
          .
          <article-title>The exact online string matching problem: a review of the most recent results</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>45</volume>
          (
          <issue>2</issue>
          ):
          <fpage>13</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yiu</surname>
          </string-name>
          .
          <article-title>Indexing similar DNA sequences</article-title>
          .
          <source>In Algorithmic Aspects in Information and Management</source>
          , pages
          <fpage>180</fpage>
          -
          <lpage>190</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Iliopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mouchard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Rahman</surname>
          </string-name>
          .
          <article-title>A new approach to pattern matching in degenerate DNA/RNA sequences and distributed pattern matching</article-title>
          .
          <source>Mathematics in Computer Science</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>557</fpage>
          -
          <lpage>569</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kuruppu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Puglisi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>Relative lempel-ziv compression of genomes for large-scale storage and retrieval</article-title>
          .
          <source>In String Processing and Information Retrieval</source>
          , pages
          <fpage>201</fpage>
          -
          <lpage>206</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Morris</surname>
          </string-name>
          , Jr and
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Pratt</surname>
          </string-name>
          .
          <article-title>A linear pattern-matching algorithm</article-title>
          .
          <source>Report 40</source>
          , University of California, Berkeley,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crochemore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Holub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Iliopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mouchard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Sux tree of alignment: An ecient index for similar data</article-title>
          .
          <source>In Internat. Workshop On Combinatorial Algorithms</source>
          , pages
          <fpage>337</fpage>
          -
          <lpage>348</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lecroq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mouchard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Suffix array of alignment: A practical index for similar data</article-title>
          .
          <source>In String Processing and Information Retrieval</source>
          , pages
          <fpage>243</fpage>
          -
          <lpage>254</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>