<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Structured Motifs Recognition in DNA sequences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuridia P. Mej´ıa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Olmos</string-name>
          <email>ivanoprklg@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesus A. Gonzalez</string-name>
          <email>jagonzalez@inaoep.mx</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>4 sur y Av. San Claudio</institution>
          ,
          <addr-line>Ciudad Universitaria, Puebla, Me ́xico</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Nacional de Astrof ́ısica</institution>
          ,
          <addr-line>O ́ptica y Electro ́nica, Luis Enrique Erro No. 1, Sta. Mar ́ıa Tonantzintla, Puebla, Me ́xico</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper is presented a methodology for structured motifs recognition (SMR) in DNA sequences. The SMR problem consists of finding all instances of a triple-pattern PL PC PR in a DNA sequence, where PL, PC and PR are based on the IUPAC alphabet, and PL and PR are both separated from PC by a distance no greater than ”n” characters, which is provided as input. In this problem an inexact association between PL, PC , PR and the DNA sequence is allowed, which is limited by an error based on the number of insertions, deletions, and substitutions operations. In this paper we propose a methodology for finding SMR patterns based in two stages: first, an automaton is used to search all PC instances (where only substitutions are only allowed); second, a dynamic programming technique is proposed to find the PL and PR patterns (where substitutions, insertions and deletions, are allowed) based on the Levenshtein algorithm. This methodology is useful to biologists in real DNA patterns recognition tasks, where it is necessary to find DNA regions with a biological meaning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The DNA motif search problem consists of finding a pattern P (the motif to search
for) from a text T (the DNA database created from the nucleotides alphabet: ”A” for
adenine, ”C” for cytosine, ”G” for guanine, and ”T” for thymine) where we output the
positions in T where P appears, the instances of P in T . In this problem, P is formed
from an extended alphabet defined under the IUPAC (International Union of Pure and
Applied Chemistry) rule where a character in this alphabet may represent more than
one nucleotide, this means that P may represent a set of patterns to search for in T .</p>
      <p>In Biology, the motif search problem is very important, because it allows us to
search or discover biological segments from an organism that is being analyzed. In
general, such motifs to search for are restricted to single motifs. However, there exist
biological problems where it is necessary to find motifs that are formed by two or more
single motifs (structured motifs, or SM for short), which are separated by a distance
defined by the biologist. In this work, we consider that a SM is formed by two or three
single motifs, represented by PL PC or PL PC PR respectively. In the biological
field, PC is called a ”central pattern or motif”, and PL, and PR are called satellites,
mini-satellites or micro-satellites, depending of their length. In both cases, and based on
biological restrictions, the SM recognition problem always start search of the PC motif,
which has an important meaning for the organism that is being analyzed. Because of its
importance, at the moment of searching PC in a DNA database, it is only possible to
allow minimal inexact associations, limited by a threshold in the number of substitutions
of characters considering the IUPAC alphabet. The number of such substitutions (the
threshold) is defined by the biologist. On the other hand, at the moment of searching a
satellite, more flexibility is allowed. This means that a satellite could be found after a
number of insertions, substitutions, and deletions in the DNA database has been applied
(considering a larger threshold).</p>
      <p>The problem addressed in this paper consists of finding motifs with the structure
PL PC PR, where PL is located at the left of PC and PR at the right of PC , and
both of them are located at a distance (#bases) no longer than d1 and d2 respectively,
which are provided as input parameters. As a solution to this problem, we propose a two
phases methodology: at the beginning we build an automaton that searches all instances
of PC in a DNA database, reporting their positions; after that, we propose a dynamic
programming strategy based on the Levenshtein algorithm, which search possible PL
and PR satellites in the DNA database restricted to a distance d1 and d2 respectively.</p>
      <p>The paper has the following structure: in section 2 we introduce important notation
used in this work. In section 3 we present the problem to be addressed. Our proposal for
finding the structured motifs is described in section 4. Finally, conclusions and future
work is presented in section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Definitions and notation</title>
      <p>In this section, we introduce the notation on strings, sequences, and automata, that we
use along this work.</p>
      <p>An alphabet, denoted by is a finite nonempty set of letters. A string on alphabet
is a finite subset of elements on , one letter after another. As an example, if
= fA; C; G; T g then, examples of strings on this alphabet are AAC,ACC, and AAG.
We denote by " the zero letter sequence, called the empty string. The length of a string
x is defined as the number of letters on x, denoted by jxj. With x[i], i = 1; : : : ; jxj, we
denote the letter at position i on x. A substring of a string is a string where: j j j j,
and [i] = [i + k], i = 1; : : : ; j j and 0 k j j j j.</p>
      <p>The concatenation of two strings x and y is the string composed of the letters of x
followed by the letters of y, denoted by xy. A string is a prefix of the string x, denoted
by @ x, if x = y. On the other hand, is a su x of x, denoted by A x, if x = y .
The empty string " is both a prefix and a su x of every string (not empty).</p>
      <p>In this work we denote a finite automaton M as a 5-tuple (Q; q0; A; ; ) where: Q is
a finite set of states, q0 2 Q is the start state, A Q is a distinguished set of accepting
states, is a finite input alphabet, and : Q x ! Q is the transition function of M.</p>
      <p>Let P be a pattern (string) to search for. In biochemistry, P is a string called a motif,
wherein all of its letters are defined by the union of two alphabets: a main alphabet</p>
      <p>B = fA; C; G; T g (every letter represents a nuceotide as described before), and the
IUPAC alphabet, (denoted as the extended alphabet in this work) used to represent
ambiguities in the pattern. The IUPAC established an ambiguity alphabet, represented
by E = fR; Y; K; M; S ; W; B; D; H; V; Ng, where R = fG; Ag, Y = fT; Cg, K = fG; T g,
M = fA; Cg, S = fG; Cg, W = fA; T g, B = fG; T; Cg, D = fG; A; T g, H = fA; C; T g,
V = fG; C; Ag, and N = fA; G; C; T g.</p>
      <p>Let X be a string, where X = AW B, A and B are substring of X separated by
the string W. The distance between two strings A and B is denoted by d(A;B), where
d(A;B) = jWj.</p>
      <p>A match between two strings A and B is a process where each character of A is
associated with a character of B. In this paper, we define three di erent associations
(matchings) between two strings: exact matching, exact set matching, and inexact
matching (for the sake of simplicity, consider the strings X; Y, and Z, where X and
Y are defined in B, and Z is defined in B S E):
T CAAG and Y = T CAG (jXj &gt; jY j), it is necessary to remove a character from
X. If X = , = T C, = A and = AG, then it is possible to remove from
X, generating a new string X0, such that X0 = Y .</p>
      <p>It is usual that the total number of insertions / deletions / substitutions in the inexact
matching process is limited to a percentage (permissible error) defined by the user. In
this work, we define the error of an inexact matching between two strings X and Y ,
denoted by (X;Y), as follows: (X;Y) = (#insY + #subY + #delY )=jXj, where Y is a string
derived from X through insertions (#insY ), substitutions (#subY ) and deletions (#delY ).</p>
      <p>Finally, we introduce the concept of a structured motif. This term is introduced
in this work because it is based on the composition of simple motifs. Formally, a
structured motif is a string X = PLW PC Z PR, where jWj = d1, jZj = d2 (this means
that PL and PR are located at a distance from PC no longer than d1 and d2 respectively),
PL is the left satellite, PC is the central pattern, PR is the right satellite, and PL, PC and,
PR are defined in B S E .</p>
      <p>As an example, consider the string X = A G T G A C G A C T C A, where PC =
ACG, PL = T G, PR = T C and d1 = 3 and d2 = 4. Then, in X we find a PC in position 5
- 7, a PL at position 3 (with d1 = 0) and PR at position 10 (with d2 = 2).</p>
      <p>In Section 3, we introduce a detailed description of our problem.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Problem Description</title>
      <p>In this section we describe in detail the input and output of the SMR problem that is
considered in this work.</p>
      <p>First, we describe the input of the problem. Let T be a DNA database, PL, PC ,
and PR three motifs to search for, where PC is a motif called ”central pattern”, PL
and PR called satellites (left and right respectively), two distances d1 (the distance
between PL and PC ) and d2 (the distance between PC and PR), and a percentage error
(#mismatches allow in a matching process between a substring of T and a satellite).
Based on these input parameters, the output of the SMR problem consists of finding all
substrings S of T , such that:
1. There exists a substring S of T such that S PC , and
2. There exists a left substring S 0 of T at a distance d1 of S such that S 0
3. There exists a right substring S 00 of T at a distance d2 of S such that S 00
4. S 0 and S 00 must both be generated within an error , ie, (PL;S 0)
(PR;S 00) .</p>
      <p>PI , and</p>
      <p>PD, and
and</p>
      <p>It is important to mention that this problem has some variations based on the input
parameters. These variants are described bellow:
– If the input of the SMR problem is: T , PL, PC , d1 and , then we need to find all
instances S and S 0 from T such that S PC , S 0 PI , d(S ; S 0) d1 and (PL;S 0) .</p>
      <p>This problem is called PL PC .
– If the input of the SMR problem is: T , PC , PR, d2 and , then we need to find all
instances S and S 0 from T such that S PC , S 0 PR, d(S ; S 0) d1 and (PR;S 0) .</p>
      <p>This problem is called PC PR.
– If the input of the SMR problem is: T , PL, PC, PR, d1, d2 and , then we need
to find all instances S , S 0 and S 00 from T such that S PC, S 0 PL, S 00 PR,
d(S ; S 0) d1, d(S ; S 00) d2, (PL;S 0) , and (PR;S 00) . This problem is called
PL</p>
      <p>PC</p>
      <p>PR</p>
      <p>For the sake of simplicity, in this paper we describe the solution of the PL - PC
problem. Note that in order tho solve the other two versions of the problem, we only
need to change the orientation of the distances, or combine the results of PL - PC and
PC - PR. In the next section we introduce a proposal to solve the PL - PC problem.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Proposal</title>
      <p>With the aim to solve the PL - PC problem, we propose a method divided in two phases:
– We first implement a searching phase, where all substrings S of T , S PC are
located. As result of this phase, we obtain a set C = fS : S is a substring of T , and
S PCg. To do this, we propose an automaton that searches for all instances of PC
in T . At the end of this phase, all results are stored in a matrix, including the end
position of each pattern.
– In a second phase, all S 2 C are processed, searching if there exists a substring S 0
of T (satellite), such that S 0 PL and d(S 0; S ) d1. Since insertions, deletions, and
substitution operations are needed in this phase, we propose a technique based on
a dynamic programming strategy. As output, this phase reports all central patterns
with their left-satellites.</p>
      <p>We proceed to explain in detail the way in which each phase is implemented.
4.1</p>
      <p>
        Searching the Central Patterns
As we mentioned above, we implement an automaton to search for all the instances of
PC in T . This automaton, called MFA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is based on the idea of storing in a temporal
memory each pattern that has previously been recognized during the searching phase
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In other words, this automaton implements a strategy that stores knowledge about
how the pattern matches overlap with itself, avoiding the computation of the prefix that
matches with the last largest su x. As result, in the searching phase, each character in
T is examined exactly once (linear time with respect to jT j). In our explanation and for
the sake of simplicity, consider that PC = AMS , PL = GK and a distance d1 = 4.
      </p>
      <p>The MFA automaton must be constructed from the pattern PC in a preprocessing
step before it can be used to search PC in T . This preprocessing phase is divided in
three stages: first, we perform a phase called expansion of PC, where all characters
from PC in E are substituted by characters in B (based on the IUPAC nomenclature);
then, in a second step we build a matrix that stores the states of the MFA automaton;
finally, in a third phase we generate the transition matrix of the MFA automaton.</p>
      <p>The expansion of PC is performed by a substitution process, where each character of
PC in E is ”expanded” with each valid character in B, restricted to the combinations
derived from the IUPAC specification. As result, we generate a set called setP, where
we store each combination of the pattern created with the substitution of characters. As
an example, if PC = AMS , where M = fA; Cg, S = fC; Gg, then M and S are replaced
by characters in B, from right to left. As final result, seqP = fAAC; ACC; AAG; ACGg.</p>
      <p>Based on seqP, the second phase consists on building a matrix called matQ, which
stores all states of the MFA automaton. With the aim to identify with precision all the
final states, this matrix is filled sequentially, from top to bottom (by rows), and from
left to right (by columns), where the number of columns is equal to jPC j (each column
corresponds to a character). The values assigned to the matrix start from 1, and with unit
increments. Each column is assigned a number of values that is equal to the product of
each character cardinality associated to the previous columns, including the current
column. The last column of this matrix stores all the final automaton states. Resuming
to our example, matQ for PC = AMS is filled as shown in Fig. 1.</p>
      <p>
        As final step of the preprocessing phase, we generate the transition matrix of the
MFA automaton. This matrix is filled row by row, its values depend of the largest su x
that is prefix of some element of seqP (for further reference, see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]).
      </p>
      <p>For example, consider vertex 2 of Fig. 2 with label ”AA”, which is obtained by
concatenating the letters in the path from the root to this vertex. From AA, we derive
a new expansion by adding at the end each character in B, obtaining the set of strings
AAA, AAC, AAT, and AAG. We then test these strings, searching their largest su x
that is a prefix for some element in seqP. In this example, AAC and AAG satisfy this
condition, since they are su xes of the first and third element of seqP respectively.</p>
      <p>As next step, we compute states that are reached from vertex 2 through transitions
with C and G. Each result is stored in [i; j], where i is the ith element of seqP associated
to the prefix, and j is the jth element of B (used in the respective transition). Resuming
to our example, from vertex 2 with transition throughout ”C” (third element of B),
we obtain the string AAC, which is su x of the first element of seqP. Therefore,
[2; 2] = matQ[k; l], where k is the kth element of seqP associated to the prefix, and
l is the cardinality of this prefix. In this example, k = 3 (AAC is a sufix of the first
element of seqP), and l = 3 (jAACj = 3). As result of this, from vertex 2 with label AA,
there exists a transition with C towards vertex 4. This process is used to compute each
transition for each state of the automaton.</p>
      <p>Before we continue and for the sake of simplicity in our explanation, we enumerate
in our example the rows of matQ starting from 1, and the rows of starting from 0.</p>
      <p>At the end of the preprocessing phase, the MFA automaton obtained from our
example is shown in Fig. 3.</p>
      <p>After that we build , the next step consists of search of each pattern from setP in
T . However, this phase is simple because is computed in linear time T , character by
character. Consider that our results are stored in a matrix called PCS npc;2, where npc is
the total number of patterns located in T . For simplicity, consider that the first column
of PCS stores each pattern, and the second column the position of the last character of
the pattern in T .</p>
      <p>As an example, if T = AT GGACAACC and is the automaton shown in Fig. 3, we
obtain the output illustrated in Fig 4. In this example, the circles are used to mark the
final states, which represent patterns found in the searching phase. In this example, the
found patterns are AAC and ACC. These results are stored in the PCS matrix illustrated
in Fig. 5.
The next step in our methodology consists of a process where based on a string S in
PCS , we search for substrings S 0 of T where S 0 PL, and d(S 0; S ) d1.</p>
      <p>Since PL, as the central patterns, may contain characters in E, we first replace these
characters by characters in B (in this phase we could use the same substitution process
explained before). Let  be a set that stores each pattern generated from S .</p>
      <p>Let PL = GK be the pattern to search for. After we replace each extended character
from PL, our output is  = fGG; GT g. We then search for each element of this set in a
segment of T , limited by the position of the central pattern and a left distance no larger
than d1. Resuming to our example where T = AT GGACAACC, PC = AMS and d1 = 4,
we have two possible central patterns, AAC and ACC. If we first process AAC as PC
and GG as PL, then we need to find all possible alignments as we show in Fig. 6.</p>
      <p>As we mentioned in section 4, it is possible to use insertions, deletions, and
substitutions in the searching phase of PL. The total number of these operations is
limited by , which is the error allowed by the user (this value is defined as a percentage
with respect to the dimension of PL).</p>
      <p>
        Since this problem is related to the string alignment problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we propose a
solution based on a dynamic programming strategy with the Levenshtein distance [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
which determines the total number of insertions, deletions and substitutions that we
need to transform a string (original pattern) into other string (target pattern).
      </p>
      <p>As we see in Fig. 6, the first step consists of defining a start and end position where
we will search for PL in T . This gap is called a search window, and there is a window
search per each pattern in PCS . The limits of a search window are computed from the
initial position of each PC in PCS , where: Posstart = Posend jPCj + 1 (Posend value is
retrieved from PCS ). Based on our example, where PCS includes the central patterns
AAC and ACC, we obtain the following initial positions:
– AAC: Posstart = Posend
– ACC: Posstart = Posend
jAACj + 1 = 9
jAACj + 1 = 10</p>
      <p>Based on Posstart, we compute the initial and final positions of each search window
of each PL in , where: Wini = Posstart d1 jPLj, and Wend = Posstart 1. In our
example, where PL = GK,  = fGG; GT g and d1 = 4 we compute these positions per
each PC :
– AAC: Wini = Posstart
– ACC: Wini = Posstart</p>
      <p>Note that if we do not implement an intelligent strategy to search for possible
alignments, we need to test a character more than once if there exist intersections
between two or more search windows. In order to avoid this computation, we compute
the intersection between two search windows. This value is stored in a variable called
Mem, where</p>
      <p>Mem =
( Wepnrdevious
0</p>
      <p>Wsctuarrrtent if Wepnrdevious &gt; Wsctuarrrtent</p>
      <p>otherwise</p>
      <p>Wepnrdevious is the end position of the previous search window and Wsctuarrrtent is the start
position of the current search window. For our example of Fig. 7, we compute Mem per
each central pattern as follows:
– AAC: Wepnrdevious
– ACC: Wepnrdevious</p>
      <p>Wsctuarrrtent = 0
Wsctuarrrtent = 6
1, then Mem = 1
2, then Mem = 4</p>
      <p>
        The next step consists of building a matrix that stores the total number of insertions,
deletions, and substitutions which are needed to align PL with a segment of characters
into the current search window. This matrix, denoted by D(m+1) (n+1), has m + 1 rows,
where m = jS 0j (S 0 is an element of ), and n + 1 is the total number of columns (where
n is the cardinality of the search window associated to the current central pattern). This
matrix is filled with the Levenshtein algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
– D[0; j] = 0, where 0
– D[i; 0] = i, where 0
–
i
j
      </p>
      <p>n
m
D[i; j] =
(</p>
      <p>D[i 1; j 1] if k;i = T [ j]</p>
      <p>D[i 1; j 1] + 1 otherwise
where k;i is the ith character of the kth element of , and T [ j] is the jth character of
the search window that is currently being processed. These rules are used to fill each D
for each element in . When we changed of search window, we compute Mem to know
if there exist columns in D that are a copy of the matrix of the previous search window
(these columns are a copy starting from the second column of the current matrix D).</p>
      <p>Resuming to our example, consider the case of ACC with GG as PL with Mem = 4.
Therefore, the current matrix D is filled as follows: the first column is computed based
on the Levenshtein algorithm; the next four columns are copied from the matrix of AAC
with GG as PL; the last column is also filled based on the Levenshtein algorithm. This
example is shown in Fig. 8.</p>
      <p>Finally, these matrices are used to find possible alignments based on the error .
The procedure is simple: we select all D[jPLj; j], where D[jPLj; j] , and j is the final
position of a valid left pattern of the corresponding search window. As an example, if
we consider that = 50%, then we need to find all positions where D[jPLj; j] 1,
because jPLj = 2. In our example of Fig. 8, there exist three valid results for AAC and
GG as left pattern: D1[2; 2] = 1, D1[2; 3] = 0, D1[2; 4] = 1; and three valid results for
ACC and GG as left pattern: D4[2; 1] = 1, D4[2; 2] = 0, and D4[2; 3] = 1. With this
information, we retrieve the left patterns based on T = AT GGACAACC:
– Since the search window for PC = AAC and GG as left pattern is AT GGAC, then
our left patterns are located at the end positions: 2 (TG is transformed in to GG
with one substitution), 3 (corresponding for the pattern GG), 4 (GA, is transformed
to GG with one substitution)
– Since the search window for PC = ACC and GG as left pattern is T GGACA, then
all patterns for this search window are the same as the previous window (they are
not computed again).</p>
      <p>As we can see in this example, our proposal has the following advantages: we can
find the central patterns very fast because we avoid computing any character in T twice.
With a dynamic programming strategy based on the Levenshtein algorithm, it is possible
to improve the runtime taken to find the left patterns, we avoid computing common
regions twice. However, it is important to see that this approach needs to generate a
matrix per each possible left pattern.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>This paper presents a proposal for solving the structure motive recognition in a DNA
sequence. Our approach is divided in two phases: first, we locate all central patterns
with an automation, that performs this task very fast; second, we implement a dynamic
programming strategy and the Levenshtein algorithm with the aim to find satellites that
are located at a distance no larger than d. Actually, we implemented and tested the
automaton and found that its performance is very fast, allowing us to work with large
DNA databases.</p>
      <p>We are currently implementing the dynamic programming strategy with the
Levenshtein algorithm proposed in this work. After testing this phase, we will develop
a tool that can be used by biologists that need to find structured patterns.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bofivoj</surname>
          </string-name>
          , Melichar.
          <article-title>Approximate string matching by finite automata</article-title>
          .
          <source>In Conf. on Analysis of Images and Patterns, number 970 in LNCS. Pages</source>
          <volume>342</volume>
          -
          <fpage>349</fpage>
          .
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cormen</surname>
          </string-name>
          , Thomas H.
          <article-title>Introduction to Algorithms</article-title>
          .
          <source>2nd Edition</source>
          .
          <source>The MIT Press and McGrawHill. Pages 805-811</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Maxime</given-names>
            <surname>Crochemore</surname>
          </string-name>
          and
          <string-name>
            <surname>Marie-France</surname>
            <given-names>Sagot</given-names>
          </string-name>
          ,
          <source>Motifs in Sequences: Localization and Extraction. Pages 26-31</source>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>Gonzalo.</given-names>
          </string-name>
          <article-title>A Guide Tour to Approximate String Matching</article-title>
          .
          <article-title>ACM Computing Surveys</article-title>
          . vol.
          <volume>33</volume>
          , Issue 1. Pages.
          <volume>31</volume>
          -
          <fpage>88</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Perez</surname>
          </string-name>
          , Gerardo et al.
          <article-title>An Automaton for Motifs Recognition in DNA Sequences</article-title>
          . To appear
          <source>in MICAI 2009</source>
          , Springer - Verlag.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>