An Extension of Linear-size Suffix Tries for
              Parameterized Strings

        Katsuhito Nakashima, Diptarama Hendrian, Ryo Yoshinaka, and
                             Ayumi Shinohara

          Graduate School of Information Sciences, Tohoku University, Japan
                  katsuhito nakashima@shino.ecei.tohoku.ac.jp
                  {diptarama,ryoshinaka,ayumis}@tohoku.ac.jp


        Abstract. In this paper, we propose a new indexing structure for pa-
        rameterized strings which we call PLSTs, by generalizing linear-size suffix
        tries for ordinary strings. Two parameterized strings are said to match if
        there is a bijection on the symbol set that makes the two coincide. PLSTs
        are applicable to the parameterized pattern matching problem, which is
        to decide whether the input parameterized text has a substring that
        matches the input parameterized pattern. The size of PLSTs is linear in
        the text size, with which our algorithm solves the parameterized pattern
        matching problem in linear time in the pattern size. PLSTs can be seen
        as a compacted version of parameterized suffix tries and a combination
        of linear-size suffix tries and parameterized suffix trees. We experimen-
        tally show that PLSTs are more space efficient than parameterized suffix
        trees for highly repetitive strings.

        Keywords: indexing data structure · linear-size suffix trie · parameter-
        ized pattern matching.


1     Introduction
The pattern matching problem is to check whether a pattern string occurs in a
text string or not. To efficiently solve the pattern matching problem, a numerous
number of text indexing structures have been proposed. Suffix trees are the most
widely used data structures and provide many applications including several
variants of pattern matching problems [5,10]. They can be seen as a compacted
type of suffix tries, where two branching nodes that have no other branching
nodes between them in a suffix trie are directly connected in the suffix tree. The
new edges have a reference to an interval of the text so that the original path
label of the suffix trie can be recovered. Recently, Crochemore et al. [4] proposed
a new indexing structure, called a linear-size suffix trie (LST), which is another
compacted variant of a suffix trie. An LST replaces paths consisting only of
    This research was partially supported by JSPS KAKENHI Grant Numbers
    JP15H05706 and JP19K20208.
    Copyright 2020 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).


                                           97
98      K. Nakashima, D. Hendrian, R. Yoshinaka and A. Shinohara

non-branching nodes by edges like a suffix tree, but the original path labels
are recovered by referring to other edge labels in the LST itself unlike suffix
trees. LSTs use less memory space than suffix trees for indexing the same highly
repetitive strings [14]. LSTs may be used as an alternative of suffix trees for
various applications, like computing the longest common substrings, not limited
to the pattern matching problem.
    On the other hand, different types of pattern matching have been proposed
and intensively studied. The variant this paper is concerned with is the param-
eterized pattern matching problem, introduced by Baker [3]. Considering two
disjoint sets of symbols Σ and Π, we call a string over Σ ∪ Π a parameterized
string (p-string). In the parameterized pattern matching problem, given p-strings
T and P , we must check whether substrings of T can be transformed into P by
applying a one-to-one function that renames symbols in Π. The parameterized
pattern matching is motivated by applying to the software maintenance [2,3], the
plagiarism detection [8], the analysis of gene structure [13], and so on. Similarly
to the basic string matching problem, several indexing structures that support
the parameterized pattern matching have been proposed, such as parameterized
suffix trees [3], structural suffix trees [13], parameterized suffix arrays [6,12], and
parameterized position heaps [7,9].
    In this paper, we propose a new indexing structure for p-strings, which we
call PLST. A PLST is a tree structure that combines a linear-size suffix trie
and a parameterized suffix tree. We show that the size of a PLST is O(n) and
give an algorithm for the parameterized pattern matching problem for given a
pattern and a PLST, to find the occurrences of a pattern in the text, that runs in
O(m) time, where n is the length of the text and m is the length of the pattern.
Furthermore, we experimentally show that PLSTs are more space efficient than
parameterized suffix trees for highly repetitive strings such as Fibonacci strings.


2     Preliminaries
2.1   Basic definitions and notation
We denote the set of all non-negative integers by N . Let ∆ be an alphabet. For
a string w = xyz ∈ ∆∗ , x, y, and z are called prefix, substring, and suffix of
w, respectively. The length of w is denoted by |w| and the i-th symbol of w is
denoted by w[i] for 1 ≤ i ≤ |w|. The substring of w that begins at position i and
ends at position j is denoted by w[i : j] for 1 ≤ i ≤ j ≤ |w|. For convenience, we
abbreviate w[1 : i] to w[: i] and w[i : |w|] to w[i :] for 1 ≤ i ≤ |w|. The empty
string is denoted by ε, that is |ε| = 0. Moreover, let w[i : j] = ε if i > j. For a
string u and an extension uv, we write str(u, uv) = v.
    Throughout this paper, we fix two alphabets Σ and Π. We call elements of
Σ constant symbols and those of Π parameter symbols. An element of Σ ∗ is
called a constant string and that of (Σ ∪ Π)∗ is called a parameterized string, or
p-string for short. We assume that the size of Σ and Π are constant.
    Given two p-strings w1 and w2 of length n, w1 and w2 are a parameterized
match (p-match), denoted by w1 ≈ w2 , if there is a bijection f on Σ ∪ Π such
            An Extension of Linear-size Suffix Tries for Parameterized Strings       99


that f (a) = a for any a ∈ Σ and f (w1 [i]) = w2 [i] for all 1 ≤ i ≤ n [3]. We can
determine whether w1 ≈ w2 or not by using an encoding called prev-encoding
defined as follows.

Definition 1 (Prev-encoding [3]). For a p-string w of length n over Σ ∪ Π,
the prev-encoding for w, denoted by prev(w), is defined to be a string over Σ ∪N
of length n such that for each 1 ≤ i ≤ n,

            
            w[i]
                        if w[i] ∈ Σ,
prev(w)[i] = 0           if w[i] ∈ Π and w[i] 6= w[j] for 1 ≤ j < i,
            
             i−k         if w[i] ∈ Π and k = max{j | w[j] = w[i] and 1 ≤ j < i}.
            

      We call strings over Σ ∪ N pv-strings.

   For any p-strings w1 and w2 , w1 ≈ w2 if and only if prev(w1 ) = prev(w2 ).
For example, given Σ = {a, b} and Π = {u, v, x, y}, s1 = uvvvauuvb and
s2 = xyyyaxxyb are p-matches by f such that f (u) = x and f (v) = y, where
prev(s1 ) = prev(s2 ) = 0011a514b.
   We define parameterized pattern matching as follows.
Definition 2 (Parameterized pattern matching [3]). Given two p-strings,
text T and pattern P , decide whether T has a substring that p-matches P .
For example, considering a text T = auvaubuavbv and a pattern P = xayby
over Σ = {a, b} and Π = {u, v, x, y}, T has two substrings T [3 : 7] = vaubu and
T [7 : 11] = uavbv that p-match P .
    Throughout this paper, we assume that a text T ends with a sentinel symbol
$ ∈ Σ, which occurs nowhere else in T .


2.2     Suffix tries, suffix trees, and linear-size suffix tries

This subsection briefly reviews tree structures for indexing all the substrings of
a constant string T ∈ Σ ∗ .
    The suffix trie STrie(T ) is a tree with nodes corresponding to all the substrings
of T . Figure 1 (a) shows an example of a suffix trie. Throughout this paper, we
identify a node with its corresponding string for explanatory convenience. Note
that each node does not explicitly remember its corresponding string. For each
nonempty substring ua of T where a ∈ Σ, we have an edge from u to ua labeled
with a. Then by reading the labels on the path from the root to a node u, one can
obtain the string u the node corresponds. Then the path label from the node
u to a descendant uv is str(u, uv) = v for u, v ∈ Σ ∗ . Since there are Θ(|T |2 )
substrings of T , the size of STrie(T ) is Θ(|T |2 ).
    The suffix tree STree(T ) is a tree obtained from STrie(T ) by removing all
non-branching internal nodes and replacing each path with no branching nodes
by a single edge whose label refers to a corresponding interval of the text T .
That is, the label on the edge (u, v) is a pair (i, j) such that T [i : j] = str(u, v).
100     K. Nakashima, D. Hendrian, R. Yoshinaka and A. Shinohara


                                         b a               $                                                       b a              $
                                                               8                                                                        8
                                     a        b    a       $                                                   a        b    a      $
                                                                   7                                                                        7
                             a                a            b   $                                       a                a           b   $
                                 $                                         6                   b           $            a               a               6
                     b                        a                a
                                 5                                                                         5              +
                 a                           b     $               a                       a                           b    $               a
                                                                                                                                                +
             a                           a             4               $               a                           a            4                   $
         $                                                             3       +   $                       a                                        3
                                 a
        2                                                                      2
                             $                                                                     +
                                                                                                       $
                         1                                                                         1


                                             (a)                                                                       (b)

Fig. 1. (a) The suffix trie for T = abaabaa$. (b) The LST for T . Solid and broken
arrows represent the edges and suffix links, respectively. The LST keeps the first symbol
(black) on each edge, while the succeeding symbols (orange) are discarded. Big white
and small black circles represent nodes of Type 1 and Type 2, respectively. The ‘+’
signs represent the 1-bit flag. If a node v has ‘+’ sign, the edge (u, v) has a path label
of length greater than 1 in STrie(T ) where u is the parent node of v in LST(T ).


Since there are at most O(|T |) branching nodes, the size of STree(T ) is Θ(|T |).
An important auxiliary map on nodes is called suffix links, denoted by SL, which
is defined by SL(aw) = w for each node aw with a ∈ Σ and w ∈ Σ ∗ .


    The linear-size suffix trie (LST) [4] LST(T ) of a string T is another com-
pact variant of a suffix trie (see Figure 1 (b)). An LST suppresses (most) non-
branching nodes and replaces paths with edges like a suffix tree, but the labels
of those new edges do not refer to intervals of the input text. Each edge (u, v)
retains only the first symbol str(u, v)[1] of the original path label str(u, v). To
recover the original label str(u, v), we refer to another edge or a path in the LST
itself following a suffix link, using the fact that str(u, v) = str(SL(u), SL(v)).
The reference will be recursive, but eventually one can regain the original path
label by collecting those retained symbols. For this sake, LST(T ) keeps some
non-branching internal nodes from STrie(T ) and thus it may have more nodes
than STree(T ), but still the size is linear in |T |. The nodes of LST(T ) consist of
those of STree(T ) and non-branching node whose suffix links point at a branch-
ing node. We call the former Type 1 and the latter Type 2. Each edge (u, v) has
a 1-bit flag that tells whether |v| − |u| = 1. If it is the case, one knows the com-
plete label str(u, v) = str(u, v)[1]. Otherwise, one needs to follow the suffix link
to regain the other symbols. An LST uses suffix links to regain the original path
label in the suffix trie. If we had only Type 1 nodes, for some edge (u, v), there
may be a branching node between SL(u) and SL(v), which makes it difficult to
regain the original path label. Having Type 2 nodes, there is no branching node
between SL(u) and SL(v) for every edge (u, v). Then it is enough to go straight
down from SL(u) to regain the original path label.
                 An Extension of Linear-size Suffix Tries for Parameterized Strings                                                                                                                  101

                      [13,13,13]                                                                                                    $
                              [1,1,1]                                                                                                           0                 b
            13
                                                                    [4,5,4]                                    13                                         a
                     [13,13,12]                                                                                             $                                             0
                                                                                                                                                a             b
                12
                                   [2,2,1]                                                                         12                       0
                                              [2,5,2]
                                                            [3,6,3]
                                                                 [13,13,11]                  [6,6,4]
                                                                                                                        $                           b             0               $        0
       [11,13,10]
                                                                                                                                b       a
           10    [11,13,9]       [3,4,1]                                      11                              10                                     0                            11
                                                                                                                            3                                         0
                                                                                                                                        b
                                                                                                                        $                             0                                   b    b
                                                                                                  [11,13,8]
                                                                                                                                                                                                3
                      9                                                                                                 9
                                                                                                                            0           4                           b     b              4
                                                                                                                                                                                                 $
                                                                                                                                                  b b
                                                                       11,13,7]
                                                   [11,13,6]                                                               0                                       4       3            0
                                  [9,13,5]                                 [7,13,4]                    8                  a             0                                                        8
                                                                                                                                                  4 3              0        $           b
                                                        [7,13,3]                                                         b               b            $
                [5,13,1]               [7,13,2]                           7
                                                                                                                                                 0                b           7        3
                                                        6                                                               4                3       a    6
                                                                                                                       9                  $                       3                    $
                                                                                                                                                3                 $
                                   5                                                    4                             b                  5      $                                      4
                                                                3                                                    3                                                3
                                              2                                                                                                     2
                                                                                                                    $
                 1                                                                                                  1


                             1     2      3    4   5        6   7     8   9        10       11   12   13
                                                                                                                                                          (b)
           prev(𝑇) = 0             0 a b 0 0 a b 4 9 b 3 $

                                                        (a)

Fig. 2. (a) The parameterized suffix tree PSTree(T ) for T = xyabzwabzxbz$ where
Σ = {a, b,$} and Π = {x, y, z, w}. (b) The referenced substrings are shown on edges.
Broken blue arrows denote suffix links. Some suffix links do not point to a branching
node shown with a bold broken arrow.


2.3   Parameterized suffix tries and parameterized suffix trees
For a p-string T ∈ (Σ ∪ Π)∗ , a prev-encoded substring (pv-substring) of T is the
prev-encoding prev(w) of a substring w of T . The set of pv-substrings of T is
denoted by PrevSub(T ).
    A parameterized suffix trie of T , denoted by PSTrie(T ), is the trie that rep-
resents all the pv-substrings of T . The size of PSTrie(T ) is Θ(|T |2 ).
    For a pv-string u ∈ (Σ ∪ N )∗ , the k-re-encoding for u, denoted by huik , is
defined to be the pv-string of length |u| such that for each 1 ≤ i ≤ |u|,
                            (
                             0    if u[i] ∈ N and u[i] ≥ i − k + 1,
                 huik [i] =
                             u[i] otherwise.

When k = 1, we omit k. We then have hprev(w)[i : j]i = prev(w[i : j]) for any
p-string w ∈ (Σ ∪ Π)∗ and i, j ≤ |w|.
    Usually suffix links are defined on nodes of suffix trees, but it is convenient
to have “implicit suffix links” on all nodes except the root of STrie(T ), i.e., all
the nonempty substrings of T , as well. For a nonempty pv-string u ∈ (Σ ∪ N )+ ,
let sl(u) denote the re-encoding hu[2 :]i of the string obtained by deleting the
first symbol. This operation on strings will define real suffix links in index-
ing structures for parameterized strings based on parameterized suffix tries.
Differently from constant strings, u ∈ PrevSub(T ) does not necessarily imply
u[2 :] ∈ PrevSub(T ). What we actually have is sl(u) = hu[2 :]i ∈ PrevSub(T ).
    A parameterized suffix tree (p-suffix tree) [3] of T , denoted by PSTree(T ), is
a compacted variant of the parameterized suffix trie. Figure 2 shows an example
of a p-suffix tree. Like the suffix tree for a constant string over Σ, PSTree(T ) is
obtained from PSTrie(T ) by removing non-branching internal nodes and giving
each edge as a label a reference to some interval of the prev-encoded text prev(T ).
The reference is represented by a triple (i, j, k) of a text start position, end
position, and suffix number, which refers to the pv-string hprev(T )[k :]i[i : j].
102     K. Nakashima, D. Hendrian, R. Yoshinaka and A. Shinohara

3     PLSTs
We now introduce our indexing tree structures for p-strings, which we call
PLSTs, based on LSTs and p-suffix trees reviewed in Sections 2.2 and 2.3. There
are two difficulties in extending LSTs to deal with p-strings. Figure 3(a) shows
the LST-like structure obtained from PSTrie(T ) in the same way as LST(T ) is ob-
tained from STrie(T ). We want to know str(u, v) for an edge (u, v) by “reduction
by suffix links”, but
 1. it is not necessarily that str(u, v) = str(sl(u), sl(v)),
 2. there can be a branching node u of PSTrie(T ) such that sl(u) is not branching.
An example edge (u, v) exhibiting the first difficulty consists of u = 00 and
v = 00b3$, where str(u, v) = b3$ but str(sl(u), sl(v)) = b0$. This is caused by
the fact that sl(u) = hu[2 :]i rather than sl(u) = u[2 :]. Then, the path label
str(sl(u), sl(v)) referenced by the suffix link may not give exactly what we want.
We solve this problem by giving the node v a “re-encoding sign” with which
one can recover str(u, v) from str(sl(u), sl(v)). An example node for the second
case is 00ab. This is a branching node but hsl(00ab)i = 0ab does not appear
as a node. To handle this case, we simply refer to the corresponding interval
of the original text T by keeping the necessary subsequence, where, as we will
observe in experiments, the necessary subsequence tends to be rather small. Our
proposed structure PLST is shown in Figure 3(b). In what follows we explain
PLSTs.

3.1   Definition and properties of PLSTs
Let U = PrevSub(T ) be the set of nodes of PSTrie(T ). The set V of nodes of the
PLST PLST(T ) for T is a subset of U , which is partitioned as V = V1 ∪ V2 ⊆ U .
Nodes in Vi are called Type i for i = 1, 2. The definition of Type 1 and Type 2
nodes follows the one for original LSTs [4].
 1. A node u ∈ U is Type 1 if u is a leaf or a branching node in PSTrie(T ).
 2. A node u ∈ U is Type 2 if u ∈
                                / V1 and sl(u) ∈ V1 .
Edges of PLST(T ) are trivially determined: we have (u, uv) ∈ V × V as an edge
if and only if v 6= ε and there is no proper nonempty prefix v 0 of v such that
uv 0 ∈ V . We will show in Section 3.3 that |V | ∈ O(|T |). We say that u ∈ V is
good if sl(u) ∈ V , and u ∈ V is bad otherwise. Note that any u ∈ V2 is good by
the definition of V2 , and that the root ε is bad.
    To obtain str(u, v) for an edge (u, v), if |v| − |u| = 1, we simply read the edge
label v[1] like an LST. Otherwise, if both u and v are good, we basically use
the technique of “reduction by suffix links”. An important observation is that
the equation str(u, v) = str(sl(u), sl(v)), which was a key property to regain the
original label in LSTs, does not necessarily hold for PLSTs. Figure 4 shows an
example, where str(u, v) = cb40 6= cb00 = str(sl(u), sl(v)); the third symbol 4 in
str(u, v) is re-encoded to 0 in str(sl(u), sl(v)), because the first symbol v[1] = 0 of
v, that is referenced by the symbol 4, is cut out in sl(v). Fortunately, the possible
difference between str(u, v) and str(sl(u), sl(v)) is limited.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              12


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  13


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           9
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      b a


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  $


                                                                                                                                                                                                                                                                        Re(v) =
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      b 0 a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             $


                                                                                                                                                                                                                                                                                (
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              0


                                                                                                                                                                                                                                                                                  0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   a


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   6
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               (a)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                a b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               b 3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    7
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   11
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       sha1_base64="nT83JBfhK2+73gEpvpKhjMqf3xo=">AAADo3icrVJNaxNRFL3p2A9bbZO6EdyEhoircJMulEKh1I3gps1HU2hDmBlf0kfni5k3qXHIXvoHXIgLBRXxZ7jxDyjUf1BcVujGhWdeBkpb2m58w8zce+49993z3rUCR0aK+Sg3YdyanJqeuT07d+fu/EK+sLgV+XFoi5btO364bZmRcKQnWkoqR2wHoTBdyxFta/9pGm8PRBhJ32uqYSA6rtn3ZE/apgLUzRd2lXipQjdZKTYRLtZG3XyJK6xX8bJRzYwSZWvDL+RWaZdekE82xeSSII8UbIdMivDsUJWYAmAdSoCFsKSOCxrRLLgxsgQyTKD7+Pbh7WSoBz+tGWm2jV0cvCGYRSrzD/7CJ/ydv/Ix/72yVqJrpL0M8bfGXBF0Fw7vN05vZLn4K9o7Y13D6KMzE+ieVhjdoE9Rj55oXRI6A42kiu3xXoNXb04aK/Vy8pA/8G9ofc9H/A1qvcEf++OmqL+9ppMItc86T/NCXVvQgT5FV0c93FuCWE9rlTjbIRCh/RieAjeNp9Uu4ujwyt1jfbvuOKvxkw/d19we/fr/XWBaqxdn87KxVatUlyu1zVppbT2b2xl6QEv0CLP5mNboGW1QCx0d0Dv6RJ+NsvHcqBvNcepELuPco3PL6PwDeN3PjA==</latexit>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       <latexit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            sha1_base64="LuB3EBJ/eyMmz80ZxU+1bojZ0Dc=">AAADo3icrVJLaxNRFD7p+Kj10UQ3BTfBEHEVzqQLpSAU3Qhu2jyaQhvCzHiTXjovZu6kxiF76R9wIS4qtFL6M7rpH1Co/0BcVnDjwm9uBootbTfeYWbO+c75zj3fvccOXRkr5uPClHHt+o2b07dmbt+5e2+2WLq/EgdJ5Ii2E7hBtGpbsXClL9pKKleshpGwPNsVHXvzZRbvDEUUy8BvqVEoup418GVfOpYC1CuW1pV4qyIvXSi3EC6b416xwjXWq3zeMHOjQvlaCkqF57RObygghxLySJBPCrZLFsV41sgkphBYl1JgESyp44LGNANugiyBDAvoJr4DeGs56sPPasaa7WAXF28EZpmq/JX3+YSP+IB/8J8La6W6RtbLCH97whVhb3Z7rvn7SpaHv6KNU9YljAE6s4BuaIXxFfoU9emZ1iWhM9RIptiZ7DV89+GkudCopo/5M/+E1h0+5kOo9Ye/nN1l0fh4SScxap92nuVFuragLX2Kno76uLcUsb7WKnG2IyBC+wk8BW4Wz6qdxdHhhbsn+na9SVbzG29777kz/v7/u8C0mmdn87yxUq+Z87X6cr2y+CKf22l6SI/oCWbzKS3SK1qiNjraok+0R1+MqvHaaBitSepUIec8oH+W0f0Lda/Piw==</latexit>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           <latexit
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   3


                                                                                                                                                                                       Observation 1 and Definition 3.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          8
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    $


                                                                                                                                                                                                                                                                                          otherwise.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       : Type 2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           : Type 1


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         2


                                                                                                                                                                                                                                                                     of v. We define re-encoding sign to v by
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              12


                                                                                                                                                                                                                                                                                                                                                                                                                                           str(u, v) for each edge (u, v) in PLST(T ).
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         9


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  13


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           9
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  0 0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              [3,4,-1]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         a


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               [5,6,1]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       b 0 a


                                                                                                                                                                                                                                                                                                                                                        can regain str(u, v) from str(sl(u), sl(v)) as follows.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     𝑇′
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           $ 3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         [1,2,-1] 0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         0

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   a


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                6


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               1

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   = a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               $0 0  4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             (b)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   b


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      a b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     b 3


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          7
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   11


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        $


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 4
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                3
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               6
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  4


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   b 0 0 4 9
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   i


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         a
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      0


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   sha1_base64="QotbNwZHZ3eN+lbjUueMyYq8R80=">AAADrXicrVLNTttAEJ7g/lD6Q4BLJS5RI6pemk7SAwgJCdFLj5A0BInQyDYTs8JeW951IFg5cKt4gR56aqWqqrjQZ+ilL9BK9A2qHqnUSw8dbyyhgoBL17I98818s/PtjhP5QmnE48KIde36jZujt8Zu37l7b7w4MbmqwiR2qemGfhivObYiX0hqaqF9WotisgPHp5az/SyLt3oUKxHKF7of0UZge1J0hWtrhjrF6bamXR0H6XypTo9JuuGmkF5JCU8OOsUyVtCs0nmjmhtlyNdyOFFYgDZsQgguJBAAgQTNtg82KH7WoQoIEWMbkDIWsyVMnGAAY8xNOIs4w2Z0m78ee+s5KtnPairDdnkXn9+YmSWYwa/4EU/wCx7iD/xzYa3U1Mh66fPfGXIp6owf3G/8vpIV8F/D1inrEobHndmMbhmF6gp9GrowZ3QJ1hkZJFPsDvfq7b0+aczXZ9KH+A5/sta3eIyfWa3s/XLfr1D9zSWdKK592nmWF5vaBDvmFAMTlXxvKce6Rqvgs+0zQsZP2NPMzeJZtbM4d3jh7om53WCY1fiGB8ErbA2+//8ueFqrZ2fzvLFaq1SfVmortfLiUj63ozAND+ARz+YsLMJzWIYmd7QPH+AIPllPrKbVtl4OU0cKOWcK/lmW9xcbkNP4</latexit>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   <latexit
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       sha1_base64="nT83JBfhK2+73gEpvpKhjMqf3xo=">AAADo3icrVJNaxNRFL3p2A9bbZO6EdyEhoircJMulEKh1I3gps1HU2hDmBlf0kfni5k3qXHIXvoHXIgLBRXxZ7jxDyjUf1BcVujGhWdeBkpb2m58w8zce+49993z3rUCR0aK+Sg3YdyanJqeuT07d+fu/EK+sLgV+XFoi5btO364bZmRcKQnWkoqR2wHoTBdyxFta/9pGm8PRBhJ32uqYSA6rtn3ZE/apgLUzRd2lXipQjdZKTYRLtZG3XyJK6xX8bJRzYwSZWvDL+RWaZdekE82xeSSII8UbIdMivDsUJWYAmAdSoCFsKSOCxrRLLgxsgQyTKD7+Pbh7WSoBz+tGWm2jV0cvCGYRSrzD/7CJ/ydv/Ix/72yVqJrpL0M8bfGXBF0Fw7vN05vZLn4K9o7Y13D6KMzE+ieVhjdoE9Rj55oXRI6A42kiu3xXoNXb04aK/Vy8pA/8G9ofc9H/A1qvcEf++OmqL+9ppMItc86T/NCXVvQgT5FV0c93FuCWE9rlTjbIRCh/RieAjeNp9Uu4ujwyt1jfbvuOKvxkw/d19we/fr/XWBaqxdn87KxVatUlyu1zVppbT2b2xl6QEv0CLP5mNboGW1QCx0d0Dv6RJ+NsvHcqBvNcepELuPco3PL6PwDeN3PjA==</latexit>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       <latexit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            sha1_base64="LuB3EBJ/eyMmz80ZxU+1bojZ0Dc=">AAADo3icrVJLaxNRFD7p+Kj10UQ3BTfBEHEVzqQLpSAU3Qhu2jyaQhvCzHiTXjovZu6kxiF76R9wIS4qtFL6M7rpH1Co/0BcVnDjwm9uBootbTfeYWbO+c75zj3fvccOXRkr5uPClHHt+o2b07dmbt+5e2+2WLq/EgdJ5Ii2E7hBtGpbsXClL9pKKleshpGwPNsVHXvzZRbvDEUUy8BvqVEoup418GVfOpYC1CuW1pV4qyIvXSi3EC6b416xwjXWq3zeMHOjQvlaCkqF57RObygghxLySJBPCrZLFsV41sgkphBYl1JgESyp44LGNANugiyBDAvoJr4DeGs56sPPasaa7WAXF28EZpmq/JX3+YSP+IB/8J8La6W6RtbLCH97whVhb3Z7rvn7SpaHv6KNU9YljAE6s4BuaIXxFfoU9emZ1iWhM9RIptiZ7DV89+GkudCopo/5M/+E1h0+5kOo9Ye/nN1l0fh4SScxap92nuVFuragLX2Kno76uLcUsb7WKnG2IyBC+wk8BW4Wz6qdxdHhhbsn+na9SVbzG29777kz/v7/u8C0mmdn87yxUq+Z87X6cr2y+CKf22l6SI/oCWbzKS3SK1qiNjraok+0R1+MqvHaaBitSepUIec8oH+W0f0Lda/Piw==</latexit>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           <latexit
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     discarded. The reference text T 0 = ab0049 is shorter than the original T .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  b
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   3

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          8
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    $
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       : Type 2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           : Type 1


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               An Extension of Linear-size Suffix Tries for Parameterized Strings


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   : Re-encoding sign


                                                                                                                                                                                                                                                                                  i − |u| if there exists i such that v[i] = i − 1 and |u| < i ≤ |v|,
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     only the first symbol (black) on each edge, while the succeeding symbols (orange) are
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     2, respectively. The numbers in rhombus represent re-encoding signs. The PLST keeps
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     gers are reference to the text. White and black circles represent nodes of Type 1 and


parameterized suffix tries, not the number of nodes from the root to u in PLSTs.
sign at v and the depth |u| of u. Note that the depth |u| is the depth of u in
Lemma 1 tells how to recover str(u, v) from str(sl(u), sl(v)) using the re-encoding
                                                                                        Re(v) ≥ 1, then str(u, v)[Re(v)] = |u| + Re(v) − 1 and str(sl(u), sl(v))[Re(v)] = 0.
                                                                                        Then for any i ∈ {1, . . . , |str(u, v)|} \ {Re(v)}, str(u, v)[i] = str(sl(u), sl(v))[i]. If
                                                                                                                                                                                                                                                                     Definition 3 (Re-encoding sign). For each node v ∈ V , let u be the parent
                                                                                                                                                                                                                                                                                                                                                        For each edge (u, v), we associate an integer named re-encoding sign, so that we
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     T = xyabzwabzxbz$ where Σ = {a, b,$} and Π = {x, y, z, w}. The triples of inte-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Fig. 3. (a) LST-like structure for PrevSub(T ) and (b) the PLST PLST(T ) for


by suffix links” and simply label the edge with the reference to the corresponding
    If either u or v is bad in an edge (u, v) with |v|−|u| ≥ 2, we give up “reduction
                                                                                        Lemma 1. Let (u, v) be an edge in PLST(T ) such that both u and v are good.
                                                                                                                                                                                       an example of re-encoding signs. The next lemma immediately follows from
                                                                                                                                                                                       The re-encoding sign Re(v) is well-defined by Observation 1. Figure 4 shows
                                                                                                                                                                                                                                                                                                                                                                                                                                           any j ∈ {2, . . . , |v|} \ {i}, sl(v)[j − 1] = v[j]. Thus, such a position is unique in
                                                                                                                                                                                                                                                                                                                                                                                                                                           i such that v[i] = i − 1. For such a position i, we have sl(v)[i − 1] = 0 and for
                                                                                                                                                                                                                                                                                                                                                                                                                                           Observation 1. Any prev-encoded substring v of text T has at most one position
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               103
104      K. Nakashima, D. Hendrian, R. Yoshinaka and A. Shinohara

                                                                                                                     root


                                                                                                                     sha1_base64="(null)">(null)</latexit>
                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                     <latexit
                                                                                                                     0                                                  a                                                c                                                                   (1)           (2)

                                                                                                                                                                     sl(u)                                                               u


                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit>
                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                                                                                         latexit
                                                                                                                                                                                                                                         <
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit>
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                     <latexit
                                                                                                                     a                                                                                               c       b                                                           a     a       a   a     a   a
                                                               u


                                                               sha1_base64="(null)">(null)</latexit>
                                                               sha1_base64="(null)">(null)</latexit><latexit
                                                               sha1_base64="(null)">(null)</latexit><latexit
                                                               sha1_base64="(null)">(null)</latexit><latexit
                                                               latexit
                                                               <
              |u| = 2                                                                                                                                                                                                        0                                                           0     0       0   0     0   0
                                                                                                                     c                                                                                               b


              sha1_base64="(null)">(null)</latexit>
              sha1_base64="(null)">(null)</latexit><latexit
              sha1_base64="(null)">(null)</latexit><latexit
              sha1_base64="(null)">(null)</latexit><latexit
              <latexit
                                                                                                                     b                                                                                               0       0                                                           4     4       4   0     0   0
                                                                                                                                                                                                                     0
                                                                                                                     4 sl(v)
                                                                                                                                                                                                                     0                                                                   b     b       b   b     b   b


                                                                                                                                                                     sha1_base64="(null)">(null)</latexit>
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                     <latexit
                                                                                                                     0
                                                               v     sha1_base64="(null)">(null)</latexit>
                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                     sha1_base64="(null)">(null)</latexit><latexit
                                                                     latexit
                                                                     <
                                                                                                                     3
                                                                                                                                                                                                                                         v                                               0         0   3   0     0   0


                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit>
                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                                                                                         sha1_base64="(null)">(null)</latexit><latexit
                                                                                                                                                                                                                                         latexit
                                                                                                                                                                                                                                         <
              |v| = 6
               sha1_base64="(null)">(null)</latexit>
               sha1_base64="(null)">(null)</latexit><latexit
               sha1_base64="(null)">(null)</latexit><latexit
               sha1_base64="(null)">(null)</latexit><latexit
               <latexit
Fig. 4. Illustrating how re-encoding signs                                                                                                                                                                                       Fig. 5. Illustration for our pattern match-
are given. For the node v = 0acb40, we                                                                                                                                                                                           ing algorithm using fast links. In this fig-
have Re(v) = 5 − 2 = 3, because v[5] = 4                                                                                                                                                                                         ure, we check whether str(u, v) matches
and the parent of v is u = 0a of length 2.                                                                                                                                                                                       pattern p = a04b using fast links.
For the node sl(v) = acb00, Re(sl(v)) = 0.


substring of the original text T , like p-suffix trees. However, differently from p-
suffix trees, not every part of the original text is referenced by an edge in our
case. We keep only the subsequence T 0 of T obtained by removing parts that are
not referred to. We label an edge (u, v) connected to a bad node with an integer
triple (i, j, k) such that str(u, v) = hT 0 ik [i : j].
     In summary, PLST(T ) consists of three kinds of nodes: good Type 1, bad
Type 1, and Type 2 (all good). If u ∈ V is a good node, u has its depth, suffix
link and re-encoding sign, i.e., the triple (|u|, SL(u), Re(u)), where SL(u) = sl(u).
Here we use the notation SL(u) to emphasize that the suffix link SL(u) is a
pointer to the node corresponding to the string sl(u) rather than the string
itself. Therefore, it requires only constant size of memory space. If u ∈ V is
bad, u dose not have a suffix link, i.e., u has the triple (|u|, null, Re(u)). Each
edge (u, v) has either a label character or triple; if both u and v are good or
str(u, v) = 1, the edge label is str(u, v)[1]. Otherwise, the edge label is a triple
(i, j, k) such that str(u, v) = hT 0 ik [i : j]. If some bad nodes appear in PLST(T ),
we need the subsequence T 0 of prev(T ) to recover the labels of edges connecting
the bad nodes. Otherwise, we do not need any text.
     We remark that another idea to overcome the problem of the absence of sl(u)
in a PLST for a node u might be to add sli (u) to V for all i = 1, . . . , |u| so that V
is closed under sl, where sli (u) = sl(sli−1 (u)) and sl0 (u) = u. However, there exists
a series of texts Tn = x1 a1 . . . xn an x1 a1 . . . xn an y1 a1 . . . yn an z$ where xi , yi , z ∈
Π and ai ∈ Σ for each i, for which the number of those additional nodes will be
Ω(|Tn |2 ). Thus, the size of the index structures cannot be kept in linear.


3.2    Parameterized pattern matching with PLSTs

This subsection presents our algorithm for solving the parameterized pattern
matching problem as an application of PLSTs. The function P-Match of Algo-
rithm 1 takes a prev-encoded string p and a node in PLST(T ) and checks whether
there is v ∈ PrevSub(T ) such that p = str(u, v). If it is the case, it returns the
           An Extension of Linear-size Suffix Tries for Parameterized Strings            105


least extension v 0 of v such that v 0 ∈ V . In other words, p is a prefix of str(u, v 0 ),
where v 0 should be v itself if v ∈ V . Otherwise, it returns null.
    For an input pair (p, u), if p = ε, then P-Match returns u, as it is required.
Otherwise, it first tries to regain str(u, v) for the p[1]-child v of u, if u has such a
child. At first, suppose |p| ≥ |v|−|u| = l. We would like to know whether p[1 : l] =
str(u, v). If l = 1, it means that we have already confirmed that p[1 : l] = str(u, v).
Then we just go down to v and recursively call P-Match(p[2 :], v). If l ≥ 2 and
either u or v is bad, we refer to T 0 and check if p[1 : l] = str(u, v) as with matching
in a p-suffix tree. If l ≥ 2 and both u and v are good, we cannot know from the
edge (u, v) itself what str(u, v) is except for its first symbol str(u, v)[1] = p[1].
To recover whole str(u, v), we use the suffix link of u. Since u is good, SL(u) is
defined. If Re(v) = 0, we have str(u, v) = str(sl(u), sl(v)) by Lemma 1, and we
simply call P-Match(p[1 : l], SL(u)). Otherwise, we have p[1 : l] = str(u, v) if
and only if p[Re(v)] = |u| + Re(v) − 1 and p0 [1 : l] = str(sl(u), sl(v)), where
                                       (
                                0       0     if i = Re(v),
                               p [i] =
                                        p[i] otherwise

for i = 1, . . . , |p|. Thus, the recursive call of P-Match(p0 [1 : l], SL(u)) returns
null iff p[1 : l] 6= str(u, v). If P-Match(p0 [1 : l], SL(u)) returns a node, then p[1 :
l] = str(u, v) and thus we continue matching by calling P-Match(p[l + 1 :], v).
    The above discussion is valid when |p| ≤ |v| − |u|. If Re(v) = 0 or Re(v) > |p|,
then p is a prefix of str(u, v) iff p is a prefix of str(sl(u), sl(v)). Otherwise, p is a
prefix of str(u, v) iff p[Re(v)] = |u| + Re(v) − 1 and p0 is a prefix of str(sl(u), sl(v)).
Thus the recursion is justified. If P-Match(p0 [1 : l], SL(u)) returns a node, p is
a prefix of str(u, v) and we call P-Match(ε, v), which returns v.

Proposition 1. We can decide whether T has a substring that p-matches P
using Algorithm 1.

    The time complexity of Algorithm 1 is not linear as it is. Suppose that
P-Match(p, u) is called. It can be the case |v| − |u| ≥ |p| ≥ 2 and either
Re(v) = 0 or Re(v) > l where v = child(u, p[1]). In this case, the algorithm sim-
ply calls P-Match(p, SL(u)), where the first argument has not changed from the
preceding call. Such recursion may be repeated, and amortized time complexity
is not linear. The same difficulty and a solution have already been discussed
by Crochemore et al. [4] for LSTs. Following them, we introduce fast links as
follows, which allow us to skip recursions that always preserve the first argument.

Definition 4 (Fast link). For each edge (u, v) ∈ V × V such that |v| − |u| > 1
and both u, v are good, the fast link for (u, v) is defined to be FL(u, v) = SLk (u)
where k ≥ 1 is the smallest integer satisfying either |vk | < |v| − k or 0 < Re(vk ),
where vk = child(SLk (u), a) for a = str(u, v)[1].

    Algorithm 1 will run in linear time by replacing SL(u) in Line 14 by FL(u, v).
If |vk | < |v| − k = |slk (v)|, the node vk occurs between slk (u) and slk (v). Then,
P-Match(p, SLk (u)) will call P-Match(p[1 : |vk | − |SLk (u)|], SLk+1 (u)). When
106     K. Nakashima, D. Hendrian, R. Yoshinaka and A. Shinohara


 Algorithm 1: P-Match(p, u)
   Input: A string p and a node u in PLST(T )
   Output: The highest descendant v of u such that p is a prefix of str(u, v)
 1 if p = ε then return u;
 2 else
 3     if child(u, p[1]) is undefined then return null;
 4     else
 5         v ← child(u, p[1]);
 6         l ← min{|p|, |v| − |u|};
 7         if l ≥ 2 and u or v is bad then
 8              let α be the label of the edge (u, v);
 9              if p[: l] 6= α[: l] then return null;
10         else if l ≥ 2 then
11              if 1 ≤ Re(v) ≤ |p| then
12                  if p[Re(v)] = |u| + Re(v) − 1 then p[Re(v)] ← 0;
13                  else return null;
14              if P-Match(p[1 : l], SL(u)) = null then return null;
15     return P-Match(p[l + 1 :], v);


0 < Re(vk ), we change the Re(vk )-th symbol of p, which must be a positive
integer, to 0. Therefore, the number of fast links we follow is bounded by 2|p|.
Figure 5 shows how to p-match str(u, v) and p = a04b using fast links. We know
that p[1] = str(u, v)[1] = a. After following the fast link (1), we check whether
p[Re(sl2 (v))] = 4 and rewrite the value of p[Re(sl2 (v))] to 0. After using (2), we
check whether p[3] = 0. In this way, we can know that p matches str(u, v).

Theorem 1. Given PLST(T ) and a pattern P of length m, we can decide whether
T has a substring that p-matches P in O(m) time.

3.3   The size of PLSTs
We now show that the size of PLST(T ) is linear with respect to the length n of a
text T . First, we show a linear upper bound on the number of nodes of PLST(T ).
The nodes of Type 1 appear in the p-suffix tree, so they are at most 2n [3]. It is
enough to show that the number of nodes of Type 2 is linearly bounded as well.

Lemma 2. The number of Type 2 nodes in PLST(T ) is smaller than 2n.

Proof. Let us consider an implicit suffix link chain in PSTrie(T ) starting from
w = prev(T [: k]) with 1 ≤ k < n, i.e., (w, sl(w), sl2 (w), . . . , sl|w| (w)). PSTrie(T )
has n − 1 such chains and each internal node of PLST(T ) appears in at least
one chain. If a chain has two distinct Type 2 nodes sli (w) and slj (w) with i < j,
since sli+1 (w) is Type 1 by definition, there exists a Type 1 node between them.
    Define a binary relation R between V1 and V2 by

 R = { (u, v) ∈ V1 × V2 | there is i s.t. v = sli (u) and slj (u) ∈
                                                                  / V2 for all j < i }
          An Extension of Linear-size Suffix Tries for Parameterized Strings     107


and let R2 = { v ∈ V2 | (u, v) ∈ R for some u ∈ V1 }. Since R is a partial
function from branching nodes to Type 2 nodes, we have |R2 | ≤ n. By the above
argument on a chain, each chain has at most one Type 2 node v ∈ V2 such
that v ∈ / R2 . Since there are n − 1 chains, we have |V2 \ R2 | < n. All in all,
|V2 | = |R2 | + |V2 \ R2 | < n + n = 2n.                                       t
                                                                               u

   The number of edges and their labels, as well as the number of suffix links,
depth and re-encoding sign for nodes, is asymptotically bounded above by the
number of nodes in PLST(T ). T 0 is a subsequence of prev(T ), thus its length is
O(n). Therefore, the size of PLST(T ) is O(n).

Theorem 2. Given a p-string T of length n, the size of PLST(T ) is O(n).


4   Experiments

We performed comparative experiments on the number of nodes of PLSTs and
p-suffix trees for four sorts of text strings changing their length. Text strings we
used are random strings of length n = 10, . . . , 10240 over a constant alphabet
Σ with |Σ| = 2 and those over a parameter alphabet Π with |Π| = 2, and the
11th through 22nd Fibonacci strings over Σ with |Σ| = 2 and those over Π with
|Π| = 2. PLSTs for constant strings are of course identical to LSTs. For random
strings, we measured the average number of nodes for 100 strings of each length.
The results of our experiments are shown in Table 1. Recall that p-suffix trees
consist of Type 1 nodes and prev(T ), while PLSTs have Type 1 and Type 2
nodes and T 0 . For random strings, we can see that the number of Type 2 nodes
is close to the text length. Since a node requires much more memory size than a
character, PLSTs will be bigger than p-suffix trees for random strings. On the
other hand, for Fibonacci strings, PLSTs have few Type 2 nodes and T 0 = .
Thus, PLSTs will use less space than p-suffix trees for Fibonacci strings.
    In addition to Fibonacci strings, we constructed PLSTs for other kinds of
highly repetitive strings such as period-doubling and Thue-Morse strings [1]. We
confirmed that those PLSTs have few Type 2 nodes and T 0 = . PLSTs for such
highly repetitive strings use less memory than p-suffix trees.


5   Conclusion and future work

In this paper, we presented an indexing structure called a PLST for the pa-
rameterized pattern matching problem. Given a p-string T of length n, the size
of PLST for T is O(n). We presented an algorithm that solves the problem in
O(m) time, where m is the length of the pattern. We experimentally showed that
PLST is space-saving from p-suffix tree for indexing highly repetitive strings.
    For PLSTs to be useful for various applications, like computing the longest
common substrings, an efficient algorithm for constructing PLSTs is required
like LSTs [11]. Furthermore, the ideas developed in this paper may be useful to
generalize L-CDAWGs [14] to a data structure for parameterized strings.
108       K. Nakashima, D. Hendrian, R. Yoshinaka and A. Shinohara

          Table 1. The numbers of nodes of PLSTs for different sorts of strings

                           random strings                                Fibonacci strings
                  constant string      p-string                   constant string      p-string
      length n   Type 1 Type 2 Type 1 Type 2           length n   Type 1 Type 2 Type 1 Type 2
            10      16.98      6.04   16.93     5.23         90      178        12     177      12
            20      35.66    12.78    35.72    12.27        145      285        12     285      13
            40      74.58    27.25    74.53    26.22        234      466        15     465      15
            80     153.61    56.82   153.48    56.04        378      751        15     751      16
           160     312.37 115.55     312.45 115.24          611     1220        18   1219       18
           320     631.40 234.55     631.27 235.32          988     1971        18   1971       19
           640    1270.34 477.29 1270.47 475.34            1598     3194        21   3193       21
          1280    2549.35 956.18 2549.39 957.03            2585     5165        21   5165       22
          2560    5108.37 1923.62 5108.48 1922.97          4182     8362        24   8361       24
          5120   10227.48 3845.35 10227.29 3853.97         6766    13527        24  13552       25
         10240   20466.49 7710.50 20466.14 7704.25        10947    21892        27  21918       27


References
 1. Allouche, J.P., Shallit, J.: Automatic sequences: theory, applications, generaliza-
    tions. Cambridge university press (2003)
 2. Baker, B.S.: A program for identifying duplicated code. Computing Science and
    Statistics 24, 49–57 (1992)
 3. Baker, B.S.: Parameterized pattern matching: Algorithms and applications. Jour-
    nal of Computer and System Sciences 52(1), 28–42 (1996)
 4. Crochemore, M., Epifanio, C., Grossi, R., Mignosi, F.: Linear-size suffix tries. The-
    oretical Computer Science 638, 171–178 (2016)
 5. Crochemore, M., Rytter, W.: Jewels of Stringology: Text Algorithms. World Sci-
    entific (2003)
 6. Deguchi, S., Higashijima, F., Bannai, H., Inenaga, S., Takeda, M.: Parameterized
    suffix arrays for binary strings. In: Proceedings of the Prague Stringology Confer-
    ence 2008. pp. 84–94 (2008)
 7. Diptarama, Katsura, T., Otomo, Y., Narisawa, K., Shinohara, A.: Position heaps
    for parameterized strings. In: 28th Annual Symposium on Combinatorial Pattern
    Matching (CPM 2017). pp. 8:1–8:13 (2017)
 8. Fredriksson, K., Mozgovoy, M.: Efficient parameterized string matching. Informa-
    tion Processing Letters 100(3), 91–96 (2006)
 9. Fujisato, N., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Right-to-left
    online construction of parameterized position heaps. In: Prague Stringology Con-
    ference 2018. pp. 91–102 (2018)
10. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and
    Computational Biology. Cambridge University Press (1997)
11. Hendrian, D., Takagi, T., Inenaga, S.: Online Algorithms for Constructing Linear-
    Size Suffix Trie. In: 30th Annual Symposium on Combinatorial Pattern Matching
    (CPM 2019). pp. 30:1–30:19 (2019)
12. I, T., Deguchi, S., Bannai, H., Inenaga, S., Takeda, M.: Lightweight parameterized
    suffix array construction. In: Combinatorial Algorithms (IWOCA 2009). pp. 312–
    323 (2009)
13. Shibuya, T.: Generalization of a suffix tree for RNA structural pattern matching.
    Algorithmica 39(1), 1–19 (2004)
14. Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., Arimura, H.: Linear-size cdawg:
    new repetition-aware indexing and grammar compression. In: International Sym-
    posium on String Processing and Information Retrieval. pp. 304–316 (2017)