=Paper= {{Paper |id=Vol-1146/paper8 |storemode=property |title=On Representations of Ternary Order Relations in Numeric Strings |pdfUrl=https://ceur-ws.org/Vol-1146/paper8.pdf |volume=Vol-1146 |dblpUrl=https://dblp.org/rec/conf/icabd/KimANPS14 }} ==On Representations of Ternary Order Relations in Numeric Strings== https://ceur-ws.org/Vol-1146/paper8.pdf
         On Representations of Ternary Order
            Relations in Numeric Strings
        Jinil Kim1,⇤ , Amihood Amir2,3 , Joong Chae Na4 , Kunsoo Park1,⇤ ,
                                     Jeong Seop Sim5

    1
        Dep. of Computer Science and Engineering, Seoul National University, Korea
               2
                 Department of Computer Science, Bar-Ilan University, Israel
                             3
                               Johns Hopkins University, USA
           4
             Dep. of Computer Science and Engineering, Sejong University, Korea
         5
           Dep. of Computer and Information Engineering, Inha University, Korea
                            ⇤
                                {jikim | kpark}@theory.snu.ac.kr




                                         Abstract
Order-preserving matching is a string matching problem of two numeric strings
where the relative orders of consecutive substrings are matched instead of the
characters themselves. The order relation between two characters is a ternary
relation (>, <, =) rather than a binary relation (>, <), but it was not sufficiently
studied in previous works [5, 7, 1]. In this paper, we extend the representations of
order relations by Kim et al. [5] to ternary order relations, and prove the equiva-
lence of those representations. The extended prefix representation takes log m + 1
bits per character, while the nearest neighbor representation takes 2 log m bits
per character. With our extensions, the time complexities of order-preserving
matching in binary order relations can be achieved in ternary order relations as
well.




1        Introduction
Order-preserving matching is a string matching problem of two numeric strings
where the relative orders of substrings are matched instead of the characters them-
selves. It has many practical applications such as stock price analysis and musical
melody matching. The study on this field was introduced by Kubica et al. [7]
and Kim et al. [5] where Kubica et al. [7] defined order relations by order isomor-
phism of two strings, while Kim et al. [5] defined them explicitly by the sequence

Copyright c by the paper’s authors. Copying permitted only for private and academic purposes.
In: Costas S. Iliopoulos, Alessio Langiu (eds.): Proceedings of the 2nd International Conference
on Algorithms for Big Data, Palermo, Italy, 7-9 April 2014, published at http://ceur-ws.org/



                                              46
On Representations of Ternary Order Relations in Numeric Strings

of rank values (which they called the natural representation). Recently, variants
of order-preserving matching have been studied such as order-preserving suffix
trees [2], approximate matching with k mismatches [3], and a Boyer-Moore type
algorithm [1].
   An order relation between two characters is a ternary relation (>, <, =) rather
than a binary relation (>, <), but the representations of order relations were not
sufficiently studied for ternary order relations. Kim et al. [5] considered only binary
order relations by assuming that all the characters in a string are distinct, while
Kubica et al. [7] considered ternary order relations but their match condition on
equal characters was faulty and it was fixed later by Cho et al. [1]. As there
was no integrated study on the relationship between the representations of order
relations, previous works [1, 2, 3] individually handled the cumbersome details of
ternary order relations on their own works.
   The number of order relations on a sequence of n elements is n! in binary order
relations, but it is the ordered Bell number [4] in ternary order relations. The
                                                                     Pn 1
ordered Bell number is the solution of the recurrence f (n) = 1 + j=1 nj f (n j),
which is exponentially greater than n!. The representations of order relations
should be extended for ternary order relations, which might incur some space
overhead on the representations.
   In this paper, we extend the representations of order relations by Kim et al. [5]
to ternary order relations, and prove the equivalence of those representations.
With the extended prefix representation, order-preserving matching can be done
in O(n log m) time, and the representation of order relations takes (log m + 1)-bit
per character. With the nearest neighbor representation, the matching can be done
in O(n + m log m), but the representation takes (2 log m)-bit per character. The
nearest neighbor representation is suitable for the single pattern matching while
the extended prefix representation is space-efficient and can be useful in order-
preserving multiple pattern matching [5] and order-preserving suffix trees [2].

2    Problem Formulation
Let ⌃ denote the set of numbers such that a comparison of two numbers can be
done in constant time, and let ⌃⇤ denote the set of strings over the alphabet ⌃. For
a string x 2 ⌃⇤ , let |x| denote the length of x. A string x is described by a sequence
of characters (x[1], x[2], ..., x[|x|]). Let a substring x[i..j] be (x[i], x[i + 1], ..., x[j])
and a prefix xi be x[1..i]. For a character c 2 ⌃, let rankx (c) = 1 + |{i : x[i] <
c for 1  i  |x|}|, and let existx (c) be 1 if c exists in x, and 0 otherwise. Let
ex-rankx (c) = (rankx (c), existx (c)). For any boolean condition cond, let (cond)
be 1 if cond is true, 0 otherwise.

Order Isomorphism [7] For two strings x and y of length n, x and y are order-
isomorphic if 8i, j 2 [1..n], x[i]  x[j] , y[i]  y[j].

   Order isomorphism implicitly deals with ternary order relations because each of
ternary order relations (>, <, =) can be checked by variants of the proposition in
Definition 2 (changing i and j, taking the contrapositive, or both). For example,


                                              47
On Representations of Ternary Order Relations in Numeric Strings

if x[i] > x[j], then y[i] > y[j] by the contrapositive of x[i]  x[j] ( y[i]  y[j]. If
x[i] = x[j], then y[i] = y[j] by x[i]  x[j] ) y[i]  y[j] and x[j]  x[i] ) y[j]  y[i].
   The definition of order isomorphism looks simple, but it is somewhat compli-
cated to handle in practice. The number of the order relations involved in checking
order isomorphism of two strings of length n is O(n2 ), hence it has an inherent
quadratic term if the definition is used directly for order-preserving matching.

Natural Representation [5] For a string x of length n, the nat-
ural representation of the order relations is defined as N at(x) =
(rankx (x[1]), rankx (x[2]), ..., rankx (x[n])).

   In the natural representation, ternary order relations are explicitly stated in
terms of ranks. For example, x[i] > x[j] if and only if N at(x)[i] > N at(x)[j], and
x[i] = x[j] if and only if N at(x)[i] = N at(x)[j]. That is, the order relations of two
strings coincide if and only if N at(x) = N at(y). The comparison of two natural
representations takes O(n) time if the natural representations are given.
   These two definitions are equivalent because the natural representations of two
strings are identical if and only if they are order-isomorphic. We adopt the natural
representation throughout this paper because the definition itself and subsequent
analysis are more intuitive.

Order-Preserving Matching [5] Given a text T [1..n] 2 ⌃⇤ and a pattern
P [1..m] 2 ⌃⇤ , P matches T at position i if N at(P ) = N at(T [i m + 1..i]). Order-
preserving matching is the problem of finding all positions of T matched with P .


Equivalent Representation For any two representations R1 (·) and R2 (·), R1 is
equivalent to R2 if R1 (x) = R1 (y) , R2 (x) = R2 (y) for any two strings x, y.

  For KMP-based algorithms [6], the length of matches is incrementally increased
when the next character of the text matches that of the pattern. Such a match
operation is formalized by the match condition as follows.

Match Condition A match condition of a representation R(·) is a boolean func-
tion M atch(x, y, R(x), t + 1) such that N at(xt+1 ) = N at(yt+1 ) holds if and only
if N at(xt ) = N at(yt ) and M atch(x, y, R(x), t + 1) where x, y 2 ⌃⇤ , |x| = |y| and
t 2 [1..|x| 1].

3    Prefix Representation
Prefix Representation [5] For a string x, the prefix representation of the order
relations is defined as P re(x) = (rankx1 (x[1]), rankx2 (x[2]), ..., rankx|x| (x[|x|])).

   The prefix representation has an ambiguity between di↵erent strings in ternary
order relations [1]. For example, when x = (10, 30, 20), and y = (10, 20, 20), the
prefix representations of both x and y are (1, 2, 2). The ambiguity is resolved in
the extended prefix representation.


                                           48
On Representations of Ternary Order Relations in Numeric Strings

         i         1       2         3        4           5       6        7       8
        x         30      10        50       20          30      20       25      20
     N at(x6 )     4
                  ✓ ◆
                           1
                          ✓ ◆
                                     6
                                    ✓ ◆
                                              2
                                             ✓ ◆
                                                          4
                                                         ✓ ◆
                                                                  2
                                                                 ✓ ◆     ✓ ◆
                    1       1        3        2           3       2       4
    Ex-pre(x7 )
                  ✓ 0 ◆   ✓ 0 ◆     ✓0 ◆     ✓0 ◆        ✓1◆     ✓1◆     ✓0◆
                    1       1        2        2           1       4       6
     N N (x7 )      1       1        1        1           1       4       5
     N at(x7 )     5
                   ✓ ◆
                             1
                           ✓ ◆
                                      7
                                    ✓ ◆      ✓ ◆
                                                2         5
                                                         ✓ ◆
                                                                  2
                                                                 ✓ ◆
                                                                          4
                                                                         ✓ ◆      ✓ ◆
                    1       1        3        2           3       2       4        2
    Ex-pre(x8 )
                  ✓ 0 ◆   ✓ 0 ◆     ✓0 ◆     ✓0 ◆        ✓1◆     ✓1◆     ✓0◆      ✓1 ◆
                    1       1        2        2           1       4       6        6
     N N (x8 )      1       1        1        1           1       4       5        6
     N at(x8 )     6         1        8         2         6       2       5        2

                  Figure 1: Example of Lemma 3.1 and Lemma 4.1
Extended Prefix Representation For a string x, the extended prefix
representation is defined as Ex-pre(x) = (ex-rankx1 (x[1]), ex-rank(x2 [2]),
..., ex-rank(x|x| [|x|])).

  For any t 2 [1..|x|         1], N at(xt+1 ) can be computed from N at(xt ) and
Ex-pre(xt+1 ).

Lemma 3.1 (Representation Conversion) Given N at(xt ) and Ex-pre(xt+1 ),
                     ⇢
                       a + ((a > c) _ (a = c ^ d = 0)) for 1  i  t
    N at(xt+1 )[i] =
                       c                               for i = t + 1
                             ✓ ◆
                              c
where a = N at(xt )[i] and    d
                                 = Ex-pre(xt+1 )[t + 1].

   An example of Lemma 3.1 is shown in Figure 1 for x                         =
(30, 10, 50, 20, 30, 20, 25, 20). Let’s consider when
                                                  ✓ ◆
                                                      t +
                                                        ✓ ◆
                                                           1 = 7. For i = 1, we
have N at(x6 )[1] = a = 4 and Ex-pre(x7 )[7] = dc = 40 . Since a = c and d = 0,
                                                    ✓ ◆
                                                     c
we have N at(x7 )[1] = a + 1 = 5. For i = 2,         d
                                                        is the same as for i = 1, and we
have N at(x6 )[2] = a = 1. Since a < c, N at(x7 )[2] = a = 1. Let’s consider
                                                                      ✓ ◆
                                                                             the
                                                                             ✓ ◆
                                                                                 next
                                                                       c       2
step when t + 1 = 8. For i = 4, we have N at(x7 )[4] = a = 2 and d = 1 . As
a = c and d = 1, N at(x8 )[4] = a = 2.

Theorem 3.2 Ex-pre(·) is equivalent to N at(·).

Theorem 3.3 (Match Condition) Given x, y and t, the condition
Ex-pre(xt+1 )[t + 1] = Ex-pre(yt+1 )[t + 1] is a match condition of Ex-pre(·).


4     Nearest Neighbor Representation
Let LM axx [i] = j if x[j] = max{x[k] : x[k]  x[i] for 1  k  i 1}, or 1 if no
such j. Let LM inx [i] = j if x[j] = min{x[k] : x[k]      x[i] for 1  k  i 1}, or


                                           49
On Representations of Ternary Order Relations in Numeric Strings

1 if no such j. If there are multiple j’s for LM axx [i] or LM inx [i], we choose the
rightmost one. In Figure 1, LM axx [8] = 6 since x[6] is the rightmost one among
the maximum values which are less than or equal to x[8] in x[1..7]. Similarly,
LM inx [8] = 6.

Nearest Neighbor Representation [5, 7, 1] For a string x, the nearest neigh-
bor representation can be defined as
                                ✓                ◆✓                ◆         ✓                  ◆
                                    LM axx [1]        LM axx [2]                 LM axx [|x|]
                   N N (x) =        LM inx [1]        LM inx [2]
                                                                       ···       LM inx [|x|]

   For convenience, let x[ 1] =       1, x[1] = 1, N at(x)[ 1] = 0 and
N at(x)[1] = |x| + 1 for any string x. Then, N at(x)[LM axx [i]]  N at(x)[i] 
N at(x)[LM inx [i]] holds for any i 2 [1..|x|] even when LM axx [i] = 1 or
LM inx [i] = 1.

Lemma 4.1 (Representation Conversion) Given N at(xt ) and N N (xt+1 ),
                     ⇢
                       a + ((a > f ) _ (a = f ^ e 6= f )) for 1  i  t
    N at(xt+1 )[i] =
                       f                                  for i = t + 1
                          ✓ ◆
                           c
where a = N at(xt )[i],    d
                              = N N (xt+1 )[t + 1], e = N at(xt )[c] and f = N at(xt )[d].



   An example of Lemma 4.1 is shown in Figure 1 for x                          =
(30, 10, 50, 20, 30, 20, 25, 20). Let us consider
                                           ✓ ◆
                                                  when
                                                  ✓ ◆
                                                       t + 1 = 7. For i =  1, we
have N at(x6 )[1] = a = 4, N N (x7 )[7] = dc = 65 , e = 2 and f = 4. Since a = f
                                                                                   ✓ ◆
                                                                                    c
and e 6= f , we get N at(x7 )[1] = a + 1 = 5. For i = 2,                            d
                                                                                       , e and f are the same
as for i = 1, and we have a = 1. Since a < e, we have N at(x7 )[2] = a = 1. For
i = 3, we have a = 6 and a > f , and thus N at(x7 )[3] = a + 1 = 7. For i = 4, we
have a = 2. Since a = e and e 6= f , we get N at(x7 )[4] = a = 2. For i = 7, we get
N at(x7 )[7] = f = 4 since i = t + 1.✓ Consider
                                         ◆ ✓ ◆
                                                the next step when t + 1 = 8. For
                                       c     2
i = 4, we have a = 2, N N (x8 )[8] = d = 1 , e = 2 and f = 2. Since a = e = f ,
we get N at(x8 )[4] = a = 2.

Theorem 4.2 N N (·) is equivalent to N at(·).

   A naive match condition of the nearest neighbor representation is N N (xt+1 )[t +
1] = N N (yt+1 )[t + 1] as that of the extended prefix representation in Theorem 3.3,
which requires computing the nearest neighbor representations of both x and y.
Kubica et al. [7] proposed an efficient match condition for ternary order relations
which can be checked in constant time when the nearest neighbor representation
of x is given, but it was faulty. Cho et al. [1] presented a modified match condition
in ternary order relations, which can produce an O(n + m log m) algorithm as in
binary order relations.


                                                  50
On Representations of Ternary Order Relations in Numeric Strings

Theorem 4.3 (Match Condition of Nearest Neighbor Representation [1])
Given x, y and t, the condition (y[c]
                                   ✓ ◆
                                       < y[t + 1] < y[d]) _ (y[t + 1] = y[c] = y[d]) is
                                      c
a match condition of N N (·) where d = N N (xt+1 )[t + 1].

   We can generalize order-preserving matching algorithms in binary order rela-
tions [5] to ternary order relations using the representations above. For single
pattern matching, we can obtain an O(n log m) algorithm with the extended prefix
representation, and an O(n + m log m) algorithm with the nearest neighbor repre-
sentation, both of which are consistent with the results in [5, 7, 1]. For multiple
pattern matching, we can obtain an O((n + m) log m) algorithm in ternary order
relations using the extended prefix representation.

Acknowledgements
The work of Jinil Kim and Kunsoo Park was supported by Next-Generation Informa-
tion Computing Development Program through the National Research Foundation of
Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2011-0029924).
Amihood Amir was partly supported by NSF grant CCR-09-04581, ISF grant 347/09,
and BSF grant 2008217. Joong Chae Na was supported by the IT R&D program of
MKE/KEIT [10038768, The Development of Supercomputing System for the Genome
Analysis], and by Basic Science Research Program through the National Research
Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future
Planning (2011-0007860). Jeong Seop Sim was supported by the National Research
Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No.
2012R1A2A2A01014892), and by the Industrial Strategic technology development
program (10041971, Development of Power Efficient High-Performance Multime-
dia Contents Service Technology using Context-Adapting Distributed Transcoding)
funded by the Ministry of Knowledge Economy (MKE, Korea).

References
[1] S. Cho, J. C. Na, K. Park, and J. S. Sim. Fast order-preserving pattern match-
    ing. In COCOA, 2013.

[2] M. Crochemore, C. S. Iliopoulos, T. Kociumaka, M. Kubica, A. Langiu, S. P.
    Pissis, J. Radoszewski, W. Rytter, and T. Walen. Order-preserving incomplete
    suffix trees and order-preserving indexes. In SPIRE, pages 84–95, 2013.

[3] P. Gawrychowski and P. Uznanski. Order-preserving pattern matching with k
    mismatches. CoRR, abs/1309.6453, 2013.

[4] O. A. Gross. Preferential arrangements. The American Mathematical Monthly,
    69:4–8, 1962.

[5] J. Kim, P. Eades, R. Fleischer, S.-H. Hong, C. S. Iliopoulos, K. Park, S. J.
    Puglisi, and T. Tokuyama. Order-preserving matching. To appear in Theor.
    Comput. Sci.


                                         51
On Representations of Ternary Order Relations in Numeric Strings

[6] D. E. Knuth, J. H. M. Jr., and V. R. Pratt. Fast pattern matching in strings.
    SIAM J. Comput., 6(2):323–350, 1977.
[7] M. Kubica, T. Kulczynski, J. Radoszewski, W. Rytter, and T. Walen. A linear
    time algorithm for consecutive permutation pattern matching. Inf. Process.
    Lett., 113(12):430–433, 2013.




                                         52