=Paper= {{Paper |id=Vol-1548/108-Ueki |storemode=property |title=A Fast Order-Preserving Matching with q-neighborhood Filtration Using SIMD Instructions |pdfUrl=https://ceur-ws.org/Vol-1548/108-Ueki.pdf |volume=Vol-1548 |authors=Yohei Ueki,Kazuyuki Narisawa,Ayumi Shinohara |dblpUrl=https://dblp.org/rec/conf/sofsem/UekiNS16 }} ==A Fast Order-Preserving Matching with q-neighborhood Filtration Using SIMD Instructions== https://ceur-ws.org/Vol-1548/108-Ueki.pdf
A Fast Order-Preserving Matching with q-neighborhood
          Filtration Using SIMD Instructions

                Yohei Ueki, Kazuyuki Narisawa, and Ayumi Shinohara

             Graduate School of Information Sciences, Tohoku University, Japan
            {yohei_ueki@shino.,narisawa@,ayumi@}ecei.tohoku.ac.jp



       Abstract. The order-preserving matching problem is a variant of the pattern
       matching problem focusing on shapes of sequences instead of values of sequences.
       Given a text and a pattern, the problem is to output all positions where the pattern
       and a subsequence in the text are of the same relative order. Chhabra and Tarhio
       proposed a fast algorithm based on filtration for the order-preserving match-
       ing problem, and Faro and Külekci improved Chhabra and Tarhio’s solution by
       extending the filter. Furthermore, Cantone et al. and Chhabra et al. proposed solu-
       tions based on filtration using SIMD (Single Instruction Multiple Data)
       instructions, and showed that SIMD instructions are efficient in speeding up their
       algorithms. In this paper, we propose a fast matching algorithm for the order-
       preserving matching problem using SIMD instructions based on filtration
       proposed by Faro and Külekci. We show that our algorithm is practically faster
       than previous solutions.

       Keywords: pattern matching problem, order-preserving matching problem,
       SIMD instructions


1 Introduction
The order-preserving matching problem [9, 11] is a variant of the pattern matching
problem focusing on shapes of sequences instead of values of sequences. This match-
ing can be applied to various fields, such as musical matching, sensor data analysis
and stock price analysis. Kubica et al. [11] defined an order-isomorphism as one of the
similarities of sequences. For two numerical sequences X and Y of the same length,
the order-isomorphism expresses that the relative order of X coincides with that of Y.
For example, two sequences X = (8, 32, 40, 24, 16) and Y = (18, 42, 50, 34, 26) are
order-isomorphic because both the relative orders of X and Y are (1, 4, 5, 3, 2). The
order-preserving matching problem is, given a text and a pattern, to output all positions
of subsequences in the text that are order-isomorphic to the pattern.
    Various sequential matching algorithms for the order-preserving matching prob-
lem have been developed, based on the Knuth-Morris-Pratt Algorithm [9, 11], Horspoo
Algorithm [5], and forward automaton Algorithm [1]. Moreover, Chhabra and Tarhio [4]
proposed a practically fast pattern matching algorithm using a filtration method. In this
method, text T and pattern P are encoded into binary sequences T ′ and P′ respectively,
based on the relationship to the adjacent values. Because the standard string matching P′
in T ′ is much faster than the order-preserving matching P in T , it can be used to narrow
                                                  Fast Order-Preserving Matching ...     109

the candidate positions, although some incorrect answers may be included. Hence this
methods requires verification steps for the candidates. Faro and Külekci [7] improved
the filter by considering q-neighborhood values, instead of adjacent (1-neighborhood)
values. More recently, solutions based on filtration using SIMD (Single Instruction Mul-
tiple Data) instructions were proposed by Cantone et al. [2] and Chhabra et al. [3]. They
showed that SIMD instructions are efficient in speeding up their algorithms.
     In this paper, we propose a new fast algorithm using SIMD instructions based on
filtration proposed by Faro and Külekci [7]. Our experiments show that our algorithm
is practically faster than previous solutions.


2     Preliminaries
2.1   Notations
Let Σ be an ordered alphabet, and Σ ∗ be the set of all sequences over Σ. |X| denotes
the length of a sequence X ∈ Σ ∗ , and X[i] denotes the i-th value of X for 1 ≤ i ≤ |X|.
A subsequence of X beginning at i and ending at j for 1 ≤ i ≤ j ≤ |X| is denoted by
X[i : j] = (X[i], X[i + 1], . . . , X[ j − 1], X[ j]).

2.2   Order Preserving Matching
Definition 1 (Order-isomorphism [11]). Two sequences X, Y ∈ Σ ∗ of the same length
are order-isomorphic if X[i] ≤ X[ j] ⇐⇒ Y[i] ≤ Y[ j] for any 1 ≤ i, j ≤ |X|. We write
X ≈ Y if X is order-isomorphic to Y, and X 0 Y otherwise.

Example 1. For sequences X = (8, 32, 40, 24, 16), Y = (18, 42, 50, 34, 26) and Z =
(20, 24, 45, 38, 31), we have X ≈ Y and X 0 Z.

Definition 2 (Order-Preserving Matching Problem [9, 11]). Given a text T ∈ Σ ∗ of
length n, and a pattern P ∈ Σ ∗ of length m, the order-preserving matching problem asks
for all positions i satisfying T [i : i + m − 1] ≈ P for 1 ≤ i ≤ n − m + 1.

Example 2. For a text T = (13, 18, 42, 50, 34, 26, 12, 20, 24, 45, 38, 31) and a pattern
P = (8, 32, 40, 24, 16), the output is 2 because T [2 : 6] ≈ P, see Fig. 1.

    Solutions for the order-preserving matching problem based on filtration require
a verification step. That is, each candidate T [i : i + m − 1] is verified whether it is
order-isomorphic to the pattern P of length m. It takes O(m2 ) time by using a naive
algorithm based on Definition 1. The previous work [2, 4] showed the following lemma
for verifying the order-isomorphism of two sequences of length m in O(m) time with
O(sort(m)) preprocessing time, where sort(m) is the time required to sort one of the
sequences.

Definition 3 (relative order array [2, 4]). For a sequence Y ∈ Σ ∗ , the relative or-
der array RY of Y is defined as RY = (rank−1                −1                −1
                                                Y (1), rankY (2), . . . , rankY (|Y|)), where
rankY (i) = |{k : Y[k] < Y[i] or (Y[k] = Y[i] and k < i)}|.
110      Y. Ueki, K. Narisawa, and A. Shinohara




Fig. 1. An example of order-preserving matching. The pattern P = (8, 32, 40, 24, 16) matches at
position 2 in the text T = (13, 18, 42, 50, 34, 26, 12, 20, 24, 45, 38, 31), because both the relative
orders of P and T [2 : 6] = (18, 42, 50, 34, 26) are (1, 4, 5, 3, 2).



Lemma 1 ([4]). Given two sequences X, Y ∈ Σ ∗ of length m, and the relative order
array RY of Y, we have X 0 Y if and only if there exists 1 ≤ j ≤ m − 1 such that one of
the following conditions holds:
(1) X[RY [ j]] > X[RY [ j + 1]],
(2) X[RY [ j]] = X[RY [ j + 1]] and X[RY [ j]] , X[RY [ j + 1]], or
(3) X[RY [ j]] < X[RY [ j + 1]] and X[RY [ j]] = X[RY [ j + 1]].

    The relative order array can be computed in O(sort(m)) time [4]. Therefore, the veri-
fication algorithm based on Lemma 1 runs in O(m) time with O(sort(m)) preprocessing
time.


3     Previous Work
3.1   Neighborhood Ranking Filter
Chhabra and Tarhio [4] proposed an order-preserving filtration technique. In this paper,
we call it Neighborhood Ranking filter (shortly NR filter). It consists of two phases, the
filtration phase and the verification phase.
     In the filtration phase, candidates are filtered out by using Neighborhood Ranking
code (shortly NR code) defined by the following.

Definition 4 (Neighborhood Ranking code). For a sequence X ∈ Σ ∗ , the Neighbor-
hood Ranking code of X is defined as B(X)[i] = 1 if X[i] < X[i + 1], and 0 otherwise,
for 1 ≤ i ≤ |X| − 1.

At the beginning of the filtering phase, we compute the NR code B(P) of the pattern
P. Next, we find all positions i satisfying B(T [i : i + m − 1]) = B(P) for 1 ≤ i ≤
n − m + 1, that are candidates of the order-preserving matching. In order to find these
positions, we can use any standard string matching algorithms, such as Knuth-Morris-
Pratt algorithm [10]. Such candidates include no false negative results, in other words,
the filter never removes correct answers by the following proposition.
                                                   Fast Order-Preserving Matching ...      111

Proposition 1 ([4]). For any two sequences X, Y ∈ Σ ∗ , X ≈ Y ⇒ B(X) = B(Y).
The converse of Proposition 1 is not always true. Hence, candidates include false posi-
tive results, in other words, the filter may pass incorrect answers.
    In the verification phase, every candidate is checked whether it is order-isomorphic
to the pattern or not. The verification method is based on Lemma 1, which requires
pre-computing the relative order array RP of pattern P.
Example 3. We consider again the instance in Example 1 and Fig. 1. At first, the relative
order array RP = (1, 5, 4, 2, 3) of P is computed. Next, in the filtering phase, T and P
are encoded into NR codes as B(T ) = (1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0) and B(P) = (1, 1, 0, 0),
respectively. The positions i that satisfy B(T )[i : i + m − 1] = B(P) are i = 2 and i = 8
only. Therefore, the candidates are T [2 : 6] = (18, 42, 50, 34, 26) and T [8 : 12] =
(20, 24, 45, 38, 31). Lastly, in the verification phase, T [2 : 6] and T [8 : 12] are verified
whether they are order-isomorphic to P or not, using the verification algorithm based
on Lemma 1. As a result, the position 2 is reported.

3.2   q-Neighborhood Ranking Filter
Faro and Külekci [7] proposed q-Neighborhood Ranking code (q-NR code) for the fil-
tration technique, which is a more effective filtration technique than that of using the
original neighborhood ranking code. It uses q-neighborhood relationships, while the
original neighborhood ranking code uses only one-adjacent relationships.
Definition 5 (q-Neighborhood Ranking code [7]). Let X ∈ Σ ∗ be a sequence of length
n, and q be an integer satisfying 1 ≤ q < n. The q-Neighborhood Ranking code of X is
                       ∑
defined as Bq (X)[i] = qj=1 (βX (i, j) · 2q− j ) where βX (i, j) = 1 if X[i] < X[i + j], and 0
otherwise, for 1 ≤ i ≤ n − q.
Example 4. For a sequence X = (8, 32, 40, 24, 16), we have B1 (X) = (1, 1, 0, 0), B2 (X) =
((11)2 , (10)2 , (00)2 ) = (3, 2, 0) and B3 (X) = ((111)2 , (100)2 ) = (7, 4).
    The next lemma guarantees that candidates filtered by q-NR code also contain no
false negatives as well as the NR filter.
Lemma 2 ([7]). For any two sequences X, Y ∈ Σ ∗ , X ≈ Y ⇒ Bq (X) = Bq (Y).
   The converse of Lemma 2 is not always true, similarly to Proposition 1. Hence,
candidates obtained by using the q-NR filter also possibly contain false positive results.

4     Proposed Methods
4.1   Fast Implementation Using SIMD Instructions
In this section, we propose a fast implementation using SIMD (Single Instruction Mul-
tiple Data) instructions. We use SSE4.2 [8] for a SIMD instruction set. SSE4.2 supports
128-bit registers, that can contain two 64-bit, four 32-bit, eight 16-bit, or sixteen 8-bit
numbers. The packing factor α = 128/w stands for the number of w-bit numbers con-
tained in the 128-bit register. SSE4.2 allows to perform the same operation in parallel
on several numbers stored in the 128-bit register.
112       Y. Ueki, K. Narisawa, and A. Shinohara

            Table 1. SIMD functions, their SSE intrinsics (w = 8) and their behaviors

            function             SSE intrinsics                          behavior
                                                  Return a sequence C̃, where C̃[i] = 2w − 1
     CompLt(X̃1 , X̃2 )          mm cmplt epi8
                                                  if X̃1 < X̃2 , and 0 otherwise for 1 ≤ i ≤ α.
                                                  Return a sequence Ã, where
     And(X̃1 , X̃2 )             mm and si128
                                                  Ã[i] = X̃1 [i] & X̃2 [i] for 1 ≤ i ≤ α.
                                                  Return a sequence Õ, where
     Or(X̃1 , X̃2 )              mm or si128
                                                  Õ[i] = X̃1 [i] | X̃2 [i] for 1 ≤ i ≤ α.
                                                  Return a bit mask from most significant
     MoveMask(X̃)                mm movemask epi8
                                                  bits of X̃[i] for 1 ≤ i ≤ α.
     SearchStr(X̃P , m, X̃T , n) mm cmpestrm      Find X̃P of length m from X̃T of length n.



  Algorithm 1: OPFSIq Algorithm
   Input: text T of length n and pattern P of length m.
   Output: all positions i satisfying T [i : i + m − 1] ≈ P.
 1 Compute the relative order array RP ;
 2 P̃ ← Bq (P)[1 : min (α, m − q)];
 3 if | P̃| < α then P̃ is padded with trailing zeros so that | P̃| = α;
 4 i ← 1;
 5 while i ≤ n − α − q + 1 do
 6        T̃ ← Encodeq (T, i);   S̃ ← SearchStr(P̃, min (α, m − q), T̃ , α);
 7        mask ← MoveMask(S̃ );        sum ← 0;
 8        while mask , 0 do
 9             j ← ctz(mask) + 1; mask ← mask ≫ j; sum ← sum + j;
10            if P ≈ T [i + sum − 1 : i + sum + m − 2] then output i + sum − 1;
11        i ← i + α;
12    Check remaining text naively;



SIMD Instructions Table 1 shows the functions and SIMD instructions used in this
paper. We explain some non-trivial functions in it. The function MoveMask(X̃) returns
a bit mask that (consists of the most ) significant bit of each value in a sequence X̃, i.e.,
             ∑
it returns αi=1 ⌊X̃[i] · 21−w ⌋ · 2i−1 . For two sequences X̃P and X̃T , and two integers
1 ≤ m, n ≤ α, the function SearchStr(X̃P , m, X̃T , n) returns a sequence S̃ such that
                      (                                                            )
              
              
                        X̃P [1 : m] = X̃T [i : i + m − 1] (1 ≤ i ≤ n − m + 1), or
              2 − 1
                w
     S̃ [i] = 
                        X̃P [1 : n − i + 1] = X̃T [i : n]  (n − m + 1 < i ≤ n)
              
              0       (otherwise)

for 1 ≤ i ≤ α. Note that this function compares the prefix of X̃P and the suffix of X̃T for
n − m + 1 < i ≤ n.

Order-preserving Matching Algorithm Using SIMD Instructions We propose a fast
algorithm, called order-preserving matching filtration technique using SIMD instruc-
tions with q-NR code (shortly OPFSIq ), using the functions in Table 1. The OPFSIq
                                                           Fast Order-Preserving Matching ...   113


  Algorithm 2: Encodeq (X, i)
   Input: A sequence X ∈ Σ ∗ , and a position i.
   Output: Bq (X[i : i + α − 1]).
 1 X̃ ← X[i : i + α − 1];          K̃ ← (0, 0, . . . , 0) /* |K̃| = α */
 2 for j ← 1 to q do
 3     X̃ j ← X[i + j : i + j + α − 1]; C̃ j ← CompLt(X̃, X̃ j );
 4     M̃ j ← (2q− j , 2q− j , . . . , 2q− j ) /* | M̃ j | = α */
 5     K̃ j ← And(C̃ j , M̃ j ); K̃ ← Or(K̃, K̃ j );
 6   return K̃;



algorithm is presented in Algorithm 1 and 2. The function ctz(b) returns the number
of trailing zeros in b from the least significant bit. For example, ctz((11000100)2 ) = 2.
This function can be computed fast by using the bsf instruction in x86.
    In this algorithm, a chunk of length α of the text is processed by SIMD instructions.
The chunk of the text is encoded to the q-NR code of length α using function Encodeq
shown in Algorithm 2. The matching between the encoded chunk and the encoded pat-
tern is performed in line 6 of Algorithm 1, by a SIMD instruction. The bit mask rep-
resents candidate positions. For example, if mask = (01000001)2 then candidates are
i and i + 7.
    Our algorithm has some weaknesses in the following two cases.
(1) The case that the pattern is long so that m > α − q. In this case, this algorithm
    uses the prefix of Bq (P), and only checks whether the prefix of Bq (P) matches with
    Bq (T [i : i + α − 1]) or not.
(2) The case that a subsequence order-isomorphic to the pattern is split in two or more
    chunks of text. In this case, this algorithm only checks whether the prefix of Bq (P)
    matches to the suffix of Bq (T [i : i + α − 1]) (see the detail of SearchStr function).
In these two cases, this algorithm can find all correct positions by Lemma 2, since
Bq (X) = Bq (Y) ⇒ Bq (X)[1 : i] = Bq (Y)[1 : i] for 1 ≤ i ≤ |X| − q, although the number
of candidates increases.
    The main difference between our algorithm and the algorithm proposed by Faro
and Külekci [7] is utilizing SIMD instructions. Their algorithm encodes the text naively
and finds candidates based on the SBNDM2 [6] string matching algorithm, while our
algorithm encodes α elements at once and finds candidates by the SIMD instruction.

Optimized Implementation Assume that w = 16 and q ≤ 8. In this case, an 8-bit
integer is enough to handle each value of q-NR code, although a 16-bit integer is used by
the naive implementation of Algorithm 1. Therefore, we can pack two encoded chunks
T̃ 1 = Bq (T [i : i + α − 1]) and T̃ 2 = Bq (T [i + α : i + 2α − 1]) into one 128-bit register, by
extracting the lower 8-bits of each value of T̃ 1 and T̃ 2 . Then, we can perform a matching
between Bq (T [i : i + 2α − 1]) and the encoded pattern by SearchStr function.
     In order to implement it, we encode two chunks in each iteration, and change the
loop step size from α into 2α. Extracting the lower 8-bits of each value and pack-
ing into one 128-bit register can be implemented by the mm shuffle epi8 and the
114                       Y. Ueki, K. Narisawa, and A. Shinohara

 mm or si128 instructions. This optimization technique is very effective because the
 mm cmpestrm instruction (SearchStr function) is much slower than other instructions
[8]. This technique can be used in a similar way in the case of w = 32.


5                     Experimental Results
We performed experiments comparing running time of our algorithm with that of previ-
ous work [2–4, 7]. We used a machine with Intel Xeon E5-2640 processor and 128GB
memory on Ubuntu 14.04LTS. We implemented the OPFSIq algorithm 1 in C++, and
compiled with gcc 4.8.4 . The OPFSIq algorithm was implemented by SSE4.2 intrinsic
functions. Compiler options were -O3 -msse4.2.
    We used two kinds of text data. One was random text data, consisting of 1000000
random integers of range 1 to 100. The other was temperature text data in Sendai city,
consisting of 32452 integers. From a text data, 100 patterns were randomly chosen, and
we computed the average running time of 100 runs for each pattern.
    The algorithms proposed by Chhabra and Tarhio [4], Faro and Külekci [7], Can-
tone et al. [2] and Chhabra et al. [3] are respectively denoted as CT14, FK15, CFK15,
and CKT15. These algorithms are implemented by themselves 2 . Similarly to previous
work [2, 7], CT14 is based on the SBNDM2 algorithm. CKT15 utilizes SSE4.2 and
AVX instruction set.


                  8                                                                0 25
                                               OPFSI1 (w=8)                                                  OPFSI4 (w=8)
                  7                            OPFSI4 (w=8)                                        OPFSI4 (w=16, optimized)
                                              OPFSI4 (w=16)                                        OPFSI2 (w=32, optimized)
                                                                                    0.2
                  6                  OPFSI4 (w=16, optimized)                                                         CT14
                                     OPFSI3 (w=32, optimized)                                                         FK15
                                                        CT14                                                        CFK15
                                                                     time (msec)
    time (msec)




                  5                                     FK15                       0.15                             CKT15
                                                      CFK15
                  4                                   CKT15
                                                                                    0.1
                  3

                  2
                                                                                   0 05
                  1

                  0                                                                  0
                      5      10      15         20        25    30                        5   10    15          20       25   30
                                    pattern length m                                               pattern length m



                      Fig. 2. Running times on random data.                   Fig. 3. Running times on temperature data.


    Results are shown in Fig. 2 and Fig. 3. In FK15 and CFK15, the best results are
selected among various parameters. The parameter w in OPFSIq means that the text data
are treated as either w-bit integers (w = 8, 16) or w-bit floating points (w = 32), and
optimized denotes that the optimization technique described in Section 4.1 is applied. In
Fig. 2 and Fig. 3, we show the best results of the OPFSIq algorithm with w = 8, 16, 32.
Furthermore, in Fig. 2 we also show results of OPFSI1 (w = 8) and OPFSI4 (w = 16) in
order to discuss the effectiveness of the OPFSIq algorithm.
 1
   Our implementations can be downloaded from http://www.shino.ecei.tohoku.ac.jp/
   member/youhei_ueki/sofsem2016
 2
   We acknowledge the authors for sharing their program.
                                                     Fast Order-Preserving Matching ...        115

    The OPFSIq algorithm runs extremely faster for short patterns. Especially, for
OPFSI4 (w = 8) and m = 7, the algorithm runs approximately 4.7 times faster than
all the previous work on the random text data. Comparisons between the results of
OPFSI1 (w = 8) with OPFSI4 (w = 8), and OPFSI4 (w = 16) with OPFSI4 (w = 16,
optimized) respectively show the effectiveness of q-NR filtration and the optimization
technique described above. All previous work runs faster as the pattern length increases,
whereas the OPFSIq algorithm does not. This is because the OPFSIq algorithm runs in
linear time on average.



6    Conclusion
We proposed an effective filtration for the order-preserving matching problem using
SIMD instructions, and confirmed that it practically runs faster than existing methods.
    We used the SIMD instruction set SSE4.2 that supports 128 bit registers. We expect
that it will become faster if we use other SIMD instruction sets that support wider
registers, such as AVX2 supporting 256 bit registers and AVX-512 supporting 512 bit
registers.

Acknowledgments. This work was supported by ImPACT Program of Council for
Science, Technology and Innovation (Cabinet Office, Government of Japan), and KAK-
ENHI Grant Numbers 25240003 and 15H05706.

References
 1. Belazzougui, D., Pierrot, A., Raffinot, M., Vialette, S.: Single and multiple consecutive
    permutation motif search. In: ISAAC. 66–77 (2013)
 2. Cantone, D., Faro, S., Külekci, M.O.: An efficient skip-search approach to the order-
    preserving pattern matching problem. 22–35 (2015)
 3. Chhabra, T., Külekci, M.O., Tarhio, J.: Alternative algorithms for order-preserving matching.
    In: PSC. 36–46 (2015)
 4. Chhabra, T., Tarhio, J.: Order-preserving matching with filtration. In: SEA. 307–314 (2014)
 5. Cho, S., Na, J.C., Park, K., Sim, J.S.: A fast algorithm for order-preserving pattern matching.
    Information Processing Letters 115(2) 397–402 (2015)
 6. Durian, B., Holub, J., Peltola, H., Tarhio, J.: Improving practical exact string matching.
    Information Processing Letters 110(4) 148–152 (2010)
 7. Faro, S., Külekci, M.O.: Efficient algorithms for the order preserving pattern matching prob-
    lem. arXiv:1501.04001 (2015)
 8. Intel Corporation: Intel (R) 64 and IA-32 Architectures Optimization Reference Manual.
    (2014)
 9. Kim, J., Eades, P., Fleischer, R., Hong, S.H., Iliopoulos, C.S., Park, K., Puglisi, S.J.,
    Tokuyama, T.: Order-preserving matching. Theoretical Computer Science 525(13) 68–79
    (2014)
10. Knuth, D.E., Jr., J.H.M., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on
    Computing 6(2) 323–350 (1977)
11. Kubica, M., Kulczynski, T., Radoszewski, J., Rytter, W., Walen, T.: A linear time algo-
    rithm for consecutive permutation pattern matching. Information Processing Letters 113(12)
    430–433 (2013)