=Paper= {{Paper |id=Vol-2756/paper13 |storemode=property |title=Shape-Preserving Pattern Matching |pdfUrl=https://ceur-ws.org/Vol-2756/paper_13.pdf |volume=Vol-2756 |authors=Domenico Cantone,Simone Faro,M. Oguzhan Kulekci |dblpUrl=https://dblp.org/rec/conf/ictcs/CantoneFK20 }} ==Shape-Preserving Pattern Matching== https://ceur-ws.org/Vol-2756/paper_13.pdf
           Shape-Preserving Pattern Matching?

           Domenico Cantone† , Simone Faro† and M.Oguzhan Külekci‡
             †
                 Università di Catania, Viale A. Doria 6, 95125 Catania, Italy
                        {domenico.cantone,simone.faro}@unict.it
                                ‡
                                    Istanbul Technical University
                                        kulekci@itu.edu.tr



        Abstract. Two sequences of integers x and z of the same length m ≥ 2
        are shape-isomorphic if, up to a positive proportional factor, the se-
        quences of the distances between consecutive elements of x and z are
        the same, i.e., if for some ρ > 0 one has x[i + 1] − x[i] = ρ(z[i + 1] − z[i]),
        for each 1 ≤ i ≤ m − 1. In this paper we present two linear-time algo-
        rithms which, given a text y and a pattern x over an integer alphabet,
        finds all the factors of y that are shape-isomorphic to x. Our first so-
        lution is a two steps algorithm based on a reduction to the standard
        exact string matching problem, while our second solution is an online
        algorithm based on the well-known Knuth-Morris-Pratt string matching
        algorithm. We call this problem shape-preserving pattern-matching prob-
        lem.

        Keywords: Approximate text analysis, non-standard string matching,
        text processing.


1     Introduction

Given a pattern x of length m and a text y of length n, both over a common
alphabet Σ, the exact string matching problem consists in finding all occurrences
of the string x in y. String matching is a very important subject in the wider
domain of text processing. Algorithmic solutions for it, both in its exact and ap-
proximation versions, are basic components of the implementations of practical
softwares available under most operating systems.
    Among the different approximation variants of the string matching problem,
the order-preserving pattern-matching problem [12, 7, 6, 4, 10, 5, 3] (OPPM, for
short) has recently gained attention. In this variant, the characters of the pattern
and of the text are drawn from a linearly ordered alphabet Σ, so that each string
z in Σ ? can naturally be mapped into the sequence of the ranks (within the
ordered sequence of the characters occurring in z) of its characters, which we
call rank sequence. For instance, if the alphabetic order is used for characters,
?
    Copyright c 2020 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0). This work was supported
    by the University of Catania “programma ricerca di ateneo UNICT 2020-22 linea 2”
              y = 8 11 13 20 14 8 17 15 14 18 22 18 14 20 15 25 26
              x =8 6 4 7




Fig. 1. Example of a pattern x of length 4 over an integer alphabet with two shape
preserving occurrences in a text y of length 17, at positions 3 and 10.


the rank sequence of the string x = “gdich” is the sequence h3, 2, 5, 1, 4i, since
g has rank 3 in x, d has rank 2 in x, and so on. Then, given a text y and a
pattern x, the OPPM problem consists in finding all the order-occurrences of x
in y, namely the factors of y that have the same rank sequence as x.
    The first solution to the OPPM problem was presented by Kubica et al. [12]
in 2013. They provided a O(n+m log m) solution over generic ordered alphabets
based on the Knuth-Morris-Pratt algorithm [13] and also a O(n) solution in the
case of integer alphabets. In the same year, Cho et al. [7] showed how the Boyer-
Moore approach [2] can be applied to the OPPM problem, and Belazzougui
et al. [1] showed that the Aho-Corasik approach can be applied to the OPPM
problem for searching a set of patterns. More recently, Chhabra and Tarhio [6]
have proposed a more practical solution based on the filtration method.
    However, for applications such as time series analysis, weather data analysis,
music melody matching, etc., OPPM is not adequate, as order-occurrences do not
retain enough information to catch up the shape features of interest. Consider
for instance the sequences shown in Fig. 1, where y may represent the price
variation of some goods throughout a period of time. In this context, the pattern
x = h8, 6, 4, 7i could be interpreted as follows: when the price decreases twice
the same amount δ, then one expects a subsequent increase of the price of 32 δ.
Observe that the given pattern has two order-occurrences in y, the first one at
position 4 and the second one at position 11. However, despite of the fact that
x and h18, 12, 11, 13i share the same relative order, the two sequences are far
from being similar. On the other hand, the second occurrence, h22, 18, 14, 20i,
perfectly catches up the features of the pattern x.
    In this paper we shall consider a restricted variant of the OPPM problem,
that we call shape-preserving pattern-matching problem.
    We say that two non-constant strings x and z of the same length m ≥ 2, over
an integer alphabet Σ = {1, 2, . . . , σ},1 are shape-isomorphic if, for some positive
factor ρ > 0, we have x[i+1]−x[i] = ρ(z[i+1]−z[i]), for 1 ≤ i ≤ m−1. Concerning
constant strings of the same length, we agree that they are shape-isomorphic
with proportionality factor ρ = 0. Then, the shape-preserving pattern-matching
1
    For notational convenience, we shall restrict our presentation to integer alphabets of
    the form Σ = {1, 2, . . . , σ} only. However, all our considerations can be immediately
    generalized to any finite linearly ordered alphabet, by just identifying each character
    with its position in the alphabet.


                                             2
problem (SPPM, for short) is the problem of finding all the factors of a given
text y that are shape-isomorphic to a given pattern x, where x and y are strings
over an integer alphabet.
    We observe that in many practical cases a fixed ratio of proportionality is
rare, thus a more practical approach would be to consider an approximate ver-
sion of the problem or allowing the ratios to belong to some bounded interval.
However the restricted problem accounted in this paper is a first step towards a
more suitable method for comparing two numeric strings.
    Specifically we shall provide two linear-time algorithms for the SPPM prob-
lem, based on (1) a reduction to the exact string matching problem and (2) on
a simple, yet non trivial, modification of the Knuth-Morris-Pratt algorithm.
    The paper is organized as follows. In Section 2, after some preliminary no-
tions, we formally define the concept of shape-isomorphism, and state some of its
properties. Then in Section 3 we present a linear-time algorithm for the SPPM
problem based on a simple reduction to the exact string matching problem,
while in Section 4 we present a linear-time KMP based algorithm, proving also
its correctness. We draw our conclusions in Section 5.


2    Preliminary Notions and Definitions

Let Σ = {1, 2, . . . , σ} be a finite integer alphabet of size σ. A string x over Σ is
a sequence of elements in Σ. We denote by |x| the length of x and by x[i], for
1 ≤ i ≤ |x|, the i-th element of x. In addition, for 1 ≤ i ≤ j ≤ |x|, we denote
by x[i .. j] the substring of x of length (j − i + 1), starting with the element of x
at position i and ending with the element of x at position j. By x.y we denote
the concatenation of x and y. We write Σ + for the collection of all nonnull finite
strings over Σ.
    Two sequences x, y ∈ Σ + of the same length are said to be order-isomorphic
if their elements have the same relative order. More formally, we have:

Definition 1 (Order-Isomorphism). Two sequences x, y ∈ Σ + of the same
length are order-isomorphic, and we write x ∼ y, if the following condition holds:

         x[i] ≤ x[j]    if and only if   y[i] ≤ y[j] ,   for 1 ≤ i, j ≤ |x| .

It is immediate to check that order-isomorphism is an equivalence relation.
    We say that two sequences x, y ∈ Σ + of the same length m ≥ 2 are shape-
isomorphic if, up to a positive factor, the sequences of the distances between
consecutive elements of x and y are the same. We also agree that any two se-
quences of length 1 are always regarded as shape-isomorphic. More formally, we
have:

Definition 2 (Shape-Isomorphism). Two non-constant sequences x, y ∈ Σ +
of the same length m ≥ 2 are said to be shape-isomorphic with proportionality
factor ρ > 0 (or, more simply, ρ-isomorphic), and we write x ≈ρ y, if the follow-
ing condition holds: x[i + 1] − x[i] = ρ(y[i + 1] − y[i]), for all 1 ≤ i ≤ m − 1 .

                                          3
If x, y ∈ Σ + are constant sequences of the same length m ≥ 1, we agree that
they are shape-isomorphic with proportionality factor ρ = 0, and write x ≈0 y.
We say that two sequences x, y ∈ Σ + are shape-isomorphic, and write x ≈ y, if
they are ρ-isomorphic, for some factor ρ ≥ 0.

   The following lemma lists some very elementary facts concerning shape- and
order-isomorphism. In particular, it states that shape-isomorphism is a heredi-
tary equivalence relation which is finer than order-isomorphism.
Lemma 1. Let x, y, z ∈ Σ + , where Σ = {1, 2, . . . , σ}, and let ρ > 0 and ρ1 , ρ2 ≥
0. Then the following properties hold:
(a) either x ≈0 x or x ≈1 x
(b) if x ≈ρ y, then y ≈ ρ1 x;
(c) if x ≈ρ1 y and y ≈ρ2 z, then x ≈ρ1 ρ2 z;
(d) if x ≈ρ1 y, then either x[i .. j] ≈ρ1 y[i .. j] or x[i .. j] ≈0 y[i .. j], for all
     1 ≤ i ≤ j ≤ |x| − 1;
(e) ≈ is an equivalence relation over Σ + ;
(f ) if x ≈ y, then x ∼ y, i.e., shape-isomorphism is finer than order-isomorphism;
(g) if x ≈ y, then x[i .. j] ≈ y[i .. j], for all 1 ≤ i ≤ j ≤ |x| − 1, i.e., shape-
     isomorphism is hereditary on substrings.
   By exploiting the characterization contained in the following straightforward
lemma, one can easily test in linear time whether two given sequences of the
same length are shape-isomorphic.
Lemma 2. Let x, z ∈ Σ + be two sequences of the same length m. Then x ≈ z
if and only if either x and z are both constant sequences, or x ≈ρ z, where
ρ = x[i+1]−x[i]
     z[i+1]−z[i] , for any 1 ≤ i ≤ m − 1 such that z[i] 6= z[i + 1].


2.1   The Shape-Preserving Pattern-Matching Problem
Next, we formally define the shape-preserving pattern-matching problem.
Definition 3 (Shape-Preserving Pattern-Matching Problem). Let x, y ∈
Σ + , where Σ = {1, 2, . . . , σ}, be sequences of length m and n, respectively, such
that m ≤ n. The shape-preserving pattern-matching problem consists in finding
all shape-occurrences of x in y, namely all positions 1 ≤ i ≤ n − m + 1 such
that y[i .. i + m − 1] ≈ x. In this context, x is the pattern and y is the text.
    By Lemma 1(f), any algorithm for the OPPM problem can be used as a filter
to locate all candidate shape-occurrences of a pattern x in a text y. When an
order-occurrence of x is found in y, a O(|x|)-time verification procedure based
on Lemma 2 can be run to check whether such an order-occurrence is shape-
isomorphic to x. Since, for a pattern of length m and a text of length n, the
OPPM problem can be solved in time O(n), the algorithm just outlined for the
SPPM problem will run in O(n + km)-time, where k is the number of order-
occurrences of x in y.


                                          4
            y = 8 11 10 15 16 14 17 15 14 18 15 18 12 21 15 25 26
            x = 6 8 4 10 6




Fig. 2. A pattern x of length 5 with two shape-occurrences in a text y of length 17: at
positions 4 and 11, with proportionality factors 21 and 23 , respectively.


    In the next sections we present two linear algorithms for the SPPM problem.
Our first solution, presented in Section 3, is a two steps algorithm based on a
reduction of the SPPM problem to the standard exact string matching problem.
Our second solution, presented in Section 4, is a single step algorithm modeled
after the well-known Knuth-Morris-Pratt algorithm, whose running time does
not dependent on the number of order-occurrences.

3    Reducing SPPM to Exact Pattern Matching
In this section we present a first linear algorithms for solving the SPPM prob-
lem. Our solution is straightforward and assume that the proportionality ratio
between the pattern and its occurrence in the text is a fixed constant ρ ≥ 0. We
observe that in this case the problem can be easily reduced, after some suitable
transformations of the input strings, to the ordinary string matching problem.
To begin we give some additional definitions.
Definition 4 (Delta Function). The Delta Function δ() is the fucntion which
associates a given input string x with the corresponding sequence of the differ-
ences between adjacent characters in x. Formally, for a given string x ∈ Σ ∗ ,
with |x| = m, we define δ(x) as the numeric sequence of length |x| − 1 such that,
for 1 ≤ i ≤ m − 1, δ(x)[i] := x[i + 1] − x[i].
Definition 5 (Last Non-Zero Function). The last non-zero function γ() is
the function which associates a given input string x with the sequence of the last
non zero element up to each position of x. Formally, for a given string x ∈ Σ ∗ ,
with |x| = m, we have that γ(x) is a sequence of length m − 1, such that, for
1 ≤ i ≤ m − 1,
        γ(x)[i] := x[j] where j = max({1 ≤ h < i + 1 : x[h] 6= 0} ∪ {0})
    Observe that γ(x)[1] = 0 if and only if x[1] = 0.
Definition 6 (Ratios Function). The ratios-function ψ() is the function which
associates a given input string x with the sequence of the ratios between its (al-
most) adjacent characters. Formally, for a string x ∈ Σ ∗ of length |x| = m we
have, for 1 ≤ i ≤ m − 1
                                          x[i + 1]
                               ψ(x)[i] :=
                                           γ(x)[i]

                                          5
    Our algorithm is based on a reduction of the SPPM problem to the exact
string matching problem. Such reduction is inspired by the following straight-
forward technical lemmas. Specifically the following elementary Lemma 3 de-
scribes how to compute shape-isomorphic occurrences for a constant string,
while Lemma 5 describes how to compute shape-isomorphic occurrences for a
non-constant string.
Lemma 3. Let x be a constant string of length m and let y be a text of length
n, both strings over the same alphabet Σ. Then we have that x has a shape-
isomorphic occurrence at position i of the text, i.e. x ≈ y[i..i + m − 1], if and
only if δ(x)[j] = δ(y)[i + j] for 1 ≤ j ≤ m − 1.
Lemma 4. Let x and y be two non constant strings over the same alphabet Σ,
with |x| = |y| = m and such that x ≈ρ y with a proportionality factor ρ > 0.
Then we have that γ(δ(x))[i] = ρ · γ(δ(y))[i], for all 1 ≤ i < m − 1.
Lemma 5. Le x and y be two strings of length m > 2 over the same alphabet
Σ, and assume x[1] 6= x[2] and y[1] 6= y[2]. Then x ≈ y if and only if ψ(δ(x)) =
ψ(δ(y)) and δ(x)[1] × δ(y)[1] > 0.
    Based on Lemma 5, Algorithm1 depicted in Fig.4 finds all shape-isomorphic
occurrences of a given pattern x of length m in a given text y of length n. Dur-
ing the preprocessing phase the algorithm performs a partition of the pattern x
into two strings, x1 and x2 , where x1 is the constant prefix of the pattern with
maximal length, while x2 is the suffix of the pattern with maximal length such
that x2 [1] 6= x2 [2].
    More formally we compute an index k, such that

                k = max({j : j > 0 and x[j] 6= x[j + 1]} ∪ {0}),

and set x1 = x[1 . . . k] and x2 = x[k . . . m]. The value of k can be computed in
linear time in the size of x.
    In a first step the algorithm computes all shape-isomorphic occurrences of x1
in y. Since x1 is a constant sequence all such occurrences are simply computed
(based on Lemma 3) by running a linear exact string matching algorithm in
order to search δ(y) for all occurrences of δ(x1 ). We indicate with Γ1 the set of
such occurrences. In the event that |x1 | = 1 we skip this step and set Γ1 = {i :
1 ≤ i ≤ n − m}.
    In a second step the algorithm searches for all shape-isomorphic occurrences
of x2 in y by running (based on Lemma 5) a linear exact string matching algo-
rithm in order to search γ(δ(y)) for all occurrences of γ(δ(x2 )). We indicate with
Γ2 the set of such occurrences. In the event that |x2 | < 2 we skip this step and
set Γ2 = {i : 1 ≤ i ≤ n − m}.
    The last step of the algorithm the two sets of occurrences Γ1 and Γ2 are
combined in order to find the set Γ of all positions i such that x ≈ y[i..i + m − 1].
We keep out from Γ2 all values i such that δ(x)[1] × δ(y)[i] < 0 (this is due to
the condition ρ >= 0 in the shape-isomorphic Definition 2). Specifically we have
Γ = {i − k + 2 : i ∈ Γ2 and (i − k + 2) ∈ Γ1 and δ(x)[1] × δ(y[i]) > 0}.


                                         6
Algorithm1(x, y)
1.   m ← |x|;
2.   k ← min({j : 1 ≤ j ≤ m − 1 and x[j] 6= x[j + 1]} ∪ {m});
3.   x1 ← x[1..k];
4.   x2 ← x[k..m];
5.   Γ1 ← Γ2 ← {i : 1 ≤ i ≤ n − m};
6.   if(|x1 | > 1) then Γ1 ← occurrences of δ(x1 ) in δ(y);
7.   if(|x2 | > 2) then Γ2 ← occurrences of ψ(δ(x2 )) in ψ(δ(y));
8.   for each i ∈ Γ2 do;
9.        if(δ(x)[1] × δ(y[i]) > 0 and (i − k + 2) ∈ Γ1 ) then output(i − k + 2);

Fig. 3. A linear-time algorithm for the SPPM problem based on a reduction to the
exact string matching problem.



   If we use a linear worst case time exact string matching algorithm for the
two steps, then it is trivial to observe that Algorithm1 achieves an O(n + m)
worst case time complexity. Observe, however, that the last step of the algorithm
depends on the number of occurrences of x1 and x2 , which is always bounded
by O(n).


4    A KMP based Algorithm for the SPPM Problem
The Knuth-Morris-Pratt algorithm [13] has been the first algorithm to achieve
a linear worst-case time complexity for the exact pattern matching problem.
It uses a prefix table, also called border table or prefix function, to carry the
information which allows one to compute in constant time the length of the
longest safe shift when a mismatch occurs or a match is found, thus keeping the
number of character comparisons linear in the size of the text.
    Much in the same way, for the SPPM problem of our interest we shall use a
shape-border table, which associates to each position i of a given pattern x the
length of the longest proper suffix of x[1 .. i] that is shape-isomorphic to a prefix
of x[1 .. i].
    The shape-border table for a finite integer sequence x is defined as follows.
Definition 7 (Shape-Border Table). Let x be an integer sequence of length
m ≥ 1. The shape-border table for x is the map πx : {0, 1, . . . , m} → N defined
by
                                                                        
    πx [i] =Def max j : 1 ≤ j < i and x[1 .. j] ≈ x[i − j + 1 .. i] ∪ {0} .
                                                                                 
Plainly, for 2 ≤ i ≤ m, we have 1 ≤ πx [i] < i and x 1 .. πx [i] ≈ x i−πx [i]+1 .. i .
The latter relationship allows us to readily define a related proportionality factor
map %x : {2, 3, . . . , m} → N such that, for 2 ≤ i ≤ m,
                                                                  
                          x 1 .. πx [i] ≈%x [i] x i − πx [i] + 1 .. i .           (1)


                                          7
When the sequence x is understood from the context, we shall simply write π
and % in place of πx and %x .
   Using the notation (·)j for map iteration,2 by induction (1) generalizes to

                    x 1 .. π j [i] ≈%[πj−1 [i]] x i − π j [i] + 1 .. i ,
                                                                   
                                                                             (2)

for all j ≥ 1 such that π j−1 [i] ≥ 2. We also notice that since π[i] < i (for
2 ≤ i ≤ m) and π[1] = π[0] = 0, any sequence of the form π 1 [i], π 2 [i], π 3 [i], . . .
starts with a (possibly empty) strictly decreasing substring of positive integers
and then continues indefinitely with a sequence of 0s.
    Next we show that the shape-border table π satisfies the following recursive
relation, for 1 ≤ i < m, which, together with (2), will yield to a linear algorithm
for computing π:
    (
        π[0] = π[1] = 0n
                                                                                 o
        π[i + 1] = max π j [i] + 1 : x 1 .. π j [i] + 1 ≈ x i − π j [i] + 1 .. i + 1 .   (3)
                       j≥1


Let 1 ≤ i < m and let

   Ai =Def π j [i] + 1 : j ≥ 1 and x 1 .. π j [i] + 1 ≈ x i − π j [i] + 1 .. i + 1 .
                                                                              


By the very definition of π, it follows easily that max Ai ≤ π[i + 1] ≤ π[i] + 1.
Thus, to prove the correctness of (3), it is enough to show that π[i + 1] ∈ Ai ,
which we do as follows. If π[i + 1] = 1, then we plainly have π[i + 1] ∈ Ai . Thus,
let us assume that π[i + 1] ≥ 2. If π[i + 1] = π[i] + 1, we are done; otherwise, let
j be the largest index j ≥ 1 such that π[i + 1] − 1 < π j [i], so that

                                 π j+1 [i] ≤ π[i + 1] − 1 < π j [i].                     (4)
                                                                            
By (2) x 1 .. π j [i] ≈ x i − π j [i] + 1 .. i , and since x 1 .. π[i + 1] − 1 ≈ x i − π[i +
           
1] + 2 .. i and we have π[i + 1] − 1 < π j [i], by the hereditarity and transitivity
of shape-isomorphism it follows that x[1 .. π[i + 1] − 1] is shape-isomorphic to a
proper suffix of x[1 .. π j [i]]. Hence, π[i + 1] − 1 ≤ π j+1 [i] which, by (4), implies
π[i + 1] = π j+1 [i] + 1, proving that π[i + 1] ∈ Ai , and in turn completing the
proof of correctness of (3).
                               ji
    From (3), π[i j+ 1] = π [i]+ 1 ≤j π[i] − ji + 2, where ji is the smallest j ≥ 1
such that x 1 .. π [i] + 1 ≈ x i − π [i] + 1 .. i + 1 . Thus, to compute π[i + 1], it
is enough to test the latter condition ji times, yielding a linear number of tests
in m = |x|, since
         m−1
         X             m−1
                       X                        
                ji ≤         π[i] − π[i + 1] + 2 = π[1] − π[m] + 2m − 2 ≤ m − 3.
          i=1          i=1

2
    We recall that the operator (·)j for map iteration is defined as follows. Given a map
    f : A → A, for x ∈ A we put: f 0 (x) = x and, recursively, f j+1 (x) = f (f j (x)).


                                                 8
            i          1      2      3      4       5      6       7       8
           x[i]        4      2     10      6       22    14      13      17
           π[i]        0      1      1      2       3      4       2       3
           %[i]        -      0     0       2        2    2       0.5     0.5

Table 1. The shape-border table and the proportionality factor map for the pattern
x = h4, 2, 10, 6, 22, 14, 13, 17i.


  Compute-Shape-Border-Table(x)
   1.   π[1] ← 0; π[2] ← 1; %[2] ← 0;
   2.   for i ← 3 to |x| do
   3.         k ← i − 1;
   4.         repeat
   5.              t ← π[k];
   6.              %[i] ← ExtShapeIso x[1 .. t], x[i − t .. i − 1], x[t + 1], x[i], %[k]);
   7.              k ← π[k];
   8.         until %[i] ≥ 0;
   9.         π[i] ← t + 1;
  10.   return π, %;

  ExtShapeIso(x, y, a, b, ρ)
   1.      m ← |x|;
   2.      return if ρ = 0 and x[m] = a and y[m] = b then 0
                  elseif ρ = 0 and (a − x[m]) · (b − y[m]) > 0 then a−x[m]
                                                                    b−y[m]
                  elseif ρ > 0 and (a − x[m]) = ρ · (b − y[m]) then ρ
                  else −1;

Fig. 4. The procedure Compute-Shape-Border-Table for computing the shape-
border table and the proportional factor map for a pattern x and its subroutine
ExtShapeIso.



If the values of the proportionality factor map % are also tabulated during the
computation of the shape-border table π, it is possible to make each of the tests

                      x 1 .. π j [i] + 1 ≈ x i − π j [i] + 1 .. i + 1 ,
                                                                  
                                                                                             (5)

for j = 1, 2, . . . , ji , in constant time by means of the following procedure call

ExtShapeIso x 1 .. π j [i] , x i − π j [i] + 1 .. i , x π j [i] + 1 , x[i + 1], %[π j−1 [i]]
                                                                                      


(cf. Fig. 4). If the condition (5) is true, then such a call will return the propor-
tionality factor %[i + 1] of (5), otherwise it will return the value −1.
    The above considerations leads to the O(m)-time procedure Compute-Shape-
Border-Table reported in Fig. 4 for computing the shape-border table and
the proportional factor map for a pattern x of length m. Table 1 reports the
shape-border table and the proportionality factor map for the pattern x =
h4, 2, 10, 6, 22, 14, 13, 17i.


                                                9
     Much in the style of the Knuth-Morris-Pratt algorithm, we show next how to
find all shape-occurrences of a pattern x (of length m) in a text y (of length n,
with n ≥ m) in time O(n), using the shape-border table and the proportionality
factor map for x. The complete algorithm, called Algorithm2, is reported in
Fig. 5. Line references in the following discussion are intended to point out to
its pseudo-code, even if not explicitly stated.
     Let us assume that it is known that x[1 .. j] ≈ρ y[i .. i + j − 1] holds, for some
ρ ≥ 0, i ≤ n − m, and j ≤ m, and that no further progress in the search for a
shape-occurrence of x at position i in y is possible. This means that either j = m,
in which case a shape-occurrence of x at position i in y has been found (cf. line 9),
or j < m and x[1 .. j + 1] 6≈ y[i .. i + j]. In any case, one needs to examine a new
position i0 > i to find a (new) shape-occurrence of x. We claim that if ` > i is
any position in y of a shape-occurrence of x, then ` ≥ i + j − π[j], so that it is
safe to move to position i0 = i + j − π[j] (cf. line 11). Indeed, if ` < i + j − π[j],
then x[1 .. i + j − `] ≈ y[` .. i + j − 1], so that, by the hereditarity and transitivity
of shape-isomorphism, we would have x[1 .. i + j − `] ≈ x[` − i + 1 .. j]. Hence, by
the very definition of π[j], it would follow that i + j − ` ≤ π[j], a contradiction.
     Notice that once we move to position i0 = i + j − π[j] in y, we already
know that x[1 .. π[j]] ≈ y[i + j − π[j] .. i + j − 1] (for some proportionality factor;
see below). Hence, the matching phase relative to position i0 does not need
to reconsider again the positions from 1 to π[j] of the pattern x and related
positions from i0 to i0 + π[j] − 1 of the text y (cf. line 11). However, to execute
efficiently the matching phase by repeated calls to the procedure ExtShapeIso
in Fig. 4 (cf. line 5), at each step (even at the first one) one needs to know
the proportionality factor of the shape-isomorphic substrings which have been
matched so far. From the initial assumption x[1 .. j] ≈ρ y[i .. i + j − 1], we can
infer that x[1 .. π[j]] ≈ρ y[i+j −π[j] .. i+j −1] holds only when x[1 .. π[j]] is not a
constant sequence. Otherwise, we would have x[1 .. π[j]] ≈0 y[i+j−π[j] .. i+j−1].
Plainly, x[1 .. π[j]] is a constant sequence if and only if %[j] = 0 (cf. line 10).
Finally, we point out that the procedure ExtShapeIso, called at line 5, not
only allows one to make progress in the matching phase, but it also takes care
of updating, if needed, the proportionality factor ρ.
     The above discussion highlighted the key points needed in a more formal
proof of the correctness of the Algorithm2 in Fig. 5 for the SPPM problem.

Complexity issues
Next we show that the overall time complexity of the while-loop at lines 3–11 of
Algorithm2 is O(n), where, as usual, n is the size of the text. Since, as already
shown, the call to procedure Compute-Shape-Border-Table at line 1 takes
O(m) time, where, again, m is the size of the pattern, and since m ≤ n, it will
follow that the total time complexity of Algorithm2 is O(n).
    Plainly, the time complexity of the while-loop at lines 3–11 is dominated by
the number N of calls to procedure ExtShapeIso at line 5 (each of which takes
constant time). Let us associate to each such a call

            ExtShapeIso x[1 .. j], y[i .. i + j − 1], x[j + 1], y[i + j], ρ),


                                           10
    Algorithm2(x, y)
     1.   (π, %) ← Compute-Shape-Border-Table(x)
     2.   i ← 1; j ← 1; ρ ← 0;
     3.   while i ≤ |y| − |x| do
     4.        repeat
     5.              ρ0 ← ExtShapeIso x[1 .. j], y[i .. i + j − 1], x[j + 1], y[i + j], ρ);
     6.              if ρ0 ≥ 0 then
     7.                    j ← j + 1; ρ ← ρ0 ;
     8.        until j = |x| or ρ0 = −1;
     9.        if j = |x| then write i; //a shape-occurrence has been found
    10.        if %[j] = 0 then ρ ← 0;
    11.        i ← i + j − π[j]; j ← π[j];

          Fig. 5. A linear-time KMP based algorithm for the SPPM problem.




the pair (i, j) of the values of the parameters i and j when the call is made, and
form their sequence
                            (i1 , j1 ), (i2 , j2 ), . . . , (iN , jN ),         (6)
following the same ordering of the calls (so that (i1 , j1 ) = (1, 1)).
    By a simple inspection of the pseudo-code of Algorithm2, it is easy to see
that the sequence (6) enjoys the following two properties:
(a) i1 + j1 ≤ i2 + j2 ≤ · · · ≤ iN + jN ≤ n; (b) j` ≥ 1, for all 1 ≤ ` ≤ N .
    To ease presentation, let us refer to any pair h(i, j), (i0 , j 0 )i of consecutive
pairs (i, j), (i0 , j 0 ) in (6) as a transition, and distinguish between increasing
transitions, when j 0 = j + 1, and decreasing transitions, when j 0 < j. Let I
and D be, respectively, the number of increasing and of decreasing transitions
in (6). We plainly have N = I + D + 1, as any transition is either increasing or
decreasing. In addition, since j1 = 1, we have D ≤ I, so that N ≤ 2D+1. Finally,
for any increasing transition h(i, j), (i0 , j 0 )i, we have 2 ≤ i + j < i0 + j 0 ≤ n, so
that I ≤ n − 2. From the latter inequality, we obtain N ≤ 2n − 3, yielding the
linear bound O(n) seeked for for the time complexity of our Algorithm2.


5     Conclusions

In this paper we introduced a restricted variant of the Order Preserving Pattern
Matching problem, called Shape Preserving Pattern Matching. Specifically we
say that two non-constant strings x and z of the same length m ≥ 2 are shape-
isomorphic if, for some positive factor ρ > 0, we have x[i + 1] − x[i] = ρ(z[i +
1) − z[i]), for 1 ≤ i ≤ m − 1. In most practical applications this restricted variant
turns out to be more effective then the original problem. We also provided two
linear-time algorithms for such problem, based on a simple, yet non trivial,
modification of the Knuth-Morris-Pratt algorithm.
    Although a fixed ratio between two sequences is rare in practice, this pa-
per presents a first step towards a more suitable way to compare two numeric


                                              11
sequences. A more practical approach would be considering a k-approximate ver-
sion of the problem or allowing the ratios to belong to some bounded interval.
Moreover, we are also interested to extend the dependency of differences in the
occurrence from the differences in the pattern not only for a linear function but
also for an arbitrary function.


References
1. Belazzougui, D., Pierrot, A., Raffinot, M., Vialette, S.: Single and Multiple Consec-
   utive Permutation Motif Search, In Proc. of ISAAC 2013, LNCS, vol. 8283, 66–77.
   Springer (2013)
2. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the
   ACM 20(10), 762–772 (1977)
3. Cantone, D. Faro, S., Külekci, M.O., The order-preserving pattern matching prob-
   lem in practice, Discret. Appl. Math., vol.274, pp.11–25 (2020)
4. Cantone, D., Faro, S., Külekci, M.O., An Efficient Skip-Search Approach to the
   Order-Preserving Pattern Matching Problem, in Proc. of Stringology 2015, pp. 22-
   35 (2015)
5. Chhabra, T., Faro, S., Külekci, M.O., Tarhio, J., Engineering order-preserving pat-
   tern matching with SIMD parallelism, Softw. Pract. Exp., vol.47 (5), pp. 731–739
   (2017)
6. Chhabra, T., Tarhio, J.: Order-preserving matching with filtration. In: Proc. SEA
   ’14, 13th International Symposium on Experimental Algorithms. LNCS, vol. 8504,
   307–314. Springer (2014)
7. Cho, S., Na, J.C., Park, K., Sim, J.S.: Fast order-preserving pattern matching. In:
   Widmayer, P., Xu, Y., Zhu, B. (eds.) COCOA 2013. LNCS, vol. 8287, 295–305.
   Springer (2013)
8. Crochemore, M, Iliopoulos, C.S., Kociumaka, T., Kubica, M., Langiu, A., Pissis,
   S.P., Radoszewski, J., Rytter, W., Walen, T.: Order-preserving incomplete suffix
   trees and order-preserving indexes. In: Proc. SPIRE 2013, 20th International Sym-
   posium. LNCS, vol. 8214, 84–95. Springer (2013)
9. Durian, B., Holub, J., Peltola, H., Tarhio, J.: Improving practical exact string match-
   ing. Information Processing Letters 110(4): 148–152 (2010)
10. Faro, S., Külekci, M.O., Efficient Algorithms for the Order Preserving Pattern
   Matching Problem, In Proc. of Algorithmic Aspects in Information and Management
   - 11th International Conference, AAIM 2016, Lecture Notes in Computer Science,
   vol.9778, pp.185–196, Springer (2016)
11. Kim, J., Eades, P., Fleischer, R., Hong, S.-H., Iliopoulos, C.S., Park, K., Puglisi, S.
   J., Tokuyama, T.: Order preserving matching. Theoretical Computer Science 525,
   68–79 (2014)
12. Kubica, M., Kulczynski, T., Radoszewski, J., Rytter, W., Walen, T.: A linear time
   algorithm for consecutive permutation pattern matching. Information Processing
   Letters 113(12), 430–433 (2013)
13. Knuth, D.E., Morris, J.M., Pratt, V.R.: Fast pattern matching in strings. SIAM
   Journal on Computing 6(2), 323–350 (1977)
14. Navarro, G., Raffinot, M.: Flexible pattern matching in strings. Practical on-line
   search algorithms for texts and biological sequences. Cambridge University Press,
   New York, NY, 2002



                                            12