=Paper=
{{Paper
|id=Vol-2756/paper13
|storemode=property
|title=Shape-Preserving Pattern Matching
|pdfUrl=https://ceur-ws.org/Vol-2756/paper_13.pdf
|volume=Vol-2756
|authors=Domenico Cantone,Simone Faro,M. Oguzhan Kulekci
|dblpUrl=https://dblp.org/rec/conf/ictcs/CantoneFK20
}}
==Shape-Preserving Pattern Matching==
Shape-Preserving Pattern Matching? Domenico Cantone† , Simone Faro† and M.Oguzhan Külekci‡ † Università di Catania, Viale A. Doria 6, 95125 Catania, Italy {domenico.cantone,simone.faro}@unict.it ‡ Istanbul Technical University kulekci@itu.edu.tr Abstract. Two sequences of integers x and z of the same length m ≥ 2 are shape-isomorphic if, up to a positive proportional factor, the se- quences of the distances between consecutive elements of x and z are the same, i.e., if for some ρ > 0 one has x[i + 1] − x[i] = ρ(z[i + 1] − z[i]), for each 1 ≤ i ≤ m − 1. In this paper we present two linear-time algo- rithms which, given a text y and a pattern x over an integer alphabet, finds all the factors of y that are shape-isomorphic to x. Our first so- lution is a two steps algorithm based on a reduction to the standard exact string matching problem, while our second solution is an online algorithm based on the well-known Knuth-Morris-Pratt string matching algorithm. We call this problem shape-preserving pattern-matching prob- lem. Keywords: Approximate text analysis, non-standard string matching, text processing. 1 Introduction Given a pattern x of length m and a text y of length n, both over a common alphabet Σ, the exact string matching problem consists in finding all occurrences of the string x in y. String matching is a very important subject in the wider domain of text processing. Algorithmic solutions for it, both in its exact and ap- proximation versions, are basic components of the implementations of practical softwares available under most operating systems. Among the different approximation variants of the string matching problem, the order-preserving pattern-matching problem [12, 7, 6, 4, 10, 5, 3] (OPPM, for short) has recently gained attention. In this variant, the characters of the pattern and of the text are drawn from a linearly ordered alphabet Σ, so that each string z in Σ ? can naturally be mapped into the sequence of the ranks (within the ordered sequence of the characters occurring in z) of its characters, which we call rank sequence. For instance, if the alphabetic order is used for characters, ? Copyright c 2020 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). This work was supported by the University of Catania “programma ricerca di ateneo UNICT 2020-22 linea 2” y = 8 11 13 20 14 8 17 15 14 18 22 18 14 20 15 25 26 x =8 6 4 7 Fig. 1. Example of a pattern x of length 4 over an integer alphabet with two shape preserving occurrences in a text y of length 17, at positions 3 and 10. the rank sequence of the string x = “gdich” is the sequence h3, 2, 5, 1, 4i, since g has rank 3 in x, d has rank 2 in x, and so on. Then, given a text y and a pattern x, the OPPM problem consists in finding all the order-occurrences of x in y, namely the factors of y that have the same rank sequence as x. The first solution to the OPPM problem was presented by Kubica et al. [12] in 2013. They provided a O(n+m log m) solution over generic ordered alphabets based on the Knuth-Morris-Pratt algorithm [13] and also a O(n) solution in the case of integer alphabets. In the same year, Cho et al. [7] showed how the Boyer- Moore approach [2] can be applied to the OPPM problem, and Belazzougui et al. [1] showed that the Aho-Corasik approach can be applied to the OPPM problem for searching a set of patterns. More recently, Chhabra and Tarhio [6] have proposed a more practical solution based on the filtration method. However, for applications such as time series analysis, weather data analysis, music melody matching, etc., OPPM is not adequate, as order-occurrences do not retain enough information to catch up the shape features of interest. Consider for instance the sequences shown in Fig. 1, where y may represent the price variation of some goods throughout a period of time. In this context, the pattern x = h8, 6, 4, 7i could be interpreted as follows: when the price decreases twice the same amount δ, then one expects a subsequent increase of the price of 32 δ. Observe that the given pattern has two order-occurrences in y, the first one at position 4 and the second one at position 11. However, despite of the fact that x and h18, 12, 11, 13i share the same relative order, the two sequences are far from being similar. On the other hand, the second occurrence, h22, 18, 14, 20i, perfectly catches up the features of the pattern x. In this paper we shall consider a restricted variant of the OPPM problem, that we call shape-preserving pattern-matching problem. We say that two non-constant strings x and z of the same length m ≥ 2, over an integer alphabet Σ = {1, 2, . . . , σ},1 are shape-isomorphic if, for some positive factor ρ > 0, we have x[i+1]−x[i] = ρ(z[i+1]−z[i]), for 1 ≤ i ≤ m−1. Concerning constant strings of the same length, we agree that they are shape-isomorphic with proportionality factor ρ = 0. Then, the shape-preserving pattern-matching 1 For notational convenience, we shall restrict our presentation to integer alphabets of the form Σ = {1, 2, . . . , σ} only. However, all our considerations can be immediately generalized to any finite linearly ordered alphabet, by just identifying each character with its position in the alphabet. 2 problem (SPPM, for short) is the problem of finding all the factors of a given text y that are shape-isomorphic to a given pattern x, where x and y are strings over an integer alphabet. We observe that in many practical cases a fixed ratio of proportionality is rare, thus a more practical approach would be to consider an approximate ver- sion of the problem or allowing the ratios to belong to some bounded interval. However the restricted problem accounted in this paper is a first step towards a more suitable method for comparing two numeric strings. Specifically we shall provide two linear-time algorithms for the SPPM prob- lem, based on (1) a reduction to the exact string matching problem and (2) on a simple, yet non trivial, modification of the Knuth-Morris-Pratt algorithm. The paper is organized as follows. In Section 2, after some preliminary no- tions, we formally define the concept of shape-isomorphism, and state some of its properties. Then in Section 3 we present a linear-time algorithm for the SPPM problem based on a simple reduction to the exact string matching problem, while in Section 4 we present a linear-time KMP based algorithm, proving also its correctness. We draw our conclusions in Section 5. 2 Preliminary Notions and Definitions Let Σ = {1, 2, . . . , σ} be a finite integer alphabet of size σ. A string x over Σ is a sequence of elements in Σ. We denote by |x| the length of x and by x[i], for 1 ≤ i ≤ |x|, the i-th element of x. In addition, for 1 ≤ i ≤ j ≤ |x|, we denote by x[i .. j] the substring of x of length (j − i + 1), starting with the element of x at position i and ending with the element of x at position j. By x.y we denote the concatenation of x and y. We write Σ + for the collection of all nonnull finite strings over Σ. Two sequences x, y ∈ Σ + of the same length are said to be order-isomorphic if their elements have the same relative order. More formally, we have: Definition 1 (Order-Isomorphism). Two sequences x, y ∈ Σ + of the same length are order-isomorphic, and we write x ∼ y, if the following condition holds: x[i] ≤ x[j] if and only if y[i] ≤ y[j] , for 1 ≤ i, j ≤ |x| . It is immediate to check that order-isomorphism is an equivalence relation. We say that two sequences x, y ∈ Σ + of the same length m ≥ 2 are shape- isomorphic if, up to a positive factor, the sequences of the distances between consecutive elements of x and y are the same. We also agree that any two se- quences of length 1 are always regarded as shape-isomorphic. More formally, we have: Definition 2 (Shape-Isomorphism). Two non-constant sequences x, y ∈ Σ + of the same length m ≥ 2 are said to be shape-isomorphic with proportionality factor ρ > 0 (or, more simply, ρ-isomorphic), and we write x ≈ρ y, if the follow- ing condition holds: x[i + 1] − x[i] = ρ(y[i + 1] − y[i]), for all 1 ≤ i ≤ m − 1 . 3 If x, y ∈ Σ + are constant sequences of the same length m ≥ 1, we agree that they are shape-isomorphic with proportionality factor ρ = 0, and write x ≈0 y. We say that two sequences x, y ∈ Σ + are shape-isomorphic, and write x ≈ y, if they are ρ-isomorphic, for some factor ρ ≥ 0. The following lemma lists some very elementary facts concerning shape- and order-isomorphism. In particular, it states that shape-isomorphism is a heredi- tary equivalence relation which is finer than order-isomorphism. Lemma 1. Let x, y, z ∈ Σ + , where Σ = {1, 2, . . . , σ}, and let ρ > 0 and ρ1 , ρ2 ≥ 0. Then the following properties hold: (a) either x ≈0 x or x ≈1 x (b) if x ≈ρ y, then y ≈ ρ1 x; (c) if x ≈ρ1 y and y ≈ρ2 z, then x ≈ρ1 ρ2 z; (d) if x ≈ρ1 y, then either x[i .. j] ≈ρ1 y[i .. j] or x[i .. j] ≈0 y[i .. j], for all 1 ≤ i ≤ j ≤ |x| − 1; (e) ≈ is an equivalence relation over Σ + ; (f ) if x ≈ y, then x ∼ y, i.e., shape-isomorphism is finer than order-isomorphism; (g) if x ≈ y, then x[i .. j] ≈ y[i .. j], for all 1 ≤ i ≤ j ≤ |x| − 1, i.e., shape- isomorphism is hereditary on substrings. By exploiting the characterization contained in the following straightforward lemma, one can easily test in linear time whether two given sequences of the same length are shape-isomorphic. Lemma 2. Let x, z ∈ Σ + be two sequences of the same length m. Then x ≈ z if and only if either x and z are both constant sequences, or x ≈ρ z, where ρ = x[i+1]−x[i] z[i+1]−z[i] , for any 1 ≤ i ≤ m − 1 such that z[i] 6= z[i + 1]. 2.1 The Shape-Preserving Pattern-Matching Problem Next, we formally define the shape-preserving pattern-matching problem. Definition 3 (Shape-Preserving Pattern-Matching Problem). Let x, y ∈ Σ + , where Σ = {1, 2, . . . , σ}, be sequences of length m and n, respectively, such that m ≤ n. The shape-preserving pattern-matching problem consists in finding all shape-occurrences of x in y, namely all positions 1 ≤ i ≤ n − m + 1 such that y[i .. i + m − 1] ≈ x. In this context, x is the pattern and y is the text. By Lemma 1(f), any algorithm for the OPPM problem can be used as a filter to locate all candidate shape-occurrences of a pattern x in a text y. When an order-occurrence of x is found in y, a O(|x|)-time verification procedure based on Lemma 2 can be run to check whether such an order-occurrence is shape- isomorphic to x. Since, for a pattern of length m and a text of length n, the OPPM problem can be solved in time O(n), the algorithm just outlined for the SPPM problem will run in O(n + km)-time, where k is the number of order- occurrences of x in y. 4 y = 8 11 10 15 16 14 17 15 14 18 15 18 12 21 15 25 26 x = 6 8 4 10 6 Fig. 2. A pattern x of length 5 with two shape-occurrences in a text y of length 17: at positions 4 and 11, with proportionality factors 21 and 23 , respectively. In the next sections we present two linear algorithms for the SPPM problem. Our first solution, presented in Section 3, is a two steps algorithm based on a reduction of the SPPM problem to the standard exact string matching problem. Our second solution, presented in Section 4, is a single step algorithm modeled after the well-known Knuth-Morris-Pratt algorithm, whose running time does not dependent on the number of order-occurrences. 3 Reducing SPPM to Exact Pattern Matching In this section we present a first linear algorithms for solving the SPPM prob- lem. Our solution is straightforward and assume that the proportionality ratio between the pattern and its occurrence in the text is a fixed constant ρ ≥ 0. We observe that in this case the problem can be easily reduced, after some suitable transformations of the input strings, to the ordinary string matching problem. To begin we give some additional definitions. Definition 4 (Delta Function). The Delta Function δ() is the fucntion which associates a given input string x with the corresponding sequence of the differ- ences between adjacent characters in x. Formally, for a given string x ∈ Σ ∗ , with |x| = m, we define δ(x) as the numeric sequence of length |x| − 1 such that, for 1 ≤ i ≤ m − 1, δ(x)[i] := x[i + 1] − x[i]. Definition 5 (Last Non-Zero Function). The last non-zero function γ() is the function which associates a given input string x with the sequence of the last non zero element up to each position of x. Formally, for a given string x ∈ Σ ∗ , with |x| = m, we have that γ(x) is a sequence of length m − 1, such that, for 1 ≤ i ≤ m − 1, γ(x)[i] := x[j] where j = max({1 ≤ h < i + 1 : x[h] 6= 0} ∪ {0}) Observe that γ(x)[1] = 0 if and only if x[1] = 0. Definition 6 (Ratios Function). The ratios-function ψ() is the function which associates a given input string x with the sequence of the ratios between its (al- most) adjacent characters. Formally, for a string x ∈ Σ ∗ of length |x| = m we have, for 1 ≤ i ≤ m − 1 x[i + 1] ψ(x)[i] := γ(x)[i] 5 Our algorithm is based on a reduction of the SPPM problem to the exact string matching problem. Such reduction is inspired by the following straight- forward technical lemmas. Specifically the following elementary Lemma 3 de- scribes how to compute shape-isomorphic occurrences for a constant string, while Lemma 5 describes how to compute shape-isomorphic occurrences for a non-constant string. Lemma 3. Let x be a constant string of length m and let y be a text of length n, both strings over the same alphabet Σ. Then we have that x has a shape- isomorphic occurrence at position i of the text, i.e. x ≈ y[i..i + m − 1], if and only if δ(x)[j] = δ(y)[i + j] for 1 ≤ j ≤ m − 1. Lemma 4. Let x and y be two non constant strings over the same alphabet Σ, with |x| = |y| = m and such that x ≈ρ y with a proportionality factor ρ > 0. Then we have that γ(δ(x))[i] = ρ · γ(δ(y))[i], for all 1 ≤ i < m − 1. Lemma 5. Le x and y be two strings of length m > 2 over the same alphabet Σ, and assume x[1] 6= x[2] and y[1] 6= y[2]. Then x ≈ y if and only if ψ(δ(x)) = ψ(δ(y)) and δ(x)[1] × δ(y)[1] > 0. Based on Lemma 5, Algorithm1 depicted in Fig.4 finds all shape-isomorphic occurrences of a given pattern x of length m in a given text y of length n. Dur- ing the preprocessing phase the algorithm performs a partition of the pattern x into two strings, x1 and x2 , where x1 is the constant prefix of the pattern with maximal length, while x2 is the suffix of the pattern with maximal length such that x2 [1] 6= x2 [2]. More formally we compute an index k, such that k = max({j : j > 0 and x[j] 6= x[j + 1]} ∪ {0}), and set x1 = x[1 . . . k] and x2 = x[k . . . m]. The value of k can be computed in linear time in the size of x. In a first step the algorithm computes all shape-isomorphic occurrences of x1 in y. Since x1 is a constant sequence all such occurrences are simply computed (based on Lemma 3) by running a linear exact string matching algorithm in order to search δ(y) for all occurrences of δ(x1 ). We indicate with Γ1 the set of such occurrences. In the event that |x1 | = 1 we skip this step and set Γ1 = {i : 1 ≤ i ≤ n − m}. In a second step the algorithm searches for all shape-isomorphic occurrences of x2 in y by running (based on Lemma 5) a linear exact string matching algo- rithm in order to search γ(δ(y)) for all occurrences of γ(δ(x2 )). We indicate with Γ2 the set of such occurrences. In the event that |x2 | < 2 we skip this step and set Γ2 = {i : 1 ≤ i ≤ n − m}. The last step of the algorithm the two sets of occurrences Γ1 and Γ2 are combined in order to find the set Γ of all positions i such that x ≈ y[i..i + m − 1]. We keep out from Γ2 all values i such that δ(x)[1] × δ(y)[i] < 0 (this is due to the condition ρ >= 0 in the shape-isomorphic Definition 2). Specifically we have Γ = {i − k + 2 : i ∈ Γ2 and (i − k + 2) ∈ Γ1 and δ(x)[1] × δ(y[i]) > 0}. 6 Algorithm1(x, y) 1. m ← |x|; 2. k ← min({j : 1 ≤ j ≤ m − 1 and x[j] 6= x[j + 1]} ∪ {m}); 3. x1 ← x[1..k]; 4. x2 ← x[k..m]; 5. Γ1 ← Γ2 ← {i : 1 ≤ i ≤ n − m}; 6. if(|x1 | > 1) then Γ1 ← occurrences of δ(x1 ) in δ(y); 7. if(|x2 | > 2) then Γ2 ← occurrences of ψ(δ(x2 )) in ψ(δ(y)); 8. for each i ∈ Γ2 do; 9. if(δ(x)[1] × δ(y[i]) > 0 and (i − k + 2) ∈ Γ1 ) then output(i − k + 2); Fig. 3. A linear-time algorithm for the SPPM problem based on a reduction to the exact string matching problem. If we use a linear worst case time exact string matching algorithm for the two steps, then it is trivial to observe that Algorithm1 achieves an O(n + m) worst case time complexity. Observe, however, that the last step of the algorithm depends on the number of occurrences of x1 and x2 , which is always bounded by O(n). 4 A KMP based Algorithm for the SPPM Problem The Knuth-Morris-Pratt algorithm [13] has been the first algorithm to achieve a linear worst-case time complexity for the exact pattern matching problem. It uses a prefix table, also called border table or prefix function, to carry the information which allows one to compute in constant time the length of the longest safe shift when a mismatch occurs or a match is found, thus keeping the number of character comparisons linear in the size of the text. Much in the same way, for the SPPM problem of our interest we shall use a shape-border table, which associates to each position i of a given pattern x the length of the longest proper suffix of x[1 .. i] that is shape-isomorphic to a prefix of x[1 .. i]. The shape-border table for a finite integer sequence x is defined as follows. Definition 7 (Shape-Border Table). Let x be an integer sequence of length m ≥ 1. The shape-border table for x is the map πx : {0, 1, . . . , m} → N defined by πx [i] =Def max j : 1 ≤ j < i and x[1 .. j] ≈ x[i − j + 1 .. i] ∪ {0} . Plainly, for 2 ≤ i ≤ m, we have 1 ≤ πx [i] < i and x 1 .. πx [i] ≈ x i−πx [i]+1 .. i . The latter relationship allows us to readily define a related proportionality factor map %x : {2, 3, . . . , m} → N such that, for 2 ≤ i ≤ m, x 1 .. πx [i] ≈%x [i] x i − πx [i] + 1 .. i . (1) 7 When the sequence x is understood from the context, we shall simply write π and % in place of πx and %x . Using the notation (·)j for map iteration,2 by induction (1) generalizes to x 1 .. π j [i] ≈%[πj−1 [i]] x i − π j [i] + 1 .. i , (2) for all j ≥ 1 such that π j−1 [i] ≥ 2. We also notice that since π[i] < i (for 2 ≤ i ≤ m) and π[1] = π[0] = 0, any sequence of the form π 1 [i], π 2 [i], π 3 [i], . . . starts with a (possibly empty) strictly decreasing substring of positive integers and then continues indefinitely with a sequence of 0s. Next we show that the shape-border table π satisfies the following recursive relation, for 1 ≤ i < m, which, together with (2), will yield to a linear algorithm for computing π: ( π[0] = π[1] = 0n o π[i + 1] = max π j [i] + 1 : x 1 .. π j [i] + 1 ≈ x i − π j [i] + 1 .. i + 1 . (3) j≥1 Let 1 ≤ i < m and let Ai =Def π j [i] + 1 : j ≥ 1 and x 1 .. π j [i] + 1 ≈ x i − π j [i] + 1 .. i + 1 . By the very definition of π, it follows easily that max Ai ≤ π[i + 1] ≤ π[i] + 1. Thus, to prove the correctness of (3), it is enough to show that π[i + 1] ∈ Ai , which we do as follows. If π[i + 1] = 1, then we plainly have π[i + 1] ∈ Ai . Thus, let us assume that π[i + 1] ≥ 2. If π[i + 1] = π[i] + 1, we are done; otherwise, let j be the largest index j ≥ 1 such that π[i + 1] − 1 < π j [i], so that π j+1 [i] ≤ π[i + 1] − 1 < π j [i]. (4) By (2) x 1 .. π j [i] ≈ x i − π j [i] + 1 .. i , and since x 1 .. π[i + 1] − 1 ≈ x i − π[i + 1] + 2 .. i and we have π[i + 1] − 1 < π j [i], by the hereditarity and transitivity of shape-isomorphism it follows that x[1 .. π[i + 1] − 1] is shape-isomorphic to a proper suffix of x[1 .. π j [i]]. Hence, π[i + 1] − 1 ≤ π j+1 [i] which, by (4), implies π[i + 1] = π j+1 [i] + 1, proving that π[i + 1] ∈ Ai , and in turn completing the proof of correctness of (3). ji From (3), π[i j+ 1] = π [i]+ 1 ≤j π[i] − ji + 2, where ji is the smallest j ≥ 1 such that x 1 .. π [i] + 1 ≈ x i − π [i] + 1 .. i + 1 . Thus, to compute π[i + 1], it is enough to test the latter condition ji times, yielding a linear number of tests in m = |x|, since m−1 X m−1 X ji ≤ π[i] − π[i + 1] + 2 = π[1] − π[m] + 2m − 2 ≤ m − 3. i=1 i=1 2 We recall that the operator (·)j for map iteration is defined as follows. Given a map f : A → A, for x ∈ A we put: f 0 (x) = x and, recursively, f j+1 (x) = f (f j (x)). 8 i 1 2 3 4 5 6 7 8 x[i] 4 2 10 6 22 14 13 17 π[i] 0 1 1 2 3 4 2 3 %[i] - 0 0 2 2 2 0.5 0.5 Table 1. The shape-border table and the proportionality factor map for the pattern x = h4, 2, 10, 6, 22, 14, 13, 17i. Compute-Shape-Border-Table(x) 1. π[1] ← 0; π[2] ← 1; %[2] ← 0; 2. for i ← 3 to |x| do 3. k ← i − 1; 4. repeat 5. t ← π[k]; 6. %[i] ← ExtShapeIso x[1 .. t], x[i − t .. i − 1], x[t + 1], x[i], %[k]); 7. k ← π[k]; 8. until %[i] ≥ 0; 9. π[i] ← t + 1; 10. return π, %; ExtShapeIso(x, y, a, b, ρ) 1. m ← |x|; 2. return if ρ = 0 and x[m] = a and y[m] = b then 0 elseif ρ = 0 and (a − x[m]) · (b − y[m]) > 0 then a−x[m] b−y[m] elseif ρ > 0 and (a − x[m]) = ρ · (b − y[m]) then ρ else −1; Fig. 4. The procedure Compute-Shape-Border-Table for computing the shape- border table and the proportional factor map for a pattern x and its subroutine ExtShapeIso. If the values of the proportionality factor map % are also tabulated during the computation of the shape-border table π, it is possible to make each of the tests x 1 .. π j [i] + 1 ≈ x i − π j [i] + 1 .. i + 1 , (5) for j = 1, 2, . . . , ji , in constant time by means of the following procedure call ExtShapeIso x 1 .. π j [i] , x i − π j [i] + 1 .. i , x π j [i] + 1 , x[i + 1], %[π j−1 [i]] (cf. Fig. 4). If the condition (5) is true, then such a call will return the propor- tionality factor %[i + 1] of (5), otherwise it will return the value −1. The above considerations leads to the O(m)-time procedure Compute-Shape- Border-Table reported in Fig. 4 for computing the shape-border table and the proportional factor map for a pattern x of length m. Table 1 reports the shape-border table and the proportionality factor map for the pattern x = h4, 2, 10, 6, 22, 14, 13, 17i. 9 Much in the style of the Knuth-Morris-Pratt algorithm, we show next how to find all shape-occurrences of a pattern x (of length m) in a text y (of length n, with n ≥ m) in time O(n), using the shape-border table and the proportionality factor map for x. The complete algorithm, called Algorithm2, is reported in Fig. 5. Line references in the following discussion are intended to point out to its pseudo-code, even if not explicitly stated. Let us assume that it is known that x[1 .. j] ≈ρ y[i .. i + j − 1] holds, for some ρ ≥ 0, i ≤ n − m, and j ≤ m, and that no further progress in the search for a shape-occurrence of x at position i in y is possible. This means that either j = m, in which case a shape-occurrence of x at position i in y has been found (cf. line 9), or j < m and x[1 .. j + 1] 6≈ y[i .. i + j]. In any case, one needs to examine a new position i0 > i to find a (new) shape-occurrence of x. We claim that if ` > i is any position in y of a shape-occurrence of x, then ` ≥ i + j − π[j], so that it is safe to move to position i0 = i + j − π[j] (cf. line 11). Indeed, if ` < i + j − π[j], then x[1 .. i + j − `] ≈ y[` .. i + j − 1], so that, by the hereditarity and transitivity of shape-isomorphism, we would have x[1 .. i + j − `] ≈ x[` − i + 1 .. j]. Hence, by the very definition of π[j], it would follow that i + j − ` ≤ π[j], a contradiction. Notice that once we move to position i0 = i + j − π[j] in y, we already know that x[1 .. π[j]] ≈ y[i + j − π[j] .. i + j − 1] (for some proportionality factor; see below). Hence, the matching phase relative to position i0 does not need to reconsider again the positions from 1 to π[j] of the pattern x and related positions from i0 to i0 + π[j] − 1 of the text y (cf. line 11). However, to execute efficiently the matching phase by repeated calls to the procedure ExtShapeIso in Fig. 4 (cf. line 5), at each step (even at the first one) one needs to know the proportionality factor of the shape-isomorphic substrings which have been matched so far. From the initial assumption x[1 .. j] ≈ρ y[i .. i + j − 1], we can infer that x[1 .. π[j]] ≈ρ y[i+j −π[j] .. i+j −1] holds only when x[1 .. π[j]] is not a constant sequence. Otherwise, we would have x[1 .. π[j]] ≈0 y[i+j−π[j] .. i+j−1]. Plainly, x[1 .. π[j]] is a constant sequence if and only if %[j] = 0 (cf. line 10). Finally, we point out that the procedure ExtShapeIso, called at line 5, not only allows one to make progress in the matching phase, but it also takes care of updating, if needed, the proportionality factor ρ. The above discussion highlighted the key points needed in a more formal proof of the correctness of the Algorithm2 in Fig. 5 for the SPPM problem. Complexity issues Next we show that the overall time complexity of the while-loop at lines 3–11 of Algorithm2 is O(n), where, as usual, n is the size of the text. Since, as already shown, the call to procedure Compute-Shape-Border-Table at line 1 takes O(m) time, where, again, m is the size of the pattern, and since m ≤ n, it will follow that the total time complexity of Algorithm2 is O(n). Plainly, the time complexity of the while-loop at lines 3–11 is dominated by the number N of calls to procedure ExtShapeIso at line 5 (each of which takes constant time). Let us associate to each such a call ExtShapeIso x[1 .. j], y[i .. i + j − 1], x[j + 1], y[i + j], ρ), 10 Algorithm2(x, y) 1. (π, %) ← Compute-Shape-Border-Table(x) 2. i ← 1; j ← 1; ρ ← 0; 3. while i ≤ |y| − |x| do 4. repeat 5. ρ0 ← ExtShapeIso x[1 .. j], y[i .. i + j − 1], x[j + 1], y[i + j], ρ); 6. if ρ0 ≥ 0 then 7. j ← j + 1; ρ ← ρ0 ; 8. until j = |x| or ρ0 = −1; 9. if j = |x| then write i; //a shape-occurrence has been found 10. if %[j] = 0 then ρ ← 0; 11. i ← i + j − π[j]; j ← π[j]; Fig. 5. A linear-time KMP based algorithm for the SPPM problem. the pair (i, j) of the values of the parameters i and j when the call is made, and form their sequence (i1 , j1 ), (i2 , j2 ), . . . , (iN , jN ), (6) following the same ordering of the calls (so that (i1 , j1 ) = (1, 1)). By a simple inspection of the pseudo-code of Algorithm2, it is easy to see that the sequence (6) enjoys the following two properties: (a) i1 + j1 ≤ i2 + j2 ≤ · · · ≤ iN + jN ≤ n; (b) j` ≥ 1, for all 1 ≤ ` ≤ N . To ease presentation, let us refer to any pair h(i, j), (i0 , j 0 )i of consecutive pairs (i, j), (i0 , j 0 ) in (6) as a transition, and distinguish between increasing transitions, when j 0 = j + 1, and decreasing transitions, when j 0 < j. Let I and D be, respectively, the number of increasing and of decreasing transitions in (6). We plainly have N = I + D + 1, as any transition is either increasing or decreasing. In addition, since j1 = 1, we have D ≤ I, so that N ≤ 2D+1. Finally, for any increasing transition h(i, j), (i0 , j 0 )i, we have 2 ≤ i + j < i0 + j 0 ≤ n, so that I ≤ n − 2. From the latter inequality, we obtain N ≤ 2n − 3, yielding the linear bound O(n) seeked for for the time complexity of our Algorithm2. 5 Conclusions In this paper we introduced a restricted variant of the Order Preserving Pattern Matching problem, called Shape Preserving Pattern Matching. Specifically we say that two non-constant strings x and z of the same length m ≥ 2 are shape- isomorphic if, for some positive factor ρ > 0, we have x[i + 1] − x[i] = ρ(z[i + 1) − z[i]), for 1 ≤ i ≤ m − 1. In most practical applications this restricted variant turns out to be more effective then the original problem. We also provided two linear-time algorithms for such problem, based on a simple, yet non trivial, modification of the Knuth-Morris-Pratt algorithm. Although a fixed ratio between two sequences is rare in practice, this pa- per presents a first step towards a more suitable way to compare two numeric 11 sequences. A more practical approach would be considering a k-approximate ver- sion of the problem or allowing the ratios to belong to some bounded interval. Moreover, we are also interested to extend the dependency of differences in the occurrence from the differences in the pattern not only for a linear function but also for an arbitrary function. References 1. Belazzougui, D., Pierrot, A., Raffinot, M., Vialette, S.: Single and Multiple Consec- utive Permutation Motif Search, In Proc. of ISAAC 2013, LNCS, vol. 8283, 66–77. Springer (2013) 2. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Communications of the ACM 20(10), 762–772 (1977) 3. Cantone, D. Faro, S., Külekci, M.O., The order-preserving pattern matching prob- lem in practice, Discret. Appl. Math., vol.274, pp.11–25 (2020) 4. Cantone, D., Faro, S., Külekci, M.O., An Efficient Skip-Search Approach to the Order-Preserving Pattern Matching Problem, in Proc. of Stringology 2015, pp. 22- 35 (2015) 5. Chhabra, T., Faro, S., Külekci, M.O., Tarhio, J., Engineering order-preserving pat- tern matching with SIMD parallelism, Softw. Pract. Exp., vol.47 (5), pp. 731–739 (2017) 6. Chhabra, T., Tarhio, J.: Order-preserving matching with filtration. In: Proc. SEA ’14, 13th International Symposium on Experimental Algorithms. LNCS, vol. 8504, 307–314. Springer (2014) 7. Cho, S., Na, J.C., Park, K., Sim, J.S.: Fast order-preserving pattern matching. In: Widmayer, P., Xu, Y., Zhu, B. (eds.) COCOA 2013. LNCS, vol. 8287, 295–305. Springer (2013) 8. Crochemore, M, Iliopoulos, C.S., Kociumaka, T., Kubica, M., Langiu, A., Pissis, S.P., Radoszewski, J., Rytter, W., Walen, T.: Order-preserving incomplete suffix trees and order-preserving indexes. In: Proc. SPIRE 2013, 20th International Sym- posium. LNCS, vol. 8214, 84–95. Springer (2013) 9. Durian, B., Holub, J., Peltola, H., Tarhio, J.: Improving practical exact string match- ing. Information Processing Letters 110(4): 148–152 (2010) 10. Faro, S., Külekci, M.O., Efficient Algorithms for the Order Preserving Pattern Matching Problem, In Proc. of Algorithmic Aspects in Information and Management - 11th International Conference, AAIM 2016, Lecture Notes in Computer Science, vol.9778, pp.185–196, Springer (2016) 11. Kim, J., Eades, P., Fleischer, R., Hong, S.-H., Iliopoulos, C.S., Park, K., Puglisi, S. J., Tokuyama, T.: Order preserving matching. Theoretical Computer Science 525, 68–79 (2014) 12. Kubica, M., Kulczynski, T., Radoszewski, J., Rytter, W., Walen, T.: A linear time algorithm for consecutive permutation pattern matching. Information Processing Letters 113(12), 430–433 (2013) 13. Knuth, D.E., Morris, J.M., Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977) 14. Navarro, G., Raffinot, M.: Flexible pattern matching in strings. Practical on-line search algorithms for texts and biological sequences. Cambridge University Press, New York, NY, 2002 12