1 Introduction

This work was partially supported by RFBR (grant

Association Rules Discovery in Multivariate Time Series

0 0 University of St.-Petersburg, Faculty of Mathematics and Mechanics

2007

0 7 07

A problem of association rules discovery in a multivariate time series is considered in this paper. A method for finding interpretable association rules between frequent qualitative patterns is proposed. A pattern is defined as a sequence of mixed states. The multivariate time series is transformed into a set of labeled intervals and mined for frequently occurring patterns. Then these patterns are analyzed to find out which of them occur close to each other frequently. Some modifications and improvements of the method are proposed and discussed.

1 Introduction

Many classes of data that we deal with in our everyday life are temporal in their nature: medical information about patients and their diseases, goods selling data, financial data from stock markets, etc. In most cases they describe development of some variables or objects over time. All these data provide relatively small amount of knowledge by themselves, but much more information can be obtained from objects behavior analysis. Such analysis that aimed at extracting previously unknown rules from temporal data is called Temporal Data Mining or Knowledge Discovery. A comprehensive and detailed overview of temporal data mining techniques can be found, for example, in [ 3 ] and [ 12 ].

Discovered knowledge can be used both to improve understanding of underlying processes and for time series prediction given some observations in the past. In both cases it is important for all found regularities, patterns and rules to be understandable and interpretable by a domain expert. It seems that the best way for this is to create a model of time series behavior (e.g., ARMA or ARIMA models [ 4 ]), but in many cases these models are difficult to understand for a human. Furthermore, for many series, such as stock market data, it is impossible to develop a global model due to their chaotic nature. On the other hand, when someone thinks of the system behavior, he rather keeps in mind some dependencies between typical local development pieces or scenarios. When an expert inspects system behavior over time, usually he tries to find some relatively short and simple pieces of history that occur frequently enough and are easy to interpret. The next natural step is to find out some cause-effect, coincidence or some other dependencies between frequent episodes. For example, he may find that “IF pressure falls and it is summer THEN rain will start in 24 hours with high confidence”. In this paper we propose a method for such association rules discovery from a multivariate temporal sequence.

Automation of this process requires a definition of time series similarity. Frequent episodes obtained with different measures traditionally used for estimating similarity (e.g. Euclidian distance) provide a little information about system behavior that can be interpreted by a human ([ 1 ], [ 5 ]). In this paper we use a different way to discover frequent time series episodes (or patterns). Similar approaches are used, for example, in [ 10 ] and [ 14 ]. This kind of process is believed to be much closer to a process used by a human (see [ 10 ]). The main idea is to divide up the time series into a set of labeled intervals. Each interval is a time interval during which some condition is true in the original series (for example, “a series decreases” or “a patient has flu”). This can be done in different ways: intervals and their labels can be identified empirically or as a result of some automatic process, e.g. short subsequences clustering [ 6 ]. Once we have the initial set of intervals, we build all intersections of all subsets of intervals and add them to our set. Finally we know all segments of time series where one or more simple conditions hold simultaneously.

This paper proposes the method of frequent patterns discovery from the described interval set. A pattern is defined as a sequence of state labels (simple or mixed). More formally it will be defined later. An algorithm of finding such pattern’s occurrences has been developed and is described in the paper.

The paper is organized as follows: Section 2 contains a brief overview of similar works; Section 3 defines a notion of a pattern and describes frequent patterns discovery procedure; in Section 4 a process of association rules generation is briefly outlined; Section 5 is devoted to discussions of the method and some modifications are proposed there. Section 6 contains conclusions and outlines further work directions. matching even if some conditions except ones defined by the pattern are held on it.

2 Related Work

There are a number of approaches dealing with symbol sequences. In [ 6 ] a set of symbols is obtained from a time series by clustering short subsequences extracted with a sliding window. Then the sequence is mined for associations between symbols that are close enough to each other. In [ 13 ] an a priori style algorithm is used to discover frequent Episodes in a symbol sequence. In [ 8 ] authors apply genetic programming for pattern mining and use special hardware for efficient sequence matching.

The following approaches work with interval sequences produced from the time series, for example, using feature extraction. In [ 11 ] a sequence of intervals is searched for containment relations. In [ 7 ] a time series is divided into a set of labeled intervals. Then the interval set is mined for patterns described in terms of Allen’s [ 2 ] interval logic. Interesting patterns are restricted to A1 patterns, where operators can be appended only on the right hand side. Another method that uses Allen’s operators for pattern definition is proposed in [ 10 ]. A set of labeled intervals is obtained from a multivariate time series and mined for patterns with a sliding window to restrict pattern length. Then patterns are filtered by their interestingness using Jmeasure [ 15 ].

The last mentioned method is close to the proposed approach, but its has two main disadvantages. The first is that pattern’s frequency strongly depends on the width of the sliding window. If an occurrence of a pattern is a little longer than the window, it’s not taken into account at all. The second disadvantage is that patterns in [ 10 ] are expressed in terms of interval logic, that makes them difficult to understand.

The approach proposed in this paper solves both of these problems. It uses “windows” of all widths, that significantly smooths the difference between frequencies of patterns with instances of close lengths. Patterns are expressed in terms of simultaneous states – in the way that is closer to the human reasoning. The author believes that a description like “pressure increases and temperature decreases for 1 hour, and then (may be after some time) pressure slightly decreases” is more natural then “an interval where pressure increases is overlapped by an interval where temperature decreases, and the latter one is met by an interval where pressure slightly decreases”. Moreover, since the first statement involves only important sequence pieces, it matches more segments of the time series than the second one.

Ther is another work ([ 14 ]) that uses an idea of simultaneous states, but in rather simplified form: it’s authors state that a labelled interval matches an “Event” (a set of simultaneously holding states) if a set of states holding on this interval exactly coincides with the Event. The method proposed in the current paper uses more flexible approach that allows to treat an interval as

3 Frequent Patterns Discovery

The frequent patterns discovery procedure includes three general steps: 1. Extract a basic set of simple intervals from the time series (i.e. intervals labeled with simple states); 2. Convert an initial time series into a sequence of nonoverlapping intervals labeled with mixed states; 3. Mine the sequence for frequent patterns.

This section gives a definition of a pattern and describes all these stages in details.

3.1 Basic Interval Set

As was mentioned above, to be able to discover time series behavior qualitative patterns, we need to extract a set of labeled intervals from the series. Let’s denote as S0 a set of all simple states or conditions based on which we divide our series into intervals (for example, “a patient has a high temperature” or “humidity goes up”). States in S0 are not required to be mutually exclusive, so intervals of the resulting set may overlap. This fact allows us not to restrict ourselves with only single univariate time series, but freely work with multivariate time series or even with several ones as well. Thus we obtain a set I0 from our time series – a set of intervals labeled with simple states:

I0 = {(s1, b1, e1), (s2, b2, e2), … , (sn, bn, en)} (1)

Here (si, bi, ei) denotes an interval labeled with a state si ∈ S0, beginning at bi and ending at ei. All these intervals are required to be maximal intervals. This means that in I0 there are no adjacent intervals or intervals with non-empty intersection labeled with the same state. I.e.: ∀ i=1..n, j=1..n: si = sj => ej < bi ∨ ei < bj (2)

For the next step we need to order states of S0 in some way. A choice of ordering procedure is not very important. For example, states may be ordered lexicographically according to names of their labels or somehow else. This is needed only to enable ordering of all intervals of I0 according to the rules below: ∀ (si, bi, ei), (sj, bj, ej) ∈ I0: • If bi < bj then (si, bi, ei) < (sj, bj, ej); • If bi > bj then (si, bi, ei) > (sj, bj, ej); • If bi = bj then: − If si < sj then (si, bi, ei) < (sj, bj, ej); − If si > s j then (si, bi, ei) > (sj, bj, ej); − Note that si = sj is impossible due to the maximality requirement.

I0 – is a set of maximal simple intervals – that is ordered as described above, we call it a basic interval set. An example is shown on the Fig. 1. Here S0 = {A, B, C, D} and I0 = {(A, 1, 4), (B, 2, 3), (C, 2, 6), (D, 3, 5)}.

A B

C 4 D 1 2 3 5 6 As was mentioned above, the main goal of this method is to find association rules for patterns expressed in terms of simultaneous or mixed states. A mixed state is any subset of S0. An interval i is labeled with a mixed state s = {s1 , …, sn} if: i = i1 ∩ i2 ∩ … ∩ in: ik ∈ I0 and ik is labeled with sk, k = 1..n (3)

In other words, if two or more simple intervals overlap, then their intersection is labeled with a mixed state, composed of all simple state labels of these intervals. Since I0 consists of maximal intervals, all simple states in a mixed state are different.

For further description of patterns and matching procedure we need to define a sequence I that will be searched for pattern occurrences. Let D = {d1, d2, …, dm} be an ordered sequence of all different beginning and ending points of I0 intervals. I consists of all intervals (s, dk, dk+1), where s is a mixed state that includes all simple states holding during this time interval. For example, Fig. 2 (a) illustrates I for intervals shown on Fig. 1: D = {1, 2, 3, 4, 5, 6}; I = (({A}, 1, 2), ({A, B, C}, 2, 3), ({A, C, D}, 3, 4), ({C, D}, 4, 5), ({C}, 5, 6))

Note that I consists of maximal intervals due to maximality of all intervals of I0.

A pattern is defined as a sequence of mixed states. For example, <{A}, {A, B}, {C}> denotes a pattern that consists of mixed states {A}, {A, B} and {C}, where A, B and C are simple states.

Consider: − a pattern P = <ps1, …, psn> − a subsequence Q of I: Q = ((s1, b1, e1), (s2, b2, e2), … , (sm, bm, em)), where (sk, bk, ek) ∈ I and ek ≤ bk+1.

An algorithm that matches P and Q is outlined below. After successful matching the sequence Q is broken into n groups of consecutive intervals so that kth group is associated with psk. Q matches P only if u > n, v > m and all intervals of Q are associated with states of P when this process finishes.

u := 1; v := 1; while (u ≤ n and v ≤ m and psu while ((v = 1 or ev-1 = bv or (sv-1,bv-1,ev-1) is

associated with psu-1) and v ≤ m and psu ⊆ sv ⊆ sv) and (u = n or psu+1 { sv or psu+1 ⊆ psu)) associate (sv,bv,ev) with psu; v := v + 1; end while u := u + 1; end while.

There are some facts that should be noted: 1. If an interval (sk, bk, ek) belongs to a group associated with psu, then psu ⊆ sk. 2. If ek < bk+1, then (sk, bk, ek) and (sk+1, bk+1, ek+!) cannot be associated with the same element of P. I. e. non-adjacent intervals cannot be associated with the same element of P. 3. If psu+1 ⊆ psu, (sk-1, bk-1, ek-1) is associated with psu and psu+1 ⊆ sk, then (sk, bk, ek) will be associated with psu+1 only if psu {

sk. This is due to semantics of such patterns. A pattern <{A, B}, {A}> means that at first conditions A and B must hold for some time simultaneously and then there must be some time when A is true and B is false. If an existence of one of these intervals is not important, then the pattern is to be reduced to <{A}> or <{A, B}>.

Fig. 2 (b, c) shows how a sequence ({A}, {A, B, C}, {A, C, D}, {C, D}, {D}) matches patterns <{A}, {A, B}, {C, D}, {D}> and <{A}, {B}, {C}, {D}>.

a) b) c)

A A A

ABC AB B

ACD

CD C

D D

To find all subsequences of I that match P – occurrences of P – the following procedure is used: 1. Let u = 1. 2. Find and add to the resulting subsequence the earliest interval (sk(u), bk(u), ek(u)) so that psu ⊆ sk(u). 3. If psu+1 ⊆ psu, then find the maximum d(u) so that (sk(u), bk(u), ek(u)), (sk(u)+1, bk(u)+1, ek(u)+1), …, (sk(u)+d(u), bk(u)+d(u), ek(u)+d(u)) are adjacent intervals and psu ⊆ sk(u), psu ⊆ sk(u)+1, …, psu ⊆ sk(u)+d(u). Add these intervals to the resulting subsequence. 4. If psu+1 {

psu, then find the maximum d(u) so that (sk(u), bk(u), ek(u)), (sk(u)+1, bk(u)+1, ek(u)+1), …, (sk(u)+d(u), bk(u)+d(u), ek(u)+d(u)) are adjacent intervals and psu ⊆ sk(u), psu(u) ⊆ sk(u)+1, …, psu ⊆ sk(u)+d(u) and psu+1 { sk(u), psu+1 { sk(u)+1, …, psu+1 { sk(u)+d(u). Add these intervals to the resulting subsequence. 5. Repeat steps 2 – 4 with the next u and all intervals beginning later than ek(u)+d(u) until u > n or the end of I is reached. If all elements of P are processed, then the resulting sequence matches P. 6. Repeat steps 1 – 5 with all intervals beginning later than ek(1)+d(1) until the end of I is reached.

3.3 Frequent Patterns Discovery

In this section the procedure of frequent patterns discovery is described.

Let a time series under consideration be of length L. Consider a time “window” of width w ≤ L that overlaps [0, L]. There are (L + w) different windows of width w. Consider a set of all different windows of length ≤ L. This set is of size 3L2/2.

Consider a pattern P of cardinality n and a set of all intervals {[pb1, pe1], [pb2, pe2], …, [pbr, per]} so that for each 1 ≤ i ≤ r: − pbi < pbi+1 (if i < r); − there is a subsequence ((s1, b1, e1), (s2, b2, e2), …, (sm, bm, em)) of I matching P so that: • if n ≥ 2, then: − ex = pbi and by = pei (where (sx, bx, ex) is the last interval associated with ps1, and (sy, by, ey) is the first interval associated with psn during matching); − there are no shorter (i.e. with less |by – ex|) subsequences of I matching P within [pbi, pei]. • if n = 1, then pbi = b1 and pei = e1. n ≥ 2: n = 1:

r ⎛ supp(P) = ∑ ⎜⎜ ( pbk − pbk−1)(L − plk ) − k=1 ⎝ ( pbk − pbk−1)2 ⎞⎟ 2 ⎟⎠ (4) supp(P) = ∑⎜⎜( pek − pek−1)(L + plk ) − ( pek − pek−1) − plk 2 ⎞⎟ r ⎛ 2 k=1 ⎝ 2 2 ⎠⎟ supp(P) – a support of P – is a set of all windows of length ≤ L that contain at least one of [pbi, pei] (if n ≥ 2) or overlap with at least one of [pbi, pei] (if n = 1), 1 ≤ i ≤ r.

This definition is similar to a support definition given in [ 10 ], but considers “sliding windows” of any length between 0 and L, that allows not to limit a length of a pattern occurrence. In common words it can be expressed like this: if we take any window of length ≤ L, and consider it as a time series, and try to find occurrences of P in it, and succeed, then this window is an element of supp(P).

An important property of these definitions is that any subpattern of a frequent pattern is also frequent. This enables the following procedure of frequent pattern discovery: find all frequent patterns of length 1, then construct patterns of length k+1 by appending one mixed state to frequent patterns of length k and get a set Fk+1 – a set of all frequent patterns of length k+1. The process finishes when no new frequent patterns can be found. In more details the procedure is described below:

Step 1: Scan I and find all patterns <ps> of length 1 so that there exists an interval (s, b, e) ∈ I with ps ⊆ s. Add all frequent patterns to F1.

Step k+1: For each P=<ps1, …, psk> ∈ Fk find all patterns Q=<qs1, …, qsk> ∈ Fk so that qs1 = ps2, …, qsk-1 = psk, and generate patterns <ps1, …, psk, qsk> from P and each described Q. Test all generated patterns and add all frequent ones to Fk+1.

4 Association Rules Discovery

Once all frequent patterns are discovered, we can try to find association rules. Let’s denote a rule as A B. But first of all we need to define which patterns will be considered as associated. This must be some condition (cond) describing how antecedent pattern (A) and succedent pattern (B) are disposed. For example, this can be “B starts during A and ends within 2 hours after A ends” or “B starts within 5 minutes after A ends”. This condition is to be chosen by a domain expert so that discovered rules would be easy to interpret.

Consider all occurrences of A and B that satisfy condition cond. Let suppA B(A) and suppA B(B) be a supports of A and B respectively, that are built basing only on such occurrences. Support of A B is defined large set of relatively useless rules. For example, if occurrences of B cover almost all time series, then all rules with B as a succedent will be selected, but most of them will be of small interest.

⎛ ⎛1− p(B | A) ⎞⎟⎞⎟ J(B; A) = p(A)⎜⎜⎝ p(B | A)log⎜⎛⎝ p(pB( B|)A) ⎞⎟⎠ + (1− p(B | A))log⎜⎝ 1− p(B) ⎠⎟⎠ (6) [ 9 ] gives a survey of the most successful interestingness measures used in knowledge discovery. One of the most popular measures for association rules evaluation is a J-measure [ 15 ] (see (6)). It is a special case of Shannon’s mutual information and takes into account both rule frequency and rule “surprisingness”. If J-measure of a rule is less than some threshold value, then it is not interesting enough and should be excluded from consideration.

In our context p(B) is a probability of pattern B occurring in randomly selected window of length ≤ L, and p(A B) is a probability of the fact that B occurs in a randomly selected window of length ≤ L that contains an occurrence of A and cond is true for these occurrences. The probabilities are evaluated as: supp(B) There are a number of issues that should be discussed about the proposed method. One of them is a number of mixed states. In general case, having N simple states, we obtain 2N possible mixed states. But in practice some of them cannot take place simultaneously (or can hold only simultaneously) due to some reason. For example, consider 3 time series: pressure, temperature and humidity data, and 6 simple states for each of them: “increase”, “slow increase”, “fast increase”, “decrease”, “slow decrease”, “fast decrease”. There are 23*6 possible mixed states, but it is obvious that slowly increasing and fast increasing segments of the same time series cannot overlap, or decreasing and fast decreasing segments of the series always overlap, and so on. Thus the number of possible mixed states is reduced to 73 (or even 53 if any decreasing/increasing segment is either fast or slowly decreasing/increasing). Moreover, possibly not all of them have support exceeding suppmin, so they will be removed from consideration after the first step of frequent patterns discovery procedure.

Another problem is very short mixed-state intervals appearing due to inaccurate initial time series division (especially if it was performed by a human). On Fig. 3 (a) the bounds of intervals labeled with simple states A and B were defined roughly, and this resulted in a small interval labeled with a mixed state {A, B}. A simple solution is to ignore intervals shorter than some threshold ε when searching for pattern occurrences. For example, if l < ε, then a sequence from the Fig. 3 (a) does not match a pattern <{A}, {A, B}, {B}>. The same idea can be applied to situation when there is a small gap between intervals labeled with the same state (see Fig. 3 (b)). Such gaps may appear, for example, due to smoothing faults. In this case it may be ignored during patterns discovery and both intervals may be considered as one continuous interval.

But what if the user wants to have patterns that exactly define which intervals must be adjacent and which may have a gap between them in a matching sequence? For example, he wants to extend pattern <{A}, {B}, {C}> with some information like “in a matching sequence {A}-interval must be met by {B}interval, but there may be a gap between {B}-interval and {C}-interval”. Such patterns can be enabled by introducing of a special symbol “gap” ( ). A pattern described above can be expressed like <{A}, {B}, {C}>. “Gap” is not a mixed state, so pattern cardinality does not change, but a space of all candidate patterns increases. 10% 5000 4500 4000 3500 3000 2500 2000 1500 1000 500 0 12% 14% 16% 18% 20% 22% 24% 26% 28% 30%

6 Conclusion and Future Work

This paper proposes a method and an algorithm for frequent patterns and rules discovery in labeled interval sequence. The interval sequence can be obtained, for example, from multivariate time series by dividing each variable’s behavior into labeled intervals. Since pattern is a sequence of mixed states, a domain expert can easily interpret all found frequent patterns and rules. Inaccuracy of series division and data smoothing defects can be compensated by filtering intervals with minimum interval length. The method was tested over simulated interval set and dependency of a frequent patterns number from minimum support and minimum interval length were investigated.

The problem of the proposed method is a large number of generated frequent patterns and rules. A Jmeasure can help to reduce it, but there still remains a problem of ranking patterns by their interpretability and “importance”. Also an effective algorithm for frequent patterns discovery needs to be developed. Although an initial interval set is converted to a simple sequence of mixed state intervals and a pattern is either a mixed states sequence, classical methods of finding subsequence in a sequence are not suitable for pattern matching, since this process is rather complicated.

[1] Agrawal , R. , Lin , K.-L. , Sawhney , H.S. , Shim , K. : Fast Similarity Search in The Presence of Noise, Scaling and Translation in Time-Series Databases . In: Proc. of the 21st Int. Conf. on Very Large Databases , Zurich, Switzerland ( 1995 )

[2] Allen , J.F. : Maintaining knowledge about temporal intervals . Communications of the ACM , 26 ( 11 ) ( 1983 ) 832 - 843

[3] Antunes , C. , Oliveira , A. : Temporal Data Mining: an Overview . In: KDD Workshop on Temporal Data Mining , San Francisco ( 2001 ) 1 - 13

[4] Box , G.E.P. , Jenkins , G.M. : Time Series Analysis: Forecasting and Control . San-Francisco, Holden-Day ( 1976 )

[5] Brendt , D.J. , Clifford , J.: Finding Patterns in Time Series: A Dynamic Programming Approach . In: Advances in Knowledge Discovery And Data Mining, chapter 9 . MIT Press ( 1996 ) 229 - 248

[6] Das , G. , Lin , K.-I. , Mannila , H. , Renganathan , G. , Smyth , P. : Rule Discovery from Time Series . In: Proc. of the 4th Int. Conf. on Knowledge Discovery and Data Mining , AAAI Press ( 1998 ) 16 - 22

[7] Fu , A.W.C. , Kam , P.S. : Discovering Temporal Patterns for Interval-Based Events . In: Kambayashi, Y. , Mohania , M.K. , Tjoa , A.M., eds.: Second International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2000 ). Volume 1874 ., London, UK, Springer ( 2000 ) 317 - 326

[8] Hetland , M.L. , Saetrom , P. : Temporal Rule Discovery Using Genetic Programming and Specialized Hardware . In: Proc. of 4th Int. Conf. on Recent Advances in Soft Computing ( 2002 ) 182 - 188

[9] Hilderman , R.J. , Hamilton , H.J. : Knowledge discovery and interestingness measures: A survey . Technical Report CS 99-04 , Department of Computer Science, University of Regina ( 1999 )

[10] Höppner , F. : Discovery of Temporal Patterns - Learning Rules about the Qualitative Behavior of Time Series . In: Proc. of the 5th European Conference on Principles and Practice of Knowledge Discovery in Databases, Lecture Notes in Artificial Intelligence 2168 , Springer ( 2001 )

[11] Hua , K.A. , Maulik , B. , Tran , D. , Villafane , R.: Mining Interval Time Series. In: Data Warehousing and Knowledge Discovery . ( 1999 ) 318 - 330

[12] Lin , W. , Orgun , M.A. , Williams , G.J.: An Overview of Temporal Data Mining . In: Proceedings of the 1st Australian Data Mining Workshop , Canberra, Australia ( 2002 )

[13] Mannila , H. , Toivonen , H. , Verkamo , A.I. : Discovery of Frequent Episodes in Event Sequences . Data Mining and Knowledge Discovery 1 ( 1997 ) 259 - 289

[14] Moerchen , F. , Ultsch , A. : Mining Hierarchical Temporal Patterns in Multivariate Time Series . In: Proc. of the 27th German Conference on Artificial Intelligence (KI) . Ulm, Germany ( 2004 )

[15] Smyth , P. , Goodman , R. M. Rule Induction Using Information Theory . In: Knowledge Discovery in Databases, chapter 9 . MIT Press ( 1991 ) 159 - 176