<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Association Rules Discovery in Multivariate Time Series ♣</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Elena</forename><surname>Lutsiv</surname></persName>
							<email>eluciv@math.spbu.ru</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Mechanics</orgName>
								<orgName type="institution">University of St.-Petersburg</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Colloquium On Database and Information Systems SYRCoDIS</orgName>
								<address>
									<postCode>2007</postCode>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Association Rules Discovery in Multivariate Time Series ♣</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">006A9B188068CCA7ADE24E8FC9EDF0B5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:43+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A problem of association rules discovery in a multivariate time series is considered in this paper. A method for finding interpretable association rules between frequent qualitative patterns is proposed. A pattern is defined as a sequence of mixed states. The multivariate time series is transformed into a set of labeled intervals and mined for frequently occurring patterns. Then these patterns are analyzed to find out which of them occur close to each other frequently. Some modifications and improvements of the method are proposed and discussed.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Many classes of data that we deal with in our everyday life are temporal in their nature: medical information about patients and their diseases, goods selling data, financial data from stock markets, etc. In most cases they describe development of some variables or objects over time. All these data provide relatively small amount of knowledge by themselves, but much more information can be obtained from objects behavior analysis. Such analysis that aimed at extracting previously unknown rules from temporal data is called Temporal Data Mining or Knowledge Discovery. A comprehensive and detailed overview of temporal data mining techniques can be found, for example, in <ref type="bibr" target="#b2">[3]</ref> and <ref type="bibr" target="#b11">[12]</ref>.</p><p>Discovered knowledge can be used both to improve understanding of underlying processes and for time series prediction given some observations in the past. In both cases it is important for all found regularities, patterns and rules to be understandable and interpretable by a domain expert. It seems that the best way for this is to create a model of time series behavior (e.g., ARMA or ARIMA models <ref type="bibr" target="#b3">[4]</ref>), but in many cases these models are difficult to understand for a human. Furthermore, for many series, such as stock market data, it is impossible to develop a global model due to their chaotic nature. On the other hand, when someone thinks of the system behavior, he rather keeps in mind some dependencies between typical local development pieces or scenarios. When an expert inspects system behavior over time, usually he tries to find some relatively short and simple pieces of history that occur frequently enough and are easy to interpret. The next natural step is to find out some cause-effect, coincidence or some other dependencies between frequent episodes. For example, he may find that "IF pressure falls and it is summer THEN rain will start in 24 hours with high confidence". In this paper we propose a method for such association rules discovery from a multivariate temporal sequence.</p><p>Automation of this process requires a definition of time series similarity. Frequent episodes obtained with different measures traditionally used for estimating similarity (e.g. Euclidian distance) provide a little information about system behavior that can be interpreted by a human ( <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b4">[5]</ref>). In this paper we use a different way to discover frequent time series episodes (or patterns). Similar approaches are used, for example, in <ref type="bibr" target="#b9">[10]</ref> and <ref type="bibr" target="#b13">[14]</ref>. This kind of process is believed to be much closer to a process used by a human (see <ref type="bibr" target="#b9">[10]</ref>). The main idea is to divide up the time series into a set of labeled intervals. Each interval is a time interval during which some condition is true in the original series (for example, "a series decreases" or "a patient has flu"). This can be done in different ways: intervals and their labels can be identified empirically or as a result of some automatic process, e.g. short subsequences clustering <ref type="bibr" target="#b5">[6]</ref>. Once we have the initial set of intervals, we build all intersections of all subsets of intervals and add them to our set. Finally we know all segments of time series where one or more simple conditions hold simultaneously.</p><p>This paper proposes the method of frequent patterns discovery from the described interval set. A pattern is defined as a sequence of state labels (simple or mixed). More formally it will be defined later. An algorithm of finding such pattern's occurrences has been developed and is described in the paper.</p><p>The paper is organized as follows: Section 2 contains a brief overview of similar works; Section 3 defines a notion of a pattern and describes frequent patterns discovery procedure; in Section 4 a process of association rules generation is briefly outlined; Section 5 is devoted to discussions of the method and some</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>There are a number of approaches dealing with symbol sequences. In <ref type="bibr" target="#b5">[6]</ref> a set of symbols is obtained from a time series by clustering short subsequences extracted with a sliding window. Then the sequence is mined for associations between symbols that are close enough to each other. In <ref type="bibr" target="#b12">[13]</ref> an a priori style algorithm is used to discover frequent Episodes in a symbol sequence. In <ref type="bibr" target="#b7">[8]</ref> authors apply genetic programming for pattern mining and use special hardware for efficient sequence matching.</p><p>The following approaches work with interval sequences produced from the time series, for example, using feature extraction. In <ref type="bibr" target="#b10">[11]</ref> a sequence of intervals is searched for containment relations. In <ref type="bibr" target="#b6">[7]</ref> a time series is divided into a set of labeled intervals. Then the interval set is mined for patterns described in terms of Allen's <ref type="bibr" target="#b1">[2]</ref> interval logic. Interesting patterns are restricted to A1 patterns, where operators can be appended only on the right hand side. Another method that uses Allen's operators for pattern definition is proposed in <ref type="bibr" target="#b9">[10]</ref>. A set of labeled intervals is obtained from a multivariate time series and mined for patterns with a sliding window to restrict pattern length. Then patterns are filtered by their interestingness using Jmeasure <ref type="bibr" target="#b14">[15]</ref>.</p><p>The last mentioned method is close to the proposed approach, but its has two main disadvantages. The first is that pattern's frequency strongly depends on the width of the sliding window. If an occurrence of a pattern is a little longer than the window, it's not taken into account at all. The second disadvantage is that patterns in <ref type="bibr" target="#b9">[10]</ref> are expressed in terms of interval logic, that makes them difficult to understand.</p><p>The approach proposed in this paper solves both of these problems. It uses "windows" of all widths, that significantly smooths the difference between frequencies of patterns with instances of close lengths. Patterns are expressed in terms of simultaneous statesin the way that is closer to the human reasoning. The author believes that a description like "pressure increases and temperature decreases for 1 hour, and then (may be after some time) pressure slightly decreases" is more natural then "an interval where pressure increases is overlapped by an interval where temperature decreases, and the latter one is met by an interval where pressure slightly decreases". Moreover, since the first statement involves only important sequence pieces, it matches more segments of the time series than the second one.</p><p>Ther is another work ( <ref type="bibr" target="#b13">[14]</ref>) that uses an idea of simultaneous states, but in rather simplified form: it's authors state that a labelled interval matches an "Event" (a set of simultaneously holding states) if a set of states holding on this interval exactly coincides with the Event. The method proposed in the current paper uses more flexible approach that allows to treat an interval as matching even if some conditions except ones defined by the pattern are held on it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Frequent Patterns Discovery</head><p>The frequent patterns discovery procedure includes three general steps: 1. Extract a basic set of simple intervals from the time series (i.e. intervals labeled with simple states); 2. Convert an initial time series into a sequence of nonoverlapping intervals labeled with mixed states; 3. Mine the sequence for frequent patterns. This section gives a definition of a pattern and describes all these stages in details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Basic Interval Set</head><p>As was mentioned above, to be able to discover time series behavior qualitative patterns, we need to extract a set of labeled intervals from the series. Let's denote as S 0 a set of all simple states or conditions based on which we divide our series into intervals (for example, "a patient has a high temperature" or "humidity goes up"). States in S 0 are not required to be mutually exclusive, so intervals of the resulting set may overlap. This fact allows us not to restrict ourselves with only single univariate time series, but freely work with multivariate time series or even with several ones as well. Thus we obtain a set I 0 from our time series -a set of intervals labeled with simple states:</p><formula xml:id="formula_0">I 0 = {(s 1 , b 1 , e 1 ), (s 2 , b 2 , e 2 ), … , (s n , b n , e n )}<label>(1)</label></formula><p>Here (s i , b i , e i ) denotes an interval labeled with a state s i ∈ S 0 , beginning at b i and ending at e i . All these intervals are required to be maximal intervals. This means that in I 0 there are no adjacent intervals or intervals with non-empty intersection labeled with the same state. I.e.: ∀ i=1..n, j=1..n: s i = s j =&gt; e j &lt; b i ∨ e i &lt; b j <ref type="bibr" target="#b1">(2)</ref> For the next step we need to order states of S 0 in some way. A choice of ordering procedure is not very important. For example, states may be ordered lexicographically according to names of their labels or somehow else. This is needed only to enable ordering of all intervals of I 0 according to the rules below: <ref type="figure">, (s j , b j , e j</ref> ) ∈ I 0 :</p><formula xml:id="formula_1">∀ (s i , b i , e i )</formula><formula xml:id="formula_2">• If b i &lt; b j then (s i , b i , e i ) &lt; (s j , b j , e j ); • If b i &gt; b j then (s i , b i , e i ) &gt; (s j , b j , e j ); • If b i = b j then: − If s i &lt; s j then (s i , b i , e i ) &lt; (s j , b j , e j );</formula><p>− If s i &gt; s j then (s i , b i , e i ) &gt; (s j , b j , e j ); − Note that s i = s j is impossible due to the maximality requirement. I 0 -is a set of maximal simple intervals -that is ordered as described above, we call it a basic interval set. An example is shown on the Fig. <ref type="figure" target="#fig_0">1</ref>. Here S 0 = {A, B, C, D} and I 0 = {(A, 1, 4), (B, 2, 3), (C, 2, 6), (D, 3, 5)}. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Patterns</head><p>As was mentioned above, the main goal of this method is to find association rules for patterns expressed in terms of simultaneous or mixed states. A mixed state is any subset of S 0 . An interval i is labeled with a mixed state s = {s 1 , …, s n } if:</p><formula xml:id="formula_3">i = i 1 ∩ i 2 ∩ … ∩ i n : i k ∈ I 0 and i k is labeled with s k , k = 1..n<label>(3)</label></formula><p>In other words, if two or more simple intervals overlap, then their intersection is labeled with a mixed state, composed of all simple state labels of these intervals. Since I 0 consists of maximal intervals, all simple states in a mixed state are different.</p><p>For further description of patterns and matching procedure we need to define a sequence I that will be searched for pattern occurrences. Let D = {d 1 , d 2 , …, d m } be an ordered sequence of all different beginning and ending points of I 0 intervals. I consists of all intervals (s, d k , d k+1 ), where s is a mixed state that includes all simple states holding during this time interval. For example, Fig. <ref type="figure" target="#fig_4">2</ref>  Note that I consists of maximal intervals due to maximality of all intervals of I 0 .</p><p>A pattern is defined as a sequence of mixed states. For example, &lt;{A}, {A, B}, {C}&gt; denotes a pattern that consists of mixed states {A}, {A, B} and {C}, where A, B and C are simple states. An algorithm that matches P and Q is outlined below. After successful matching the sequence Q is broken into n groups of consecutive intervals so that kth group is associated with ps k . Q matches P only if u &gt; n, v &gt; m and all intervals of Q are associated with states of P when this process finishes. There are some facts that should be noted: 1. If an interval (s k , b k , e k ) belongs to a group associated with ps u , then ps u ⊆ s k . cannot be associated with the same element of P. I. e. non-adjacent intervals cannot be associated with the same element of P. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">If ps</head><formula xml:id="formula_4">u+1 ⊆ ps u , (s k-1 , b k-1 , e k-</formula><formula xml:id="formula_5">ABC ACD D CD a) b) A A B D CD c) A B D C A</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 2. Pattern matching</head><p>To find all subsequences of I that match Poccurrences of P -the following procedure is used: 1. Let u = 1. 2. Find and add to the resulting subsequence the earliest interval (s k(u) , b k(u) , e k(u) ) so that ps u ⊆ s k(u) . 3. If ps u+1 ⊆ ps u , then find the maximum d(u) so that</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(s k(u) , b k(u) , e k(u) ), (s k(u)+1 , b k(u)+1 , e k(u)+1 ), …, (s k(u)+d(u) ,</head><p>b k(u)+d(u) , e k(u)+d(u) ) are adjacent intervals and ps u ⊆ s k(u) , ps u ⊆ s k(u)+1 , …, ps u ⊆ s k(u)+d(u) . Add these intervals to the resulting subsequence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">If ps u+1</head><p>ps u , then find the maximum d(u) so that</p><formula xml:id="formula_6">(s k(u) , b k(u) , e k(u) ), (s k(u)+1 , b k(u)+1 , e k(u)+1 ), …, (s k(u)+d(u) ,</formula><p>b k(u)+d(u) , e k(u)+d(u) ) are adjacent intervals and ps u ⊆ s k(u) , ps u(u) ⊆ s k(u)+1 , …, ps u ⊆ s k(u)+d(u) and ps u+1 s k(u) , ps u+1 s k(u)+1 , …, ps u+1 s k(u)+d(u) . Add these intervals to the resulting subsequence. 5. Repeat steps 2 -4 with the next u and all intervals beginning later than e k(u)+d(u) until u &gt; n or the end of I is reached. If all elements of P are processed, then the resulting sequence matches P. 6. Repeat steps 1 -5 with all intervals beginning later than e k( <ref type="formula" target="#formula_0">1</ref>)+d <ref type="bibr" target="#b0">(1)</ref> until the end of I is reached.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Frequent Patterns Discovery</head><p>In this section the procedure of frequent patterns discovery is described. Let a time series under consideration be of length L. Consider a time "window" of width w ≤ L that overlaps [0, L]. There are (L + w) different windows of width w. Consider a set of all different windows of length ≤ L. This set is of size 3L 2 /2.</p><p>Consider a pattern P of cardinality n and a set of all intervals {[pb 1 , pe 1 ], [pb 2 , pe 2 ], …, [pb r , pe r ]} so that for each 1 ≤ i ≤ r: </p><formula xml:id="formula_7">− pb i &lt; pb i+1 (if i &lt;</formula><formula xml:id="formula_8">∑ = − − ⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎝ ⎛ − − − − = r k k k k k k pb pb pl L pb pb P supp 1 2 1 1 2 ) ( ) )( ( ) ( n = 1: ∑ = − − ⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎝ ⎛ − − − + − = r k k k k k k k pl pe pe pl L pe pe P supp 1 2 2 1 1 2 2 ) ( ) )( ( ) (<label>(4)</label></formula><p>supp(P) -a support of P -is a set of all windows of length ≤ L that contain at least one of [pb i , pe i ] (if n ≥ 2) or overlap with at least one of [pb i , pe i ]</p><formula xml:id="formula_9">(if n = 1), 1 ≤ i ≤ r.</formula><p>This definition is similar to a support definition given in <ref type="bibr" target="#b9">[10]</ref>, but considers "sliding windows" of any length between 0 and L, that allows not to limit a length of a pattern occurrence. In common words it can be expressed like this: if we take any window of length ≤ L, and consider it as a time series, and try to find occurrences of P in it, and succeed, then this window is an element of supp(P).</p><p>|supp(P)| is a cardinality of supp(P). A pattern is called frequent if |supp(P)| ≥ supp min . Assuming pb 0 = -L + pe 1 and pe 0 = -L + pb 1 , ( <ref type="formula" target="#formula_8">4</ref>) is a formula for |supp(P)| evaluation.</p><p>An important property of these definitions is that any subpattern of a frequent pattern is also frequent. This enables the following procedure of frequent pattern discovery: find all frequent patterns of length 1, then construct patterns of length k+1 by appending one mixed state to frequent patterns of length k and get a set F k+1 -a set of all frequent patterns of length k+1. The process finishes when no new frequent patterns can be found. In more details the procedure is described below:</p><p>Step 1: Scan I and find all patterns &lt;ps&gt; of length 1 so that there exists an interval (s, b, e) ∈ I with ps ⊆ s.</p><p>Add all frequent patterns to F 1 .</p><p>Step k+1: For each P=&lt;ps 1 , …, ps k &gt; ∈ F k find all patterns Q=&lt;qs 1 , …, qs k &gt; ∈ F k so that qs 1 = ps 2 , …, qs k-1 = ps k , and generate patterns &lt;ps 1 , …, ps k , qs k &gt; from P and each described Q. Test all generated patterns and add all frequent ones to F k+1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Association Rules Discovery</head><p>Once all frequent patterns are discovered, we can try to find association rules. Let's denote a rule as A B. But first of all we need to define which patterns will be considered as associated. This must be some condition (cond) describing how antecedent pattern (A) and succedent pattern (B) are disposed. For example, this can be "B starts during A and ends within 2 hours after A ends" or "B starts within 5 minutes after A ends". This condition is to be chosen by a domain expert so that discovered rules would be easy to interpret. Consider all occurrences of A and B that satisfy condition cond. Let supp A B (A) and supp A B (B) be a supports of A and B respectively, that are built basing only on such occurrences. Support of A B is defined in <ref type="bibr" target="#b4">(5)</ref>.</p><formula xml:id="formula_10">supp(A B) = supp A B (A) ∩ supp A B (B) (5)</formula><p>The simplest way to generate association rules is to evaluate a confidence:</p><formula xml:id="formula_11">supp(A) B) supp(A B) conf(A =</formula><p>for each pair of frequent patterns and select the rule only if conf(A B) ≥ conf min . But this will result in a large set of relatively useless rules. For example, if occurrences of B cover almost all time series, then all rules with B as a succedent will be selected, but most of them will be of small interest. <ref type="bibr" target="#b5">(6)</ref> [9] gives a survey of the most successful interestingness measures used in knowledge discovery. One of the most popular measures for association rules evaluation is a J-measure <ref type="bibr" target="#b14">[15]</ref> (see <ref type="bibr" target="#b5">(6)</ref>). It is a special case of Shannon's mutual information and takes into account both rule frequency and rule "surprisingness". If J-measure of a rule is less than some threshold value, then it is not interesting enough and should be excluded from consideration.</p><formula xml:id="formula_12">( ) ⎟ ⎟ ⎠ ⎞ ⎜ ⎜ ⎝ ⎛ ⎟ ⎠ ⎞ ⎜ ⎝ ⎛ ⎟ ⎠ ⎞ ⎜ ⎝ ⎛ − − − + = ) ( 1 ) | ( 1 log ) | ( 1 ) ( ) | ( log ) | ( ) ( ) ; ( B p A B p A B p B p A B p A B p A p A B J</formula><p>In our context p(B) is a probability of pattern B occurring in randomly selected window of length ≤ L, and p(A B) is a probability of the fact that B occurs in a randomly selected window of length ≤ L that contains an occurrence of A and cond is true for these occurrences. The probabilities are evaluated as:</p><formula xml:id="formula_13">) ( ) ( ) | ( ; 2 / 3 ) ( ) ( 2 A supp B A supp A B p L B supp B p = = (7)</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion and Evaluation</head><p>There are a number of issues that should be discussed about the proposed method. One of them is a number of mixed states. In general case, having N simple states, we obtain 2 N possible mixed states. But in practice some of them cannot take place simultaneously (or can hold only simultaneously) due to some reason. For example, consider 3 time series: pressure, temperature and humidity data, and 6 simple states for each of them: "increase", "slow increase", "fast increase", "decrease", "slow decrease", "fast decrease". There are 2 3*6 possible mixed states, but it is obvious that slowly increasing and fast increasing segments of the same time series cannot overlap, or decreasing and fast decreasing segments of the series always overlap, and so on. Thus the number of possible mixed states is reduced to 7 3 (or even 5 3 if any decreasing/increasing segment is either fast or slowly decreasing/increasing). Moreover, possibly not all of them have support exceeding supp min , so they will be removed from consideration after the first step of frequent patterns discovery procedure.</p><p>Another problem is very short mixed-state intervals appearing due to inaccurate initial time series division (especially if it was performed by a human). On Fig. <ref type="figure">3</ref> (a) the bounds of intervals labeled with simple states A and B were defined roughly, and this resulted in a small interval labeled with a mixed state {A, B}. A simple solution is to ignore intervals shorter than some threshold ε when searching for pattern occurrences. For example, if l &lt; ε, then a sequence from the Fig. <ref type="figure">3 (a)</ref> does not match a pattern &lt;{A}, {A, B}, {B}&gt;. The same idea can be applied to situation when there is a small gap between intervals labeled with the same state (see Fig. <ref type="figure">3 (b)</ref>). Such gaps may appear, for example, due to smoothing faults. In this case it may be ignored during patterns discovery and both intervals may be considered as one continuous interval.</p><p>But what if the user wants to have patterns that exactly define which intervals must be adjacent and which may have a gap between them in a matching sequence? For example, he wants to extend pattern &lt;{A}, {B}, {C}&gt; with some information like "in a matching sequence {A}-interval must be met by {B}interval, but there may be a gap between {B}-interval and {C}-interval". Such patterns can be enabled by introducing of a special symbol "gap" ( ). A pattern described above can be expressed like &lt;{A}, {B}, , {C}&gt;. "Gap" is not a mixed state, so pattern cardinality does not change, but a space of all candidate patterns increases. To simulate it the following parameters were used: − number of variables: 3; − number of labels: 9; − random interval length from 1 to 100; − total series length: 30000. Fig. <ref type="figure">4</ref> shows that the number of patterns is inversely proportional to supp min , and it seems that this number decreases close to linearly if ε grows (Fig. <ref type="figure">5</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion and Future Work</head><p>This paper proposes a method and an algorithm for frequent patterns and rules discovery in labeled interval sequence. The interval sequence can be obtained, for example, from multivariate time series by dividing each variable's behavior into labeled intervals. Since pattern is a sequence of mixed states, a domain expert can easily interpret all found frequent patterns and rules. Inaccuracy of series division and data smoothing defects can be compensated by filtering intervals with minimum interval length. The method was tested over simulated interval set and dependency of a frequent patterns number from minimum support and minimum interval length were investigated. The problem of the proposed method is a large number of generated frequent patterns and rules. A Jmeasure can help to reduce it, but there still remains a problem of ranking patterns by their interpretability and "importance". Also an effective algorithm for frequent patterns discovery needs to be developed. Although an initial interval set is converted to a simple sequence of mixed state intervals and a pattern is either a mixed states sequence, classical methods of finding subsequence in a sequence are not suitable for pattern matching, since this process is rather complicated.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. An example of interval set</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>(a) illustrates I for intervals shown on Fig. 1: D = {1, 2, 3, 4, 5, 6}; I = (({A}, 1, 2), ({A, B, C}, 2, 3), ({A, C, D}, 3, 4), ({C, D}, 4, 5), ({C}, 5, 6))</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Consider: − a pattern P = &lt;ps 1 , …, ps n &gt; − a subsequence Q of I: Q = ((s 1 , b 1 , e 1 ), (s 2 , b 2 , e 2 ), … , (s m , b m , e m )), where (s k , b k , e k ) ∈ I and e k ≤ b k+1 .</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>u := 1; v := 1; while (u ≤ n and v ≤ m and ps u ⊆ s v ) while ((v = 1 or e v-1 = b v or (s v-1 ,b v-1 ,e v-1 ) is associated with ps u-1 ) and v ≤ m and ps u ⊆ s v and (u = n or ps u+1 s v or ps u+1 ⊆ ps u )) associate (s v ,b v ,e v ) with ps u ; v</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>2 .</head><label>2</label><figDesc>If e k &lt; b k+1 , then (s k , b k , e k ) and (s k+1 , b k+1 , e k+! )</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 3 .Fig. 4 .Fig. 5 .Fig. 4 and</head><label>3454</label><figDesc>Fig. 3. Too short intervals</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>1 ) is associated with ps u and ps u+1 ⊆ s k , then (s k , b k , e k ) will be associated with ps u+1 only if ps u s k . This is due to semantics of such patterns. A pattern &lt;{A, B}, {A}&gt; means that at first conditions A and B must hold for some time simultaneously and then there must be some time when A is true and B is false. If an existence of one of these intervals is not important, then the pattern is to be reduced to &lt;{A}&gt; or &lt;{A, B}&gt;. Fig. 2 (b, c) shows how a sequence ({A}, {A, B, C}, {A, C, D}, {C, D}, {D}) matches patterns &lt;{A}, {A, B}, {C, D}, {D}&gt; and &lt;{A}, {B}, {C}, {D}&gt;.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>r); − there is a subsequence ((s 1 , b 1 , e 1 ), (s 2 , b 2 , e 2 ), …, (s m , b m , e m )) of I matching P so that: • if n ≥ 2, then: − e x = pb i and b y = pe i (where (s x , b x , e x ) is the last interval associated with ps 1 , and (s y , b y , e y ) is the first interval associated with ps n during matching); − there are no shorter (i.e. with less |b y -e x |) subsequences of I matching P within [pb i , pe i ]. • if n = 1, then pb i = b 1 and pe i = e 1 .</figDesc><table /><note>n ≥ 2:</note></figure>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>♣ This work was partially supported by RFBR (grant 07-07-00268a).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proceedings of the Spring Young Researcher's</head></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Fast Similarity Search in The Presence of Noise, Scaling and Translation in Time-Series Databases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-L</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Sawhney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 21st Int. Conf. on Very Large Databases</title>
				<meeting>of the 21st Int. Conf. on Very Large Databases<address><addrLine>Zurich, Switzerland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Maintaining knowledge about temporal intervals</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Allen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="832" to="843" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Temporal Data Mining: an Overview</title>
		<author>
			<persName><forename type="first">C</forename><surname>Antunes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oliveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">KDD Workshop on Temporal Data Mining</title>
				<meeting><address><addrLine>San Francisco</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="1" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E P</forename><surname>Box</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Jenkins</surname></persName>
		</author>
		<title level="m">Time Series Analysis: Forecasting and Control</title>
				<meeting><address><addrLine>San-Francisco, Holden-Day</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1976">1976</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Finding Patterns in Time Series: A Dynamic Programming Approach</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Brendt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clifford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Knowledge Discovery And Data Mining, chapter 9</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="229" to="248" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Rule Discovery from Time Series</title>
		<author>
			<persName><forename type="first">G</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-I</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Renganathan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Smyth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 4th Int. Conf. on Knowledge Discovery and Data Mining</title>
				<meeting>of the 4th Int. Conf. on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="16" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Discovering Temporal Patterns for Interval-Based Events</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">W C</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Kam</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Second International Conference on Data Warehousing and Knowledge Discovery (DaWaK</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Kambayashi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Mohania</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Tjoa</surname></persName>
		</editor>
		<meeting><address><addrLine>London, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2000">2000. 2000</date>
			<biblScope unit="volume">1874</biblScope>
			<biblScope unit="page" from="317" to="326" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Temporal Rule Discovery Using Genetic Programming and Specialized Hardware</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Hetland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Saetrom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of 4th Int. Conf. on Recent Advances in Soft Computing</title>
				<meeting>of 4th Int. Conf. on Recent Advances in Soft Computing</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="182" to="188" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Knowledge discovery and interestingness measures: A survey</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Hilderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Hamilton</surname></persName>
		</author>
		<idno>CS 99-04</idno>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, University of Regina</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Discovery of Temporal Patterns -Learning Rules about the Qualitative Behavior of Time Series</title>
		<author>
			<persName><forename type="first">F</forename><surname>Höppner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 5th European Conference on Principles and Practice of Knowledge Discovery in Databases</title>
		<title level="s">Lecture Notes in Artificial Intelligence</title>
		<meeting>of the 5th European Conference on Principles and Practice of Knowledge Discovery in Databases</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">2168</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Mining Interval Time Series</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Hua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Maulik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Villafane</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Data Warehousing and Knowledge Discovery</title>
				<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="318" to="330" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An Overview of Temporal Data Mining</title>
		<author>
			<persName><forename type="first">W</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Orgun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Williams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Australian Data Mining Workshop</title>
				<meeting>the 1st Australian Data Mining Workshop<address><addrLine>Canberra, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Discovery of Frequent Episodes in Event Sequences</title>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Toivonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Verkamo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="259" to="289" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Mining Hierarchical Temporal Patterns in Multivariate Time Series</title>
		<author>
			<persName><forename type="first">F</forename><surname>Moerchen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ultsch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 27th German Conference on Artificial Intelligence (KI)</title>
				<meeting>of the 27th German Conference on Artificial Intelligence (KI)<address><addrLine>Ulm, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Rule Induction Using Information Theory</title>
		<author>
			<persName><forename type="first">P</forename><surname>Smyth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Goodman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Knowledge Discovery in Databases, chapter 9</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1991">1991</date>
			<biblScope unit="page" from="159" to="176" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
