<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pattern Structures for Understanding Episode Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Keisuke Otaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madori Ikeda</string-name>
          <email>m.ikedag@iip.ist.i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akihiro Yamamoto</string-name>
          <email>akihiro@i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Intelligence Science and Technology Graduate School of Informatics, Kyoto University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We investigate an application of pattern structures to understanding episodes, which are labeled directed acyclic graphs representing event transitions. Since typical episode mining algorithms generate a huge number of similar episodes, we need to summarize them or to obtain compact representations of them before applying the outcome of mining to various problems. Though such problems have been well studied for itemsets, summarization of episodes is still understudied. For a class called diamond episodes, we first provide a pattern structure based on a hierarchy of events to obtain small groups of episodes in the form of pattern concepts and lattice structures. To find a summary via pattern concepts, we design a utility function for scoring concepts. After ranking concepts using this function and the lattice structures, we sample a set of high-scoring pattern concepts as a summary of episodes. We report experimental results of our pattern structure and a ranking result of our simple utility function. Lastly, we discuss pattern concept lattices and their applications to summarization problems.</p>
      </abstract>
      <kwd-group>
        <kwd>formal concept analysis</kwd>
        <kwd>pattern structure</kwd>
        <kwd>episode pattern</kwd>
        <kwd>pattern summarization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Knowledge discovery from binary databases is a fundamental problem setting,
where a binary database represents, by its 1-entries, that certain objects have certain
features. Because such situations arise in many practical problems, both
theoretical and practical aspects of the setting have been studied.</p>
      <p>
        From a mathematical viewpoint, Formal Concept Analysis (FCA) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has been
studied as a model for analyzing such binary databases. We deal with a context
K = (O, A, I) consisting of 1) a set O of objects, 2) a set A of attributes, and 3)
a binary relation I ⊆ O × A representing that the i-th object has the j-th attribute.
FCA adopts two functions f and g for analyzing O and A; f receives a set of
objects and returns the set of attributes commonly possessed by the given
objects, and g receives a set of attributes and returns the set of objects that
commonly have the input attributes. For X ⊆ O and Y ⊆ A, a pair (X, Y) is
called a concept if f(X) = Y and X = g(Y). Computing the set of all concepts
is a fundamental but important task in FCA, which helps us to analyze binary
databases. From a practical viewpoint, it is well known that formal concepts are
related to closed itemsets studied in frequent itemset mining [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which are also
known as compact representations of itemsets.
(Keisuke Otaki is supported as a JSPS research fellow (DC2, 26 4555).)
      </p>
      <p>
        To deal with non-binary data in an FCA manner, pattern structures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have
been studied. A key idea is to generalize both the set intersection ∩ and the subset
relation ⊆, which are used in the two functions f and g of FCA. The set intersection
∩ is replaced with a meet operator ⊓ that extracts common substructures of two
objects. The subset relation ⊆ is replaced with a partial order ⊑ induced
by ⊓, where ⊑ represents an embedding of one object into another. We
assume that these generalizations help us to understand complex data.
      </p>
      <p>In this paper, we investigate pattern structures and their applications to
understanding patterns, motivated by the need for summarization techniques:
mining algorithms always generate a large number of similar patterns. As our running
example, we deal with classes of episode
patterns, which represent event transitions in the form of labeled graphs. From such
patterns, we can compute lattice structures based on pattern structures
(Section 3). Since such lattices represent mutual relations among patterns, together with
small clusters as pattern concepts, analyzing them should help us obtain a
small set of pattern concepts. We regard a subset of all concepts as a summary
of the features often used in describing patterns, and develop a way of obtaining a
small set of concepts as a summary. When we construct descriptions of objects,
we also introduce the wildcard * as a special symbol representing all events, to
take into account a hierarchy of labels based on our background knowledge. A
strong merit of pattern structures for summarization is that similar patterns
can be merged into descriptions containing the wildcard *. After providing our
pattern structures, we present preliminary experimental results (Section 4) and
discuss them from the viewpoint of summarization by giving a utility function for
ranking pattern concepts (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>Formal Concept Analysis and Episode Mining</title>
      <p>
        FCA and Pattern Structures We adopt the standard notations of FCA
from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and of pattern structures from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], respectively. Here we recall the notations
of FCA that we already used in Section 1. For a context K = (O, A, I),
X ⊆ O, and Y ⊆ A, the two functions f and g of FCA are formally defined by f(X) =
{a ∈ A | (o, a) ∈ I for all o ∈ X} and g(Y) = {o ∈ O | (o, a) ∈ I for all a ∈ Y},
respectively. Recall that a pair (X, Y) is a (formal) concept if f(X) = Y and
g(Y) = X. The composed operators g(f(·)) and f(g(·)) are closure operators on 2^O and 2^A,
respectively. Note that a concept (X, Y) has the form either (g(f(X)), f(X))
or (g(Y), f(g(Y))). For two concepts (X1, Y1) and (X2, Y2), a partial order
is introduced by X1 ⊆ X2 (⟺ Y2 ⊆ Y1).
      </p>
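      <p>As a minimal sketch (ours, not from the paper), the derivation operators f and g can be written in Python for a tiny context given as a set of (object, attribute) pairs; the context below is illustrative only:</p>

```python
# Toy formal context as a set of (object, attribute) pairs.
I = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}
O = {o for (o, _) in I}
A = {a for (_, a) in I}

def f(X):
    """Attributes common to every object in X."""
    return {a for a in A if all((o, a) in I for o in X)}

def g(Y):
    """Objects having every attribute in Y."""
    return {o for o in O if all((o, a) in I for a in Y)}

# (g(f(X)), f(X)) is always a formal concept, for any subset X of O.
concept = (g(f({"o1"})), f({"o1"}))
```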
      <p>
        An important aspect of pattern structures is the generalization of the two operations
∩ and ⊆ used in f and g. They are characterized by meet semi-lattices: a meet
semi-lattice (D, ⊓) of a set D and a meet operator ⊓ is an algebraic structure
satisfying 1) associativity, x ⊓ (y ⊓ z) = (x ⊓ y) ⊓ z for x, y, z ∈ D, 2)
commutativity, x ⊓ y = y ⊓ x for x, y ∈ D, and 3) idempotency, x ⊓ x = x for x ∈ D.
Elements of D are called descriptions. A partial order ⊑ is induced by ⊓ as
x ⊑ y whenever x ⊓ y = x for two elements x, y ∈ D.
Example 1 (A meet semi-lattice with sets). Let D = 2^N. For two sets of integers
X = {1, 2} and Y = {1, 2, 3, 4}, it holds that X ∩ Y = X, which induces X ⊑ Y.
Example 2 (A meet semi-lattice for closed intervals [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Let D be the set of all
closed intervals [a, b] with integers a, b ∈ N. The meet ⊓ of two closed intervals
is defined so as to keep convexity, that is, [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)].
      </p>
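      <p>Example 2 can be sketched directly; under the induced order, x ⊑ y (that is, x ⊓ y = x) holds exactly when the interval x contains y (a sketch, ours):</p>

```python
# Meet of closed intervals from Example 2: the smallest interval
# containing both arguments (keeping convexity).
def meet(x, y):
    (a1, b1), (a2, b2) = x, y
    return (min(a1, a2), max(b1, b2))

def leq(x, y):
    # Induced order: x is below y iff x meet y == x (x contains y).
    return meet(x, y) == x
```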
      <p>By generalizing the meet semi-lattice (2^A, ∩) used in FCA, a pattern structure
P is defined as a triple (O, (D, ⊓), δ), where O is a set of objects, (D, ⊓) is
a meet semi-lattice of descriptions, and δ : O → D is a mapping giving
a description to each object. For analyzing pattern structures, we use the
following Galois connection ((·)^□, (·)^□) corresponding to f and g in FCA:
A^□ = ⊓_{o ∈ A} δ(o) for A ⊆ O, (1)
d^□ = {o ∈ O | d ⊑ δ(o)} for d ∈ D. (2)
Pattern concepts based on P are defined via Equations (1) and (2):
Definition 1 (Pattern concepts). A pattern concept of P is a pair (A, d) of a
set A ⊆ O and a pattern d ∈ D satisfying A^□ = d and d^□ = A. Two pattern
concepts are partially ordered by (A1, d1) ≤ (A2, d2) iff A1 ⊆ A2 (⟺ d2 ⊑ d1).</p>
      <p>
        Note that, by this partial order, the set of all pattern concepts forms a lattice
structure. We denote the set of all pattern concepts by P(P). To obtain P(P),
we need to compute the two functions (·)^□. For example, we can adopt the
AddIntent algorithm proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
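      <p>To make the definition of pattern concepts concrete, the following brute-force sketch (ours; a real implementation would use AddIntent instead of enumerating subsets) lists all pattern concepts of a tiny pattern structure whose descriptions are the closed intervals of Example 2:</p>

```python
from itertools import combinations

# Hypothetical objects with interval descriptions delta(o).
delta = {"o1": (1, 3), "o2": (2, 5), "o3": (4, 6)}
O = list(delta)

def meet(x, y):
    # Interval meet from Example 2.
    return (min(x[0], y[0]), max(x[1], y[1]))

def box_objects(d):
    # d^box = {o | d is below delta(o)}.
    return frozenset(o for o in O if meet(d, delta[o]) == d)

def box_desc(A):
    # A^box = meet of delta(o) over all o in A.
    ds = [delta[o] for o in A]
    out = ds[0]
    for d in ds[1:]:
        out = meet(out, d)
    return out

# Closing every nonempty subset of O yields all pattern concepts.
concepts = set()
for k in range(1, len(O) + 1):
    for A in combinations(O, k):
        d = box_desc(A)
        concepts.add((box_objects(d), d))
```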
      <p>
        Episode Mining We briefly review episode mining based on [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Let E =
{1, ..., m} ⊆ N be the set of events. We call a set S ⊆ E of events an event
set. An input of episode mining is a long sequence of event sets; an input event
sequence S on E is a finite sequence ⟨S1, ..., Sn⟩ ∈ (2^E)*, where each Si ⊆ E
is the i-th event set. For S of length n, we assume that Si = ∅ if i ≤ 0 or i &gt; n.
      </p>
      <p>
        Episodes are labeled directed acyclic graphs (DAGs). An episode G is a triple (V, E, λ),
where V is a set of vertices, E is a set of directed edges, and λ is the
labeling function from V to the set of labels, that is, λ : V → E. Several classes of
episodes have been studied since episode mining was first introduced by
Mannila et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We follow the subclasses of episodes studied by Katoh et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. An
example of episodes is illustrated in Figure 1.
      </p>
      <p>[Figure 1 here: 1-, 2-, and 3-serial episodes and a diamond episode over the events A, B, C, D, and E.]</p>
      <sec id="sec-2-2">
        <title>Windows, Occurrences, and Frequency</title>
        <p>In designing pattern mining
algorithms, we need 1) a search space of patterns with a partial order for enumerating
patterns, and 2) an interestingness measure to evaluate them. For episode mining,
we often adopt occurrences of episodes defined with windows.</p>
        <p>Definition 1 (Windows). For a sequence S = ⟨S1, ..., Sn⟩, a window W of
S is a contiguous subsequence ⟨Si, ..., S_{i+w−1}⟩ of length w, called the width, for
some index i (−w + 1 ≤ i ≤ n) of S and a positive integer w.
Definition 2 (Embedding of Episodes). Let G = (V, E, λ) be an episode,
and W = ⟨S1, ..., Sw⟩ be a window of width w. We say that G occurs in W if
there exists a mapping h : V → {1, ..., w} satisfying 1) for all v ∈ V, λ(v) ∈
S_{h(v)}, and 2) for all (u, v) ∈ E with u ≠ v, it holds that h(u) &lt; h(v). The map
h is called an embedding of G into W, and we write G ⊑ W.</p>
        <p>
          For an input event sequence S and an episode G, we say that G occurs at
position i of S if G ⊑ Wi, where Wi = ⟨Si, ..., S_{i+w−1}⟩ is the i-th window of
width w in S. We then call the index i an occurrence of G in S. The domain of
occurrences is given by W_{S,w} = {i | −w + 1 ≤ i ≤ n}. In addition, W_{S,w}(G)
denotes the occurrence window list of an episode G, defined by {−w + 1 ≤ i ≤ n |
G ⊑ Wi}. We can then define frequency, an interestingness measure for episodes.
Definition 3 (Frequency of Episodes). The frequency of an episode G in S
with width w, denoted by freq_{S,w}(G), is defined as the number of windows of width w
containing G. That is, freq_{S,w}(G) = |W_{S,w}(G)|. For a threshold σ ≥ 1, a width
w, and an input event sequence S, if freq_{S,w}(G) ≥ σ, then G is called σ-frequent on S.
The frequent episode mining problem is defined as follows: let P be a class of
episodes; given an input event sequence S, a width w ≥ 1, and a frequency
threshold σ ≥ 1, find all σ-frequent episodes G belonging to
the class P. The simplest strategy for finding all σ-frequent episodes is to traverse
P by using the anti-monotonicity of the frequency count freq(·). For details, we
refer the reader to [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
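        <p>As an illustration of Definitions 1–3, the sketch below (ours, with hypothetical data) counts the windows of width w in which a serial episode a1 ↦ ... ↦ am occurs; event sets outside the sequence are treated as empty:</p>

```python
# Frequency of a serial episode a1 -> ... -> am (Definition 3).
# S is a list of event sets (S[0] plays the role of S_1); the windows
# W_i range over i = -w+1, ..., n, with empty sets outside S.
def occurs_serial(episode, window):
    pos = 0
    for event in episode:
        # Greedy leftmost embedding suffices for serial episodes.
        while pos < len(window) and event not in window[pos]:
            pos += 1
        if pos >= len(window):
            return False
        pos += 1  # the embedding h must be strictly increasing
    return True

def freq(episode, S, w):
    n = len(S)
    count = 0
    for i in range(-w + 1, n + 1):
        window = [S[j - 1] if 1 <= j <= n else set() for j in range(i, i + w)]
        if occurs_serial(episode, window):
            count += 1
    return count
```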
        <p>As examples of classes, we introduce m-serial episodes and diamond
episodes. An m-serial episode over E is a sequence of events of the form a1 ↦
a2 ↦ ... ↦ am. A diamond episode over E is either 1) a 1-serial episode e ∈ E or
2) a proper diamond episode represented by a triple Q = ⟨a, X, b⟩ ∈ E × 2^E × E,
where a, b are events and X ⊆ E is an event set occurring after a and before
b. For short, we write a diamond episode as a ↦ X ↦ b. While the definition
of episodes by graphs is quite general, the classes
of episode patterns studied in practice are often restricted.</p>
        <p>Example 3 (Episodes). In Figure 1, we show some serial episodes, A ↦ B ↦ E,
A ↦ D ↦ E, B ↦ E, and C, on the set of events E = {A, B, C, D, E}. All of
them are included in the diamond episode A ↦ {B, C, D} ↦ E.</p>
        <p>We now explain a merit of introducing pattern structures for the summarization of
structured patterns. As mentioned above, a common strategy in
pattern mining is to traverse the space P in a breadth-first manner while checking
some interestingness measure. When generating the next candidates of frequent
patterns, algorithms always check a parent-child relation between two patterns. This
order is essential for pattern mining, and we thus conjecture that this
parent-child relation can be naturally adopted in constructing
a pattern structure for analyzing patterns, only by introducing a similarity
operation ⊓. After constructing a lattice, it should be helpful for analyzing the set of all
patterns, because pattern concepts represent all patterns compactly.</p>
        <p>
          A crucial problem of pattern structures is the computational complexity of
both ⊓ and ⊑. Our idea is to adopt trees of height 1 (also called stars
in graph theory). That is, we assume here that such trees are expressive enough to
represent features of episodes. Our idea is similar to that used in designing graph
kernels [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]1 and is inspired by previous studies on pattern structures [
          <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Diamond Episode Pattern Structures</title>
      <p>
        In the following, we focus on diamond episodes as our objects and on trees of height
1 as our descriptions. Diamond episodes have two special vertices, the source and the sink,
which can be regarded as important features for representing event transitions.
We generate rooted labeled trees from them by putting such a vertex at the root of
a tree and regarding its neighbors as its children. Since the height of every tree here
is 1, we can represent the trees by tuples without explicit graph notation.
Definition 4 (Rooted Trees of Height 1). Let (E, ⊓_E) be a meet semi-lattice
of event labels. A rooted labeled tree of height 1 is represented by a tuple2
(e, C) ∈ E × 2^E. We denote the set of all rooted labeled trees of height 1 by T.
Note that in (E, ⊓_E), we assume that ⊓_E compares labels based on our
background knowledge. Note also that this meet semi-lattice (E, ⊓_E) is
independent of, and different from, the meet semi-lattice D of descriptions of a pattern
structure P. The operation ⊓_E is also adopted when defining an embedding of
trees of height 1, that is, a partial order between trees defined as follows.
1 It intuitively generates a sequence of graphs by relabeling all vertices of a graph. One
focuses on the label of a vertex v ∈ V(G) and sees the labels of its neighbors N_G(v).
For the tuples of a vertex label and its neighbor labels, over all vertices v ∈ V(G), we sort all labels lexicographically
and assign a new label according to this representation. Details are found in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
2 From the viewpoint of graphs, this tuple (e, C) represents the graph G = (V, E, λ)
with V = {0, 1, ..., |C|}, E = {(0, i) | 1 ≤ i ≤ |C|}, λ(0) = e, and {λ(i) | 1 ≤ i ≤ |C|} = C.
      </p>
      <sec id="sec-3-3">
        <title>Similarity of Trees of Height 1</title>
        <p>[Figure 2 here: two episodes G0 and G1, their descriptions δ(G0) and δ(G1), the label comparison at the root, the generalization of the children, the result δ(G0) ⊓ δ(G1), and the semi-lattice for the events A, B, C, D, E with the wildcard *.]</p>
        <p>Definition 5 (Partial Order on Trees). A tree t1 = (e1, C1) is a generalized
subtree of t2 = (e2, C2), denoted by t1 ⊑_T t2, iff e1 ⊑_E e2 and there exists an
injective mapping φ : C1 → C2 satisfying v ⊑_E φ(v)
for all v ∈ C1, where ⊑_E is the partial order induced by ⊓_E.</p>
        <p>
          For defining a similarity operator ⊓_T between trees, this partial order ⊑_T
is helpful because ⊓_T is closely related to ⊑_T in our scenario. Since all trees
here have height 1, the computation is easy to describe: for the labels of the root nodes,
the similarity is immediately given by ⊓_E. For their children, it
is implemented using the idea of the least general generalization (LGG), used
in Inductive Logic Programming [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], of two sets of labels. A practical
implementation of the LGG depends on whether or not the sets are multisets, but it is
computationally tractable. An example is shown in Figure 2.
        </p>
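        <p>Under a flat event hierarchy in which the wildcard * is the minimum element, the meet of two height-1 trees from Figure 2 can be sketched as follows (ours; a full implementation would handle multisets and deeper hierarchies):</p>

```python
WILDCARD = "*"

# Meet of two height-1 trees (root_label, frozenset_of_child_labels).
# Roots: equal labels are kept, different ones generalize to *.
# Children: the LGG keeps the common child labels; when there are
# none, only the wildcard child remains.
def meet_tree(t1, t2):
    (e1, C1), (e2, C2) = t1, t2
    root = e1 if e1 == e2 else WILDCARD
    common = frozenset(C1) & frozenset(C2)
    children = common if common else frozenset({WILDCARD})
    return (root, children)
```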
        <p>We now give formal definitions of δ and D. For a graph G = (V, E, λ), we denote
the neighbors of v ∈ V by N_G(v). For a proper diamond episode pattern G with
source vertex s ∈ V and sink vertex t ∈ V, the trees of height 1
corresponding to s and t are defined as T_s = ({s} ∪ N_G(s), {(s, u) | u ∈ N_G(s)}, λ)
and T_t = ({t} ∪ N_G(t), {(u, t) | u ∈ N_G(t)}, λ), respectively. Using these
trees, δ(·) can be defined according to the vertices s and t: if we use both T_s and
T_t, then δ(G) = (T_s, T_t), ⊓_T is applied element-wise, and D is
T × T. If we focus on either s or t, then δ(G) = T_s or T_t, and we can use ⊓_T directly
by assuming D = T.</p>
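        <p>For a proper diamond episode written as a triple a ↦ X ↦ b, the source and sink descriptions reduce to simple tuples (a sketch with our own encoding):</p>

```python
# Descriptions of a proper diamond episode a -> X -> b as height-1
# trees: the root is the source (resp. sink) label, the children
# are the labels of the middle event set X.
def delta_source(diamond):
    a, X, b = diamond
    return (a, frozenset(X))

def delta_sink(diamond):
    a, X, b = diamond
    return (b, frozenset(X))
```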
        <p>
          Lastly, we briefly explain the relations between our pattern structure and previous
studies. The partial order ⊑_T is inspired by the generalized subgraph
isomorphism [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and a pattern structure for analyzing sequences [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We here give
another description of the similarity operator based on the definition used in [
          <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
          ].
Definition 6 (Similarity Operation ⊓ based on [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). The similarity
operation ⊓ is defined as the set of all maximal common subtrees with respect to the
generalized subtree isomorphism ⊑_T; for two trees s1 and s2 in T,
s1 ⊓ s2 = {u | u ⊑_T s1, u ⊑_T s2, and there is no u′ ≠ u with u ⊑_T u′, u′ ⊑_T s1, and u′ ⊑_T s2}.
        </p>
        <p>[Figure 3 here: an input event sequence over the events A–E and examples of episodes occurring in it.]</p>
        <p>
          Note that our operator ⊓_T can be regarded as a special case of the similarity
operation ⊓ above. From the viewpoint of pattern structures, our trees of height
1 can be regarded as an example of projections from graphs to trees, studied
in [
          <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
          ], such as k-chains (paths on graphs of length k) and k-cycles.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Discussion for Diamond Episodes</title>
      <p>
        Data and Experiments We gathered data from MLB baseball logs, in which a
system records all pitches and plays for all games in a season. We used the
types of pitches thrown, which can be represented by a histogram per
batter. For a randomly selected game, we generated an input event sequence for
episode mining by transforming each histogram into the set of pitch types used3.
In forming (E, ⊓_E), we let E be the set of pitch types and define
⊓_E naturally (see the example in Fig. 2). To this S, we applied the diamond episode
mining algorithm proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and obtained a set of diamond episodes. The
algorithm has two parameters, the window size w and the frequency threshold
σ. We always set σ = 1 and varied w ∈ {3, 4, 5}. After generating a set G of
frequent proper diamond episodes, we sampled M ∈ {100, 200, ..., 700} episodes
from G as a subset O of G (that is, satisfying |O| = M and O ⊆ G). We used O
as the set of objects of our pattern structure P and computed all pattern
concepts P(P) based on the discussion in Section 3. In these experiments we set
δ(G) = T_s for a proper diamond episode G and its source vertex s.
3 In baseball games, pitchers throw many kinds of pitches, such as fastballs, cut balls,
curves, sinkers, etc. They are recorded together with their movements by MLB systems.
      </p>
      <p>[Figure 4 here: five sampled episodes G0–G4 and a pattern concept lattice computed from them, in which some vertices are generalized to the wildcard *.]</p>
      <p>
        For the experiments, we adopted the implementation by Katoh et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for
mining diamond episodes, which is written in C++4. We implemented the Galois
connection ((·)^□, (·)^□) with the AddIntent [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] algorithm in Python5.
Results and Discussion Table 1 shows the numbers of pattern
concepts and of proper diamond episodes for each set corresponding to w =
3, 4, and 5, varying M ∈ {100, 200, 300, 400, 500, 600, 700}. In Figure 4, we
show pattern concepts and a pattern concept lattice computed from
only the first 5 episodes for the case w = 5, as an example of our lattice structures.
      </p>
      <p>Because we assume a semi-lattice (E, ⊓_E) of events in episode patterns, we
can obtain pattern concepts in which some vertices are represented by the
wildcard *. If we implement ⊓_T without the wildcard *, we obtain a much
smaller number of pattern concepts than in our results including the
wildcard *. We thus conjecture that the wildcard * is useful for representing
similar patterns, as in Figure 4. From this viewpoint, pattern structures and pattern
concepts help us to form groups of patterns, which is basically similar to
clustering patterns. We do not discuss the computational complexity
of constructing pattern concept lattices in detail here, but the complexity basically depends on
the number of concepts. Thus it would be interesting to investigate and compare
several possible projections and ways of computing pattern concepts.</p>
    </sec>
    <sec id="sec-5">
      <title>Pattern Summarization and Pattern Structures</title>
      <p>
        In this section, we discuss pattern summarization based on pattern concept
lattices. As mentioned, closed itemsets [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] have been studied as compact
representations of itemsets, and they are closely related to the closure operator
f(g(·)) in FCA with (O, A, I), where O is the set of transaction identifiers and
A is the set of all items. The difficulty with closed patterns for complex data is that
there is no common definition of closure operators, where we usually use
closedness with respect to the frequency. Here we expect that pattern concepts
are helpful in the same way that closed itemsets correspond to concepts.
4 We adopted gcc 4.7 as our compiler with C++11 features (-std=c++11). The
code was compiled on a machine running Mac OS X 10.9 with two 2.26 GHz Quad-Core
Intel Xeon processors and 64 GB of main memory.
5 We used Python 2.7 without any additional packages (that is, pure Python).
      </p>
      </p>
      <p>To obtain compact representations, we need to decide how to evaluate
each pattern. The problem here is how to deal with the wildcard * in descriptions.
When we obtain a concept (X, Y) for X ⊆ O and Y ⊆ A, the concept (X, Y)
corresponds to a rectangle on I, and by definition there are no 0-entries in the sub-database
I′ = {(x, y) ∈ I | x ∈ X, y ∈ Y} of I induced by (X, Y).
If (X′, Y′) is not a concept, the rectangle r given by (X′, Y′) contains a few 0-entries.
We denote the relative ratio of 1-entries in the rectangle r given by (X′, Y′) by
r1(X′, Y′, I) = 1 − |{(x, y) ∉ I | x ∈ X′, y ∈ Y′}| · (|X′||Y′|)⁻¹,
where 0 ≤ r1(X′, Y′, I) ≤ 1, and r1(X′, Y′, I) = 1 if (X′, Y′) is a concept. The values
r1(X, Y, I), |X|, and |Y| are applicable for evaluating itemsets. If we only use the
cardinality |A| of a set A of objects, this equals the support counts computed
in Iceberg concept lattices [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For a concept (X, Y) of a context K = (O, A, I),
we compute the support count supp(X, Y) = |g(Y)| / |O| and prune redundant
concepts using some threshold. To formalize the evaluation of patterns, such
values are generalized by introducing a utility function u : P → R₊. A typical
and well-studied utility function is, of course, the frequency count, or the area
function area(·), which evaluates the size of a rectangle (X, Y) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Based on the discussion above, if we can define a utility function u(·) for
evaluating pattern concepts, a similar approach becomes possible for pattern concepts:
choosing a small number of pattern concepts and constructing a summary of
patterns from them. Of course, there is no simple way of giving such functions. We
introduce a simple and straightforward utility function u_P(·) for pattern
concepts as a first step toward pattern summarization via pattern concept
lattices. In this paper, we follow the idea used in tiling databases [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where a
key criterion is given by area(·). We consider how to compute a value that
corresponds to the area in binary databases. To take into account the wildcard
* used in descriptions, we define the following simple function. For d ∈ D, we let
s(d) and n(d) be the numbers of non-wildcard vertices and of all vertices in a description
d, respectively. Note that if s(d) = n(d), then d contains no wildcard labels. Using
these functions, we compute utility values as follows:
      </p>
      <p>u_P(A, d) = |A| · log(1 + s(d)).</p>
      <p>Experiments and Discussions We compare the results of ranking pattern
concepts 1) using only |A| (similar
to Iceberg concept lattices) and 2) using u_P(·) as the utility function. From
the list of pattern concepts generated in the experiments of Section 4, we rank all
pattern concepts by each utility function, sort each list in ascending
order, and compare the two lists. We remove patterns appearing in both
lists to highlight the differences. We give our results in Table 2:
|A|: (*, {*}), (2, {*}), (0, {*}), (3, {*}), (1, {*})
u_P(·): (*, {0, *}), (*, {0, 2, 3}), (*, {0, 1, 2}), (*, {0, 1, 3}), (*, {1, 2, 3})</p>
      <p>In the result with u_P(·), larger descriptions appear with higher utility values
than in the ranking by |A|. We can see that, by modifying the terms concerning
*, the results contain more informative nodes, which are labeled by non-wildcard
labels. Here we implicitly assume that descriptions containing fewer occurrences of * are
more useful for understanding the data themselves. From this viewpoint, considering the
two terms s(d) and n(d) of a description d would be an interesting and useful way
to design utility functions for pattern concepts. We conclude that Iceberg-lattice-based
support counts are less effective for pattern summarization problems if descriptions admit the
wildcard *.</p>
      <p>Not only the simple computation in u_P(A, d) used above but also many
alternatives could be applicable for ranking. Probabilistic methods such as the
minimum description length (MDL) principle and information-theoretic criteria would also be
helpful for analyzing our setting more clearly. Since pattern structures have no
explicit representations as binary cross tables, the difficulty lies in how to deal
with the meet semi-lattice (D, ⊓). For a pattern concept (A, d) and an object
o ∈ O, we say that (A, d) subsumes o if and only if d ⊑ δ(o). This
subsumption relation is simple and helpful for evaluating concepts, but it does
not use any richer information concerning the hierarchy of events or distances
between two descriptions. In fact, in the experiments we always assume that all
events except * have the same weight and that * is the minimum of all events.
Taking similarity measures of events into account could be important for further
development of ranking methods for pattern concepts.</p>
      <p>
        Related Work There are several studies related to ours. It is well known that closed
itemsets correspond to maximal bipartite cliques in bipartite graphs constructed
from K = (O, A, I). Similarly, we sometimes deal with so-called pseudo bipartite
cliques [
        <xref ref-type="bibr" rid="ref16">16</xref>
          ], where it holds that r1(X′, Y′, I) ≥ 1 − ε for a user-specified
constant ε. Obviously, pseudo bipartite cliques correspond to rectangles containing
a few 0-entries. We can regard them as summarizations or approximations of closed
itemsets or concepts. Intuitively, if we use pseudo bipartite cliques as a
summarization, the value r1(X, Y, I) can be considered in evaluating (X, Y). Pseudo
bipartite cliques can be regarded as noisy tiles, an extension of tiles [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Another typical approach to summarization is clustering patterns [
        <xref ref-type="bibr" rid="ref1 ref18">1, 18</xref>
        ]. A
main problem there is how to interpret clusters or centroids, where we need to
design a similarity measure and a space in which to compute the similarity. From the
viewpoint of probabilistic models, there is an analysis via the maximum entropy
principle [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, these approaches assume that entries in a database are independently
sampled, and thus we cannot apply those techniques to our setting.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Toward Generalizations for Bipartite Episodes</title>
      <p>
        In this paper we assume that our descriptions by trees of height 1 are rich enough
to apply to many classes of episode patterns. As an example, we show here how to apply our
pattern structure to another type of episode, called bipartite episodes. An
episode G = (V, E, λ) is a partial bipartite episode if 1) V = V1 ∪ V2 for mutually
disjoint sets V1 and V2, and 2) every directed edge (x, y) ∈ E satisfies (x, y) ∈ V1 × V2. If
E = V1 × V2, the episode G is called a proper bipartite episode. Obviously, the vertices
of a bipartite episode G are separated into V1 and V2, and we can regard these sets
as generalizations of the source vertex and the sink vertex of diamond episodes.
This indicates that the same approach is applicable to bipartite episodes by defining
⊓ between sets of trees. Fortunately, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] gives the de nition u for sets of graphs.
{t1, ..., tk} ⊓ {s1, ..., sm} = MAX⊑T( ∪i,j ({ti} ⊓ {sj}) ),
      </p>
      <p>
where MAX⊑T(S) returns only the maximal elements of S with respect to ⊑T. Since
our generalized subtree isomorphism is basically a special case of that for graphs,
we can also apply this meet operation. This example suggests that if we have some
background knowledge concerning a partition of V, it can be taken into account
for δ and (D, ⊓) in a similar manner to diamond and bipartite episodes.
      </p>
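      <p>
        To make the set-level meet concrete, the following is a minimal Python sketch:
        it collects all pairwise meets and keeps only the maximal elements. The helpers
        meet and leq are assumptions standing in for the pairwise tree meet and the
        subsumption order ⊑T; in the toy instantiation below, a tree of height 1 is
        modeled simply as the set of its leaf labels, with intersection as the pairwise
        meet and set inclusion as subsumption.
      </p>
      <preformat>
```python
def set_meet(T, S, meet, leq):
    """Meet of two sets of descriptions: gather all pairwise meets,
    then keep only elements that are maximal w.r.t. the order leq."""
    # all pairwise meets {t_i} meet {s_j}; each call may return several elements
    candidates = [m for t in T for s in S for m in meet(t, s)]
    # discard any candidate strictly below another candidate; deduplicate
    maximal = []
    for c in candidates:
        if not any(c != d and leq(c, d) for d in candidates):
            if c not in maximal:
                maximal.append(c)
    return maximal

# Toy instantiation (assumption): trees of height 1 as frozensets of labels.
pair_meet = lambda t, s: [t & s]   # pairwise meet = intersection
subsumes = lambda a, b: a <= b     # subsumption = subset

T = [frozenset({"a", "b"}), frozenset({"a", "c"})]
S = [frozenset({"a", "b", "c"})]
print(set_meet(T, S, pair_meet, subsumes))
```
      </preformat>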
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Future Work</title>
      <p>In this paper we proposed a pattern structure for diamond episodes based on an
idea used in graph kernels and on projections of pattern structures. Since we do not
directly compute graph matching operations, we conjecture that our computation
could be efficient. With a slight modification of ⊓, our method is also applicable
to many classes of episodes, not only to diamond patterns, as mentioned
above. Based on our pattern structure, we discussed summarization by using
mined pattern concepts and showed small examples and experimental results.</p>
      <p>Problems of this type are unsupervised, and there is no common way of
obtaining good results or of evaluating whether the results are good. It
would therefore be interesting to study this summarization problem further, based on
concept lattices and taking theoretical foundations such as
probability distributions into account. In future work, we will analyze theoretical aspects
of summarization via pattern structures, including the wildcard ? and its
optimization problem, to obtain compact and interesting summaries of many
patterns, building on the key merit of a partial order ⊑ between descriptions.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by Grant-in-Aid for JSPS Fellows (26 4555) and JSPS
KAKENHI Grant Number 26280085.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Al Hasan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaoji</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salem</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Origami: Mining representative orthogonal graph patterns</article-title>
          .
          <source>In: Proc. of the 7th ICDM</source>
          . pp.
          <fpage>153</fpage>
          -
          <lpage>162</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Buzmakov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egho</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Rassi, C.:
          <article-title>The representation of sequential patterns and their projections within Formal Concept Analysis</article-title>
          .
          <source>In: Workshop Notes for LML (ECML/PKDD2013)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. De Bie, T.:
          <article-title>Maximum entropy models and subjective interestingness: an application to tiles in binary databases</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>23</volume>
          (
          <issue>3</issue>
          ),
          <fpage>407</fpage>
          -
          <lpage>446</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Pattern structures and their projections</article-title>
          .
          <source>In: Proc. of the 9th ICCS</source>
          . pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>Formal concept analysis - mathematical foundations</source>
          . Springer (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Geerts</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goethals</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mielikäinen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Tiling databases</article-title>
          .
          <source>In: Proc. of the 7th DS</source>
          . pp.
          <fpage>278</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Katoh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arimura</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirata</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A polynomial-delay polynomial-space algorithm for extracting frequent diamond episodes from event sequences</article-title>
          .
          <source>In: Proc. of the 13th PAKDD</source>
          . pp.
          <fpage>172</fpage>
          -
          <lpage>183</lpage>
          . Springer Berlin Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Revisiting Numerical Pattern Mining with Formal Concept Analysis</article-title>
          .
          <source>In: Proc. of the 22nd IJCAI</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samokhin</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          :
          <article-title>Learning closed sets of labeled graphs for chemical applications</article-title>
          .
          <source>In: Proc. of the 15th ILP</source>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>208</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          :
          <source>Foundations of Logic Programming</source>
          . Springer-Verlag New York, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toivonen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verkamo</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          :
          <article-title>Discovery of frequent episodes in event sequences</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>259</fpage>
          -
          <lpage>289</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>van der Merwe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kourie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>AddIntent: A New Incremental Algorithm for Constructing Concept Lattices</article-title>
          .
          <source>In: Proc. of the 2nd ICFCA</source>
          . pp.
          <fpage>372</fpage>
          -
          <lpage>385</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Discovering frequent closed itemsets for association rules</article-title>
          .
          <source>In: Proc. of the 7th ICDT</source>
          . pp.
          <fpage>398</fpage>
          -
          <lpage>416</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Shervashidze</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schweitzer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Leeuwen</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehlhorn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          :
          <article-title>Weisfeiler-Lehman graph kernels</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2539</fpage>
          -
          <lpage>2561</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computing iceberg concept lattices with titanic</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>42</volume>
          (
          <issue>2</issue>
          ),
          <fpage>189</fpage>
          -
          <lpage>222</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Uno</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>An efficient algorithm for solving pseudo clique enumeration problem</article-title>
          .
          <source>Algorithmica</source>
          <volume>56</volume>
          (
          <issue>1</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          (Jan
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Vreeken</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Leeuwen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siebes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Krimp: mining itemsets that compress</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>23</volume>
          (
          <issue>1</issue>
          ),
          <fpage>169</fpage>
          -
          <lpage>214</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Extracting redundancy-aware top-k patterns</article-title>
          .
          <source>In: Proc. of the 12th KDD</source>
          . pp.
          <fpage>444</fpage>
          -
          <lpage>453</lpage>
          . ACM
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>