<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pattern Structures for Understanding Episode Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Keisuke Otaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Madori Ikeda</string-name>
          <email>m.ikedag@iip.ist.i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akihiro Yamamoto</string-name>
          <email>akihiro@i.kyoto-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Intelligence Science and Technology Graduate School of Informatics, Kyoto University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We investigate an application of pattern structures to understanding episodes, which are labeled directed acyclic graphs representing event transitions. Since typical episode mining algorithms generate a huge number of similar episodes, we need to summarize them or to obtain compact representations of them before applying the outcome of mining to various problems. Though such problems have been well studied for itemsets, summarization of episodes is still understudied. For a class called diamond episodes, we first provide a pattern structure based on a hierarchy of events to obtain small groups of episodes in the form of pattern concepts and lattice structures. To find a summary via pattern concepts, we design a utility function for scoring concepts. After ranking concepts using this function and the lattice structures, we sample a set of high-scoring pattern concepts as a summary of episodes. We report experimental results of our pattern structure and a ranking result of our simple utility function. Lastly, we discuss pattern concept lattices and their applications to summarization problems.</p>
      </abstract>
      <kwd-group>
        <kwd>formal concept analysis</kwd>
        <kwd>pattern structure</kwd>
        <kwd>episode pattern</kwd>
        <kwd>pattern summarization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Knowledge discovery from binary databases is a fundamental problem setting,
where a binary database represents, by its 1-entries, that certain objects have certain
features. Because such situations arise in many practical problems, both
theoretical and practical aspects of the setting have been studied.</p>
      <p>
        From a mathematical viewpoint, Formal Concept Analysis (FCA) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has been
studied as a model for analyzing such binary databases. We deal with a context
K = (O, A, I) consisting of 1) a set O of objects, 2) a set A of attributes, and 3)
a binary relation I ⊆ O × A representing that the i-th object has the j-th attribute.
FCA adopts two functions f and g for analyzing O and A; f receives a set of
objects and returns the set of attributes commonly possessed by the given
objects, and g receives a set of attributes and returns the set of objects that
commonly have the input attributes. For X ⊆ O and Y ⊆ A, a pair (X, Y) is
called a concept if f(X) = Y and X = g(Y). Computing the set of all concepts
is a fundamental but important task in FCA, which helps us to analyze binary
databases. From a practical viewpoint, it is well known that formal concepts are
related to closed itemsets studied in frequent itemset mining [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which are also
known as compact representations of itemsets.
(Keisuke Otaki is supported as a JSPS research fellow (DC2, 26 4555).)
      </p>
      <p>
        To deal with non-binary data in an FCA manner, pattern structures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have
been studied. A key idea is to generalize both the set intersection ∩ and the subset
relation ⊆, which are used in the two functions f and g of FCA. The set intersection
∩ is replaced with a meet operator ⊓ that extracts common substructures of two
objects. The subset relation ⊆ is replaced with a partial order ⊑ induced
by ⊓, where ⊑ represents an embedding of one object into another. We
assume that these generalizations help us to understand complex data.
      </p>
      <p>In this paper, we investigate pattern structures and their applications to
understanding patterns, motivated by the need for summarization techniques:
mining algorithms always generate a large number of similar patterns. As our running
example, we deal with classes of episode
patterns, which represent event transitions in the form of labeled graphs. From such
patterns, we can compute lattice structures based on pattern structures
(Section 3). Since such lattices represent mutual relations among patterns, together with
small clusters as pattern concepts, analyzing them should help us obtain a
small set of pattern concepts. We regard a subset of all concepts as a summary
of the features often used in describing patterns, and develop a way of obtaining a
small set of concepts as a summary. When we construct descriptions of objects,
we also introduce the wildcard * as a special symbol representing all events, to
take into account a hierarchy of labels based on our background knowledge. A
strong merit of pattern structures for summarization is that similar patterns
can be merged into descriptions containing the wildcard *. After providing our
pattern structures, we present preliminary experimental results (Section 4) and
discuss them from the viewpoint of summarization by giving a utility function for
ranking pattern concepts (Section 5).</p>
    </sec>
    <sec id="sec-2">
      <title>Formal Concept Analysis and Episode Mining</title>
      <p>
        FCA and Pattern Structures We adopt the standard notations of FCA
from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and of pattern structures from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], respectively. Here we recall the notations
of FCA that we already used in Section 1. For a context K = (O, A, I),
X ⊆ O, and Y ⊆ A, the two functions f and g of FCA are formally defined by f(X) =
{a ∈ A | (o, a) ∈ I for all o ∈ X} and g(Y) = {o ∈ O | (o, a) ∈ I for all a ∈ Y},
respectively. Recall that a pair (X, Y) is a (formal) concept if f(X) = Y and
g(Y) = X. The composed operators g(f(·)) and f(g(·)) are closure operators on 2^O and 2^A,
respectively. Note that a concept (X, Y) has the form either (g(f(X)), f(X))
or (g(Y), f(g(Y))). For two concepts (X1, Y1) and (X2, Y2), a partial order
is introduced by X1 ⊆ X2 (⟺ Y2 ⊆ Y1).
      </p>
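      <p>As a minimal sketch (ours, not from the paper), the derivation operators f and g can be written in Python for a tiny context given as a set of (object, attribute) pairs; the context below is illustrative only:</p>

```python
# Toy formal context as a set of (object, attribute) pairs.
I = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o2", "c"), ("o3", "b")}
O = {o for (o, _) in I}
A = {a for (_, a) in I}

def f(X):
    """Attributes common to every object in X."""
    return {a for a in A if all((o, a) in I for o in X)}

def g(Y):
    """Objects having every attribute in Y."""
    return {o for o in O if all((o, a) in I for a in Y)}

# (g(f(X)), f(X)) is always a formal concept, for any subset X of O.
concept = (g(f({"o1"})), f({"o1"}))
```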
      <p>
        An important aspect of pattern structures is the generalization of the two operations
∩ and ⊆ used in f and g. They are characterized by meet semi-lattices: a meet
semi-lattice (D, ⊓) of a set D and a meet operator ⊓ is an algebraic structure
satisfying 1) associativity, x ⊓ (y ⊓ z) = (x ⊓ y) ⊓ z for x, y, z ∈ D, 2)
commutativity, x ⊓ y = y ⊓ x for x, y ∈ D, and 3) idempotency, x ⊓ x = x for x ∈ D.
Elements of D are called descriptions. A partial order ⊑ is induced by ⊓ as
x ⊑ y whenever x ⊓ y = x for two elements x, y ∈ D.
Example 1 (A meet semi-lattice with sets). Let D = 2^N. For two sets of integers
X = {1, 2} and Y = {1, 2, 3, 4}, it holds that X ∩ Y = X, which induces X ⊑ Y.
Example 2 (A meet semi-lattice for closed intervals [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Let D be the set of all
closed intervals [a, b] with integers a, b ∈ N. The meet ⊓ of two closed intervals
is defined so as to keep convexity, that is, [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)].
      </p>
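      <p>Example 2 can be sketched directly; under the induced order, x ⊑ y (that is, x ⊓ y = x) holds exactly when the interval x contains y (a sketch, ours):</p>

```python
# Meet of closed intervals from Example 2: the smallest interval
# containing both arguments (keeping convexity).
def meet(x, y):
    (a1, b1), (a2, b2) = x, y
    return (min(a1, a2), max(b1, b2))

def leq(x, y):
    # Induced order: x is below y iff x meet y == x (x contains y).
    return meet(x, y) == x
```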
      <p>By generalizing the meet semi-lattice (2^A, ∩) used in FCA, a pattern structure
P is defined as a triple (O, (D, ⊓), δ), where O is a set of objects, (D, ⊓) is
a meet semi-lattice of descriptions, and δ : O → D is a mapping giving
a description to each object. For analyzing pattern structures, we use the
following Galois connection ((·)^□, (·)^□) corresponding to f and g in FCA:
A^□ = ⊓_{o ∈ A} δ(o) for A ⊆ O, (1)
d^□ = {o ∈ O | d ⊑ δ(o)} for d ∈ D. (2)
Pattern concepts based on P are defined via Equations (1) and (2):
Definition 1 (Pattern concepts). A pattern concept of P is a pair (A, d) of a
set A ⊆ O and a pattern d ∈ D satisfying A^□ = d and d^□ = A. Two pattern
concepts are partially ordered by (A1, d1) ≤ (A2, d2) iff A1 ⊆ A2 (⟺ d2 ⊑ d1).</p>
      <p>
        Note that, by this partial order, the set of all pattern concepts forms a lattice
structure. We denote the set of all pattern concepts by P(P). To obtain P(P),
we need to compute the two functions (·)^□. For example, we can adopt the
AddIntent algorithm proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
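      <p>To make the definition of pattern concepts concrete, the following brute-force sketch (ours; a real implementation would use AddIntent instead of enumerating subsets) lists all pattern concepts of a tiny pattern structure whose descriptions are the closed intervals of Example 2:</p>

```python
from itertools import combinations

# Hypothetical objects with interval descriptions delta(o).
delta = {"o1": (1, 3), "o2": (2, 5), "o3": (4, 6)}
O = list(delta)

def meet(x, y):
    # Interval meet from Example 2.
    return (min(x[0], y[0]), max(x[1], y[1]))

def box_objects(d):
    # d^box = {o | d is below delta(o)}.
    return frozenset(o for o in O if meet(d, delta[o]) == d)

def box_desc(A):
    # A^box = meet of delta(o) over all o in A.
    ds = [delta[o] for o in A]
    out = ds[0]
    for d in ds[1:]:
        out = meet(out, d)
    return out

# Closing every nonempty subset of O yields all pattern concepts.
concepts = set()
for k in range(1, len(O) + 1):
    for A in combinations(O, k):
        d = box_desc(A)
        concepts.add((box_objects(d), d))
```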
      <p>
        Episode Mining We briefly review episode mining based on [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Let E =
{1, ..., m} ⊆ N be the set of events. We call a set S ⊆ E of events an event
set. An input of episode mining is a long sequence of event sets; an input event
sequence S on E is a finite sequence ⟨S1, ..., Sn⟩ ∈ (2^E)*, where each Si ⊆ E
is the i-th event set. For S of length n, we assume that Si = ∅ if i ≤ 0 or i &gt; n.
      </p>
      <p>
        Episodes are labeled directed acyclic graphs (DAGs). An episode G is a triple (V, E, λ),
where V is a set of vertices, E is a set of directed edges, and λ is the
labeling function from V to the set of labels, that is, λ : V → E. Several classes of
episodes have been studied since episode mining was first introduced by
Mannila et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We follow the subclasses of episodes studied by Katoh et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. An
example of episodes is illustrated in Figure 1.
      </p>
      <p>[Figure 1 here: 1-, 2-, and 3-serial episodes and a diamond episode over the events A, B, C, D, and E.]</p>
      <sec id="sec-2-2">
        <title>Windows, Occurrences, and Frequency</title>
        <p>In designing pattern mining
algorithms, we need 1) a search space of patterns with a partial order for enumerating
patterns, and 2) an interestingness measure to evaluate them. For episode mining,
we often adopt occurrences of episodes defined with windows.</p>
        <p>Definition 1 (Windows). For a sequence S = ⟨S1, ..., Sn⟩, a window W of
S is a contiguous subsequence ⟨Si, ..., S_{i+w−1}⟩ of length w, called the width, for
some index i (−w + 1 ≤ i ≤ n) of S and a positive integer w.
Definition 2 (Embedding of Episodes). Let G = (V, E, λ) be an episode,
and W = ⟨S1, ..., Sw⟩ be a window of width w. We say that G occurs in W if
there exists a mapping h : V → {1, ..., w} satisfying 1) for all v ∈ V, λ(v) ∈
S_{h(v)}, and 2) for all (u, v) ∈ E with u ≠ v, it holds that h(u) &lt; h(v). The map
h is called an embedding of G into W, and we write G ⊑ W.</p>
        <p>
          For an input event sequence S and an episode G, we say that G occurs at
position i of S if G ⊑ Wi, where Wi = ⟨Si, ..., S_{i+w−1}⟩ is the i-th window of
width w in S. We then call the index i an occurrence of G in S. The domain of
occurrences is given by W_{S,w} = {i | −w + 1 ≤ i ≤ n}. In addition, W_{S,w}(G)
denotes the occurrence window list of an episode G, defined by {−w + 1 ≤ i ≤ n |
G ⊑ Wi}. We can then define frequency, an interestingness measure for episodes.
Definition 3 (Frequency of Episodes). The frequency of an episode G in S
with width w, denoted by freq_{S,w}(G), is defined as the number of windows of width w
containing G. That is, freq_{S,w}(G) = |W_{S,w}(G)|. For a threshold σ ≥ 1, a width
w, and an input event sequence S, if freq_{S,w}(G) ≥ σ, then G is called σ-frequent on S.
The frequent episode mining problem is defined as follows: let P be a class of
episodes; given an input event sequence S, a width w ≥ 1, and a frequency
threshold σ ≥ 1, find all σ-frequent episodes G belonging to
the class P. The simplest strategy for finding all σ-frequent episodes is to traverse
P by using the anti-monotonicity of the frequency count freq(·). For details, we
refer the reader to [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
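        <p>As an illustration of Definitions 1–3, the sketch below (ours, with hypothetical data) counts the windows of width w in which a serial episode a1 ↦ ... ↦ am occurs; event sets outside the sequence are treated as empty:</p>

```python
# Frequency of a serial episode a1 -> ... -> am (Definition 3).
# S is a list of event sets (S[0] plays the role of S_1); the windows
# W_i range over i = -w+1, ..., n, with empty sets outside S.
def occurs_serial(episode, window):
    pos = 0
    for event in episode:
        # Greedy leftmost embedding suffices for serial episodes.
        while pos < len(window) and event not in window[pos]:
            pos += 1
        if pos >= len(window):
            return False
        pos += 1  # the embedding h must be strictly increasing
    return True

def freq(episode, S, w):
    n = len(S)
    count = 0
    for i in range(-w + 1, n + 1):
        window = [S[j - 1] if 1 <= j <= n else set() for j in range(i, i + w)]
        if occurs_serial(episode, window):
            count += 1
    return count
```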
        <p>As examples of classes, we introduce m-serial episodes and diamond
episodes. An m-serial episode over E is a sequence of events of the form a1 ↦
a2 ↦ ... ↦ am. A diamond episode over E is either 1) a 1-serial episode e ∈ E or
2) a proper diamond episode represented by a triple Q = ⟨a, X, b⟩ ∈ E × 2^E × E,
where a, b are events and X ⊆ E is an event set occurring after a and before
b. For short, we write a diamond episode as a ↦ X ↦ b. While the definition
of episodes by graphs is quite general, the classes
of episode patterns studied in practice are often restricted.</p>
        <p>Example 3 (Episodes). In Figure 1, we show some serial episodes, A ↦ B ↦ E,
A ↦ D ↦ E, B ↦ E, and C, on the set of events E = {A, B, C, D, E}. All of
them are included in the diamond episode A ↦ {B, C, D} ↦ E.</p>
        <p>We now explain a merit of introducing pattern structures for the summarization of
structured patterns. As mentioned above, a common strategy in
pattern mining is to traverse the space P in a breadth-first manner while checking
some interestingness measure. When generating the next candidates of frequent
patterns, algorithms always check a parent-child relation between two patterns. This
order is essential for pattern mining, and we thus conjecture that this
parent-child relation can be naturally adopted in constructing
a pattern structure for analyzing patterns, only by introducing a similarity
operation ⊓. After constructing a lattice, it should be helpful for analyzing the set of all
patterns, because pattern concepts represent all patterns compactly.</p>
        <p>
          A crucial problem of pattern structures is the computational complexity of
both ⊓ and ⊑. Our idea is to adopt trees of height 1 (also called stars
in graph theory). That is, we assume here that such trees are expressive enough to
represent features of episodes. Our idea is similar to that used in designing graph
kernels [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]1 and is inspired by previous studies on pattern structures [
          <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Diamond Episode Pattern Structures</title>
      <p>
        In the following, we focus on diamond episodes as our objects and on trees of height
1 as our descriptions. Diamond episodes have two special vertices, the source and the sink,
which can be regarded as important features for representing event transitions.
We generate rooted labeled trees from them by putting such a vertex at the root of
a tree and regarding its neighbors as its children. Since the height of every tree here
is 1, we can represent the trees by tuples without explicit graph notation.
Definition 4 (Rooted Trees of Height 1). Let (E, ⊓_E) be a meet semi-lattice
of event labels. A rooted labeled tree of height 1 is represented by a tuple2
(e, C) ∈ E × 2^E. We denote the set of all rooted labeled trees of height 1 by T.
Note that in (E, ⊓_E), we assume that ⊓_E compares labels based on our
background knowledge. Note also that this meet semi-lattice (E, ⊓_E) is
independent of, and different from, the meet semi-lattice D of descriptions of a pattern
structure P. The operation ⊓_E is also adopted when defining an embedding of
trees of height 1, that is, a partial order between trees defined as follows.
1 It intuitively generates a sequence of graphs by relabeling all vertices of a graph. One
focuses on the label of a vertex v ∈ V(G) and sees the labels of its neighbors N_G(v).
For the tuples of a vertex label and its neighbor labels, over all vertices v ∈ V(G), we sort all labels lexicographically
and assign a new label according to this representation. Details are found in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
2 From the viewpoint of graphs, this tuple (e, C) represents the graph G = (V, E, λ)
with V = {0, 1, ..., |C|}, E = {(0, i) | 1 ≤ i ≤ |C|}, λ(0) = e, and {λ(i) | 1 ≤ i ≤ |C|} = C.
      </p>
      <sec id="sec-3-3">
        <title>Similarity of Trees of Height 1</title>
        <p>[Figure 2 here: two episodes G0 and G1, their descriptions δ(G0) and δ(G1), the label comparison at the root, the generalization of the children, the result δ(G0) ⊓ δ(G1), and the semi-lattice for the events A, B, C, D, E with the wildcard *.]</p>
        <p>Definition 5 (Partial Order on Trees). A tree t1 = (e1, C1) is a generalized
subtree of t2 = (e2, C2), denoted by t1 ⊑_T t2, iff e1 ⊑_E e2 and there exists an
injective mapping φ : C1 → C2 satisfying v ⊑_E φ(v)
for all v ∈ C1, where ⊑_E is the partial order induced by ⊓_E.</p>
        <p>
          For defining a similarity operator ⊓_T between trees, this partial order ⊑_T
is helpful because ⊓_T is closely related to ⊑_T in our scenario. Since all trees
here have height 1, the computation is easy to describe: for the labels of the root nodes,
the similarity is immediately given by ⊓_E. For their children, it
is implemented using the idea of the least general generalization (LGG), used
in Inductive Logic Programming [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], of two sets of labels. A practical
implementation of the LGG depends on whether or not the sets are multisets, but it is
computationally tractable. An example is shown in Figure 2.
        </p>
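        <p>Under a flat event hierarchy in which the wildcard * is the minimum element, the meet of two height-1 trees from Figure 2 can be sketched as follows (ours; a full implementation would handle multisets and deeper hierarchies):</p>

```python
WILDCARD = "*"

# Meet of two height-1 trees (root_label, frozenset_of_child_labels).
# Roots: equal labels are kept, different ones generalize to *.
# Children: the LGG keeps the common child labels; when there are
# none, only the wildcard child remains.
def meet_tree(t1, t2):
    (e1, C1), (e2, C2) = t1, t2
    root = e1 if e1 == e2 else WILDCARD
    common = frozenset(C1) & frozenset(C2)
    children = common if common else frozenset({WILDCARD})
    return (root, children)
```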
        <p>We now give formal definitions of δ and D. For a graph G = (V, E, λ), we denote
the neighbors of v ∈ V by N_G(v). For a proper diamond episode pattern G with
source vertex s ∈ V and sink vertex t ∈ V, the trees of height 1
corresponding to s and t are defined as T_s = ({s} ∪ N_G(s), {(s, u) | u ∈ N_G(s)}, λ)
and T_t = ({t} ∪ N_G(t), {(u, t) | u ∈ N_G(t)}, λ), respectively. Using these
trees, δ(·) can be defined according to the vertices s and t: if we use both T_s and
T_t, then δ(G) = (T_s, T_t), ⊓_T is applied element-wise, and D is
T × T. If we focus on either s or t, then δ(G) = T_s or T_t, and we can use ⊓_T directly
by assuming D = T.</p>
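        <p>For a proper diamond episode written as a triple a ↦ X ↦ b, the source and sink descriptions reduce to simple tuples (a sketch with our own encoding):</p>

```python
# Descriptions of a proper diamond episode a -> X -> b as height-1
# trees: the root is the source (resp. sink) label, the children
# are the labels of the middle event set X.
def delta_source(diamond):
    a, X, b = diamond
    return (a, frozenset(X))

def delta_sink(diamond):
    a, X, b = diamond
    return (b, frozenset(X))
```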
        <p>
          Lastly, we briefly explain the relations between our pattern structure and previous
studies. The partial order ⊑_T is inspired by the generalized subgraph
isomorphism [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and a pattern structure for analyzing sequences [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. We here give
another description of the similarity operator based on the definition used in [
          <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
          ].
Definition 6 (Similarity Operation ⊓ based on [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]). The similarity
operation ⊓ is defined as the set of all maximal common subtrees with respect to the
generalized subtree isomorphism ⊑_T; for two trees s1 and s2 in T,
s1 ⊓ s2 = {u | u ⊑_T s1, u ⊑_T s2, and there is no u′ ≠ u with u ⊑_T u′, u′ ⊑_T s1, and u′ ⊑_T s2}.
        </p>
        <p>[Figure 3 here: an input event sequence over the events A–E and examples of episodes occurring in it.]</p>
        <p>
          Note that our operator ⊓_T can be regarded as a special case of the similarity
operation ⊓ above. From the viewpoint of pattern structures, our trees of height
1 can be regarded as an example of projections from graphs to trees, studied
in [
          <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
          ], such as k-chains (paths on graphs of length k) and k-cycles.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Discussion for Diamond Episodes</title>
      <p>
        Data and Experiments We gathered data from MLB baseball logs, in which a
system records all pitches and plays for all games in a season. We used the
types of pitches thrown, which can be represented by a histogram per
batter. For a randomly selected game, we generated an input event sequence for
episode mining by transforming each histogram into the set of pitch types used3.
In forming (E, ⊓_E), we let E be the set of pitch types and define
⊓_E naturally (see the example in Fig. 2). To this S, we applied the diamond episode
mining algorithm proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and obtained a set of diamond episodes. The
algorithm has two parameters, the window size w and the frequency threshold
σ. We always set σ = 1 and varied w ∈ {3, 4, 5}. After generating a set G of
frequent proper diamond episodes, we sampled M ∈ {100, 200, ..., 700} episodes
from G as a subset O of G (that is, satisfying |O| = M and O ⊆ G). We used O
as the set of objects of our pattern structure P and computed all pattern
concepts P(P) based on the discussion in Section 3. In these experiments we set
δ(G) = T_s for a proper diamond episode G and its source vertex s.
3 In baseball games, pitchers throw many kinds of pitches, such as fastballs, cut balls,
curves, sinkers, etc. They are recorded together with their movements by MLB systems.
      </p>
      <p>[Figure 4 here: five sampled episodes G0–G4 and a pattern concept lattice computed from them, in which some vertices are generalized to the wildcard *.]</p>
      <p>
        For the experiments, we adopted the implementation by Katoh et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for
mining diamond episodes, which is written in C++4. We implemented the Galois
connection ((·)^□, (·)^□) with the AddIntent [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] algorithm in Python5.
Results and Discussion Table 1 shows the numbers of pattern
concepts and of proper diamond episodes for each set corresponding to w =
3, 4, and 5, varying M ∈ {100, 200, 300, 400, 500, 600, 700}. In Figure 4, we
show pattern concepts and a pattern concept lattice computed from
only the first 5 episodes for the case w = 5, as an example of our lattice structures.
      </p>
      <p>Because we assume a semi-lattice (E, ⊓_E) of events in episode patterns, we
can obtain pattern concepts in which some vertices are represented by the
wildcard *. If we implement ⊓_T without the wildcard *, we obtain a much
smaller number of pattern concepts than in our results including the
wildcard *. We thus conjecture that the wildcard * is useful for representing
similar patterns, as in Figure 4. From this viewpoint, pattern structures and pattern
concepts help us to form groups of patterns, which is basically similar to
clustering patterns. We do not discuss the computational complexity
of constructing pattern concept lattices in detail here, but the complexity basically depends on
the number of concepts. Thus it would be interesting to investigate and compare
several possible projections and ways of computing pattern concepts.</p>
    </sec>
    <sec id="sec-5">
      <title>Pattern Summarization and Pattern Structures</title>
      <p>
        In this section, we discuss pattern summarization based on pattern concept
lattices. As mentioned, closed itemsets [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] have been studied as compact
representations of itemsets, and they are closely related to the closure operator
f(g(·)) in FCA with (O, A, I), where O is the set of transaction identifiers and
A is the set of all items. The difficulty with closed patterns for complex data is that
there is no common definition of closure operators, where we usually use
closedness with respect to the frequency. Here we expect that pattern concepts
are helpful in the same way that closed itemsets correspond to concepts.
4 We adopted gcc 4.7 as our compiler with C++11 features (-std=c++11). The
code was compiled on a machine running Mac OS X 10.9 with two 2.26 GHz Quad-Core
Intel Xeon processors and 64 GB of main memory.
5 We used Python 2.7 without any additional packages (that is, pure Python).
      </p>
      </p>
      <p>To obtain compact representations, we need to decide how to evaluate
each pattern. The problem here is how to deal with the wildcard * in descriptions.
When we obtain a concept (X, Y) for X ⊆ O and Y ⊆ A, the concept (X, Y)
corresponds to a rectangle on I, and by definition there are no 0-entries in the sub-database
I′ = {(x, y) ∈ I | x ∈ X, y ∈ Y} of I induced by (X, Y).
If (X′, Y′) is not a concept, the rectangle r given by (X′, Y′) contains a few 0-entries.
We denote the relative ratio of 1-entries in the rectangle r given by (X′, Y′) by
r1(X′, Y′, I) = 1 − |{(x, y) ∉ I | x ∈ X′, y ∈ Y′}| · (|X′||Y′|)⁻¹,
where 0 ≤ r1(X′, Y′, I) ≤ 1, and r1(X′, Y′, I) = 1 if (X′, Y′) is a concept. The values
r1(X, Y, I), |X|, and |Y| are applicable for evaluating itemsets. If we only use the
cardinality |A| of a set A of objects, this equals the support counts computed
in Iceberg concept lattices [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. For a concept (X, Y) of a context K = (O, A, I),
we compute the support count supp(X, Y) = |g(Y)| / |O| and prune redundant
concepts using some threshold. To formalize the evaluation of patterns, such
values are generalized by introducing a utility function u : P → R₊. A typical
and well-studied utility function is, of course, the frequency count, or the area
function area(·), which evaluates the size of a rectangle (X, Y) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Based on the discussion above, if we can define a utility function u(·) for
evaluating pattern concepts, a similar approach becomes possible for pattern concepts:
choosing a small number of pattern concepts and constructing a summary of
patterns from them. Of course, there is no simple way of giving such functions. We
introduce a simple and straightforward utility function u_P(·) for pattern
concepts as a first step toward pattern summarization via pattern concept
lattices. In this paper, we follow the idea used in tiling databases [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where a
key criterion is given by area(·). We consider how to compute a value that
corresponds to the area in binary databases. To take into account the wildcard
* used in descriptions, we define the following simple function. For d ∈ D, we let
s(d) and n(d) be the numbers of non-wildcard vertices and of all vertices in a description
d, respectively. Note that if s(d) = n(d), then d contains no wildcard labels. Using
these functions, we compute utility values as follows:
      </p>
      <p>u_P(A, d) = |A| · log(1 + s(d)).</p>
      <p>Experiments and Discussions We compare the results of ranking pattern
concepts 1) using only |A| (similar
to Iceberg concept lattices) and 2) using u_P(·) as the utility function. From
the list of pattern concepts generated in the experiments of Section 4, we rank all
pattern concepts by each utility function, sort each list in ascending
order, and compare the two lists. We remove patterns appearing in both
lists to highlight the differences. We give our results in Table 2:
|A|: (*, {*}), (2, {*}), (0, {*}), (3, {*}), (1, {*})
u_P(·): (*, {0, *}), (*, {0, 2, 3}), (*, {0, 1, 2}), (*, {0, 1, 3}), (*, {1, 2, 3})</p>
      <p>In the result with u_P(·), larger descriptions appear with higher utility values
than in the ranking by |A|. We can see that, by modifying the terms concerning
*, the results contain more informative nodes, which are labeled by non-wildcard
labels. Here we implicitly assume that descriptions containing fewer occurrences of * are
more useful for understanding the data themselves. From this viewpoint, considering the
two terms s(d) and n(d) of a description d would be an interesting and useful way
to design utility functions for pattern concepts. We conclude that Iceberg-lattice-based
support counts are less effective for pattern summarization problems if descriptions admit the
wildcard *.</p>
      <p>Not only the simple computation in u_P(A, d) used above but also many
alternatives could be applicable for ranking. Probabilistic methods such as the
minimum description length (MDL) principle and information-theoretic criteria would also be
helpful for analyzing our setting more clearly. Since pattern structures have no
explicit representations as binary cross tables, the difficulty lies in how to deal
with the meet semi-lattice (D, ⊓). For a pattern concept (A, d) and an object
o ∈ O, we say that (A, d) subsumes o if and only if d ⊑ δ(o). This
subsumption relation is simple and helpful for evaluating concepts, but it does
not use any richer information concerning the hierarchy of events or distances
between two descriptions. In fact, in the experiments we always assume that all
events except * have the same weight and that * is the minimum of all events.
Taking similarity measures of events into account could be important for further
development of ranking methods for pattern concepts.</p>
      <p>
        Related Work There are several studies related to ours. It is well known that closed
itemsets correspond to maximal bipartite cliques in bipartite graphs constructed
from K = (O, A, I). Similarly, we sometimes deal with so-called pseudo bipartite
cliques [
        <xref ref-type="bibr" rid="ref16">16</xref>
          ], where it holds that r1(X′, Y′, I) ≥ 1 − ε for a user-specified
constant ε. Obviously, pseudo bipartite cliques correspond to rectangles containing
a few 0-entries. We can regard them as summarizations or approximations of closed
itemsets or concepts. Intuitively, if we use pseudo bipartite cliques as a
summarization, the value r1(X, Y, I) can be considered in evaluating (X, Y). Pseudo
bipartite cliques can be regarded as noisy tiles, an extension of tiles [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Another typical approach to summarization is clustering patterns [
        <xref ref-type="bibr" rid="ref1 ref18">1, 18</xref>
        ]. A
main problem there is how to interpret clusters or centroids, where we need to
design a similarity measure and a space in which to compute the similarity. From the
viewpoint of probabilistic models, there is an analysis via the maximum entropy
principle [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, these approaches assume that entries in a database are independently
sampled, and thus we cannot apply those techniques to our setting.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Toward Generalizations for Bipartite Episodes</title>
      <p>
        In this paper we assume that our descriptions by trees of height 1 are rich enough
to apply to many classes of episode patterns. As an example, we show here how to apply our
pattern structure to another type of episode, called bipartite episodes. An
episode G = (V, E, λ) is a partial bipartite episode if 1) V = V1 ∪ V2 for mutually
disjoint sets V1 and V2, and 2) every directed edge (x, y) ∈ E satisfies (x, y) ∈ V1 × V2. If
E = V1 × V2, the episode G is called a proper bipartite episode. Obviously, the vertices
of a bipartite episode G are separated into V1 and V2, and we can regard these sets
as generalizations of the source vertex and the sink vertex of diamond episodes.
This indicates that the same approach is applicable to bipartite episodes by defining
⊓ between sets of trees. Fortunately, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] gives the de nition u for sets of graphs.
{t1, ..., tk} ⊓ {s1, ..., sm} = MAX⊑T( ∪i,j ({ti} ⊓ {sj}) ),
      </p>
      <p>
where MAX⊑T(S) returns only the maximal elements of S with respect to ⊑T. Since
our generalized subtree isomorphism is basically a special case of that for graphs,
we can also apply this meet operation. This example suggests that if we have some
background knowledge concerning a partition of V, it can be taken into account
for δ and (D, ⊓) in a similar manner to diamond and bipartite episodes.
      </p>
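      <p>
        To make the set-level meet concrete, the following is a minimal Python sketch:
        it collects all pairwise meets and keeps only the maximal elements. The helpers
        meet and leq are assumptions standing in for the pairwise tree meet and the
        subsumption order ⊑T; in the toy instantiation below, a tree of height 1 is
        modeled simply as the set of its leaf labels, with intersection as the pairwise
        meet and set inclusion as subsumption.
      </p>
      <preformat>
```python
def set_meet(T, S, meet, leq):
    """Meet of two sets of descriptions: gather all pairwise meets,
    then keep only elements that are maximal w.r.t. the order leq."""
    # all pairwise meets {t_i} meet {s_j}; each call may return several elements
    candidates = [m for t in T for s in S for m in meet(t, s)]
    # discard any candidate strictly below another candidate; deduplicate
    maximal = []
    for c in candidates:
        if not any(c != d and leq(c, d) for d in candidates):
            if c not in maximal:
                maximal.append(c)
    return maximal

# Toy instantiation (assumption): trees of height 1 as frozensets of labels.
pair_meet = lambda t, s: [t & s]   # pairwise meet = intersection
subsumes = lambda a, b: a <= b     # subsumption = subset

T = [frozenset({"a", "b"}), frozenset({"a", "c"})]
S = [frozenset({"a", "b", "c"})]
print(set_meet(T, S, pair_meet, subsumes))
```
      </preformat>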
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Future Work</title>
      <p>In this paper we proposed a pattern structure for diamond episodes based on an
idea used in graph kernels and on projections of pattern structures. Since we do not
directly compute graph matching operations, we conjecture that our computation
could be efficient. With a slight modification of ⊓, our method is also applicable
to many classes of episodes, not only to diamond patterns, as mentioned
above. Based on our pattern structure, we discussed summarization by using
mined pattern concepts and showed small examples and experimental results.</p>
      <p>Problems of this type are unsupervised, and there is no common way of
obtaining good results or of evaluating whether the results are good. It
would therefore be interesting to study this summarization problem further, based on
concept lattices and taking theoretical foundations such as
probability distributions into account. In future work, we will analyze theoretical aspects
of summarization via pattern structures, including the wildcard ? and its
optimization problem, to obtain compact and interesting summaries of many
patterns, building on the key merit of a partial order ⊑ between descriptions.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by Grant-in-Aid for JSPS Fellows (26 4555) and JSPS
KAKENHI Grant Number 26280085.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Al Hasan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaoji</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salem</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Origami: Mining representative orthogonal graph patterns</article-title>
          .
          <source>In: Proc. of the 7th ICDM</source>
          . pp.
          <fpage>153</fpage>
          -
          <lpage>162</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Buzmakov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Egho</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Rassi, C.:
          <article-title>The representation of sequential patterns and their projections within Formal Concept Analysis</article-title>
          .
          <source>In: Workshop Notes for LML (ECML/PKDD2013)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. De Bie, T.:
          <article-title>Maximum entropy models and subjective interestingness: an application to tiles in binary databases</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>23</volume>
          (
          <issue>3</issue>
          ),
          <fpage>407</fpage>
          -
          <lpage>446</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          :
          <article-title>Pattern structures and their projections</article-title>
          .
          <source>In: Proc. of the 9th ICCS</source>
          . pp.
          <fpage>129</fpage>
          -
          <lpage>142</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>Formal concept analysis - mathematical foundations</source>
          . Springer (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Geerts</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goethals</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mielikäinen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Tiling databases</article-title>
          .
          <source>In: Proc. of the 7th DS</source>
          . pp.
          <fpage>278</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Katoh</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arimura</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirata</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A polynomial-delay polynomial-space algorithm for extracting frequent diamond episodes from event sequences</article-title>
          .
          <source>In: Proc. of the 13th PAKDD</source>
          . pp.
          <fpage>172</fpage>
          -
          <lpage>183</lpage>
          . Springer Berlin Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kaytoue</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Napoli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Revisiting Numerical Pattern Mining with Formal Concept Analysis</article-title>
          .
          <source>In: Proc. of the 22nd IJCAI</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samokhin</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          :
          <article-title>Learning closed sets of labeled graphs for chemical applications</article-title>
          .
          <source>In: Proc. of the 15th ILP</source>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>208</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          :
          <source>Foundations of Logic Programming</source>
          . Springer-Verlag New York, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toivonen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verkamo</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          :
          <article-title>Discovery of frequent episodes in event sequences</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>259</fpage>
          -
          <lpage>289</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>van der Merwe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Obiedkov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kourie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>AddIntent: A New Incremental Algorithm for Constructing Concept Lattices</article-title>
          .
          <source>In: Proc. of the 2nd ICFCA</source>
          . pp.
          <fpage>372</fpage>
          -
          <lpage>385</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Discovering frequent closed itemsets for association rules</article-title>
          .
          <source>In: Proc. of the 7th ICDT</source>
          . pp.
          <fpage>398</fpage>
          -
          <lpage>416</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Shervashidze</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schweitzer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Leeuwen</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehlhorn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          :
          <article-title>Weisfeiler-Lehman graph kernels</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2539</fpage>
          -
          <lpage>2561</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stumme</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computing iceberg concept lattices with titanic</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>42</volume>
          (
          <issue>2</issue>
          ),
          <fpage>189</fpage>
          -
          <lpage>222</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Uno</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>An efficient algorithm for solving pseudo clique enumeration problem</article-title>
          .
          <source>Algorithmica</source>
          <volume>56</volume>
          (
          <issue>1</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          (Jan
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Vreeken</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Leeuwen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siebes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Krimp: mining itemsets that compress</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>23</volume>
          (
          <issue>1</issue>
          ),
          <fpage>169</fpage>
          -
          <lpage>214</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Extracting redundancy-aware top-k patterns</article-title>
          .
          <source>In: Proc. of the 12th KDD</source>
          . pp.
          <fpage>444</fpage>
          -
          <lpage>453</lpage>
          . ACM
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>