<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Takeaki Uno</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masashi Kiyomi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiroki Arimura</string-name>
          <email>arim@ist.hokudai.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>Hitotsubashi, Chiyoda-ku, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
          (e-mail:
          <email>uno@nii.jp</email>
          ,
          <email>masashi@grad.nii.ac.jp</email>
          )
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Science and Technology, Hokkaido University Kita 14-jo Nishi 9-chome</institution>
          ,
          <addr-line>060-0814 Sapporo</addr-line>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>For a transaction database, a frequent itemset is an itemset included in at least a specified number of transactions. A frequent itemset P is maximal if P is included in no other frequent itemset, and closed if P is included in no other itemset included in exactly the same transactions as P. The problems of finding these frequent itemsets are fundamental in data mining, and the applications call for fast implementations. In this paper, we propose efficient algorithms LCM (Linear time Closed itemset Miner), LCMfreq and LCMmax for these problems. We show the efficiency of our algorithms by computational experiments compared with existing algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Frequent itemset mining is one of the fundamental
problems in data mining and has many applications
such as association rule mining, inductive databases,
and query expansion. These applications call for fast
implementations of frequent itemset mining.
In this paper, we propose the second
versions of LCM, LCMfreq and LCMmax, for
enumerating closed, all, and maximal frequent itemsets.
LCM is an abbreviation of Linear time Closed itemset
Miner.</p>
      <p>
        In FIMI'03 [
        <xref ref-type="bibr" rid="ref3">7</xref>
        ], we proposed the first version of
LCM, which enumerates frequent closed
itemsets. LCM uses prefix preserving closure extension
(ppc extension for short), which is an extension from
a closed itemset to another closed itemset. The
extension induces a search tree on the set of frequent
closed itemsets, so that we can completely
enumerate closed itemsets without duplication. Generating
a ppc extension needs no previously obtained closed
itemsets. Hence, the memory use of LCM does not
depend on the number of frequent closed itemsets,
even if there are many frequent closed itemsets.
      </p>
      <p>The time complexity of LCM is theoretically
bounded by a linear function in the number of
frequent closed itemsets, while those of the existing
algorithms are not. We further developed algorithms for the
frequency counting: occurrence deliver and a hybrid with
diffsets. They efficiently reduce the practical computation
time. Moreover, the framework of LCM is
simple. Generating ppc extensions needs no
sophisticated data structure such as binary trees; LCM is
implemented with only arrays. Therefore, LCM is
fast, and outperforms other algorithms on some
sparse datasets.</p>
      <p>However, LCM does not have any routine for
reducing the database, while many existing algorithms
have one. Thus, the performance of LCM is not good for
dense datasets with large minimum supports, which
involve many unnecessary items and transactions.
At FIMI'03, we also proposed modifications of LCM,
LCMfreq and LCMmax, for enumerating all frequent
itemsets and maximal frequent itemsets. Although
they are fast for some instances, if LCM is not fast
for an instance, they are not fast for it either.
Existing maximal frequent itemset mining algorithms
have efficient pruning methods to reduce the number
of iterations, while LCMmax does not. This is
another reason for the slowness of LCMmax.</p>
      <p>This paper proposes the second version of the LCM
algorithms. We added database reduction to LCM,
so that problems on dense datasets can be solved in a
short time. The second version of LCMmax includes
a pruning method, so the computation time is
reduced when the number of maximal frequent itemsets
is small. We further developed new algorithms for
checking the maximality of a frequent itemset and
for taking the closure of an itemset. We compare
the performance of the LCM algorithms and the other
algorithms submitted to FIMI'03 by computational
experiments. On many instances, the LCM algorithms
outperform the other algorithms.</p>
      <p>The organization of the paper is as follows.
Section 2 introduces preliminaries. The main algorithms
and practical techniques of LCM algorithms are
described in Section 3. Section 4 shows the results of
computational experiments, and Section 5 concludes
the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>Let I = {1, ..., n} be the set of items. A transaction
database on I is a set T = {t1, ..., tm} such that each
ti is included in I. Each ti is called a transaction. We
denote by ||T|| the sum of the sizes of all transactions in
T, that is, the size of the database T. A set P ⊆ I is
called an itemset.</p>
      <p>For an itemset P, a transaction including P is called
an occurrence of P. The denotation of P, denoted
by T(P), is the set of the occurrences of P. |T(P)| is
called the frequency of P, and denoted by frq(P). For
a given constant σ, called a minimum support, an itemset
P is frequent if frq(P) ≥ σ. If a frequent itemset P
is included in no other frequent itemset, P is called
maximal. For any itemsets P and Q, T(P ∪ Q) =
T(P) ∩ T(Q) holds, and if P ⊆ Q then T(Q) ⊆ T(P).
An itemset P is called closed if no other itemset Q
satisfies T(P) = T(Q), P ⊆ Q.</p>
      <p>
        Given a set S ⊆ T of transactions, let I(S) be the set
of items common to all transactions in S, i.e., I(S) =
∩<sub>T∈S</sub> T. Then, we define the closure of an itemset P
in T, denoted by clo(P), by I(T(P)) (= ∩<sub>t∈T(P)</sub> t).
For every pair of itemsets P and Q, the following
properties hold [
        <xref ref-type="bibr" rid="ref10 ref9">13, 14</xref>
        ].
      </p>
      <p>(1) If P ⊆ Q, then clo(P) ⊆ clo(Q).
(2) If T(P) = T(Q), then clo(P) = clo(Q).
(3) clo(clo(P)) = clo(P).
(4) clo(P) is the unique smallest closed itemset
including P.
(5) An itemset P is a closed itemset if and only
if clo(P) = P.</p>
      <p>For an itemset P and an item i ∈ P, let P(i) = P ∩
{1, ..., i} be the subset of P consisting only of
elements no greater than i, called the i-prefix of P.
An itemset Q is a closure extension of an itemset
P if Q = clo(P ∪ {i}) holds for some i ∉ P. If
Q is a closure extension of P, then Q ⊋ P and
frq(Q) &lt; frq(P). We call the item with the
maximum index in P the tail of P, and denote it by tail(P).</p>
    </sec>
    <sec id="sec-3">
      <title>Algorithms for Efficient Enumeration</title>
      <p>In this section, we explain the techniques used in the
second versions of the LCM algorithms. We explain them
one by one, comparing them with the techniques used
by the other algorithms, in the following subsections.
The new techniques used in the second version are:
3.2. new database reduction (reduces the frequency
counting cost)
3.6. database reduction for fast checking of closedness
3.8. database reduction for fast checking of maximality
3.7. new pruning algorithm for backtracking-based
maximal frequent itemset mining.</p>
      <p>The techniques also used in the first versions are:
3.4. occurrence deliver (computes frequencies in
linear time)
3.5. ppc extension (generates closed itemsets with
neither memory nor duplication)
3.3. hypercube decomposition (fast enumeration by
grouping frequent itemsets by equivalence class).</p>
      <p>The techniques used in the existing algorithms and
the first version have citations to the previous papers.</p>
      <sec id="sec-3-1">
        <title>Enumerating Frequent Itemsets</title>
        <p>Any itemset included in a frequent itemset is
itself frequent. Thereby, the property "frequent" is
monotone. From this, we can construct any frequent
itemset from the empty set by adding items
one by one without passing through any infrequent itemset.
Roughly speaking, the existing algorithms are
classified into two groups, and the algorithms in both groups
use this property.</p>
        <p>The first group is the so-called apriori or level-by-level
algorithms [1, 2]. Let Dk be the set of frequent
itemsets of size k. Apriori algorithms start with D0, that
is {∅}, and compute Dk from Dk−1 in increasing
order of k from k = 1. Any itemset in Dk is obtained
from an itemset of Dk−1 by adding an item. Apriori
algorithms add every item to each itemset of Dk−1,
and choose the frequent itemsets among them. If Dk = ∅
holds for some k, then Dk′ = ∅ holds for any k′ &gt; k.
Thus, apriori algorithms stop at such k. This is the
scheme of apriori algorithms.</p>
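        <p>The level-by-level scheme above can be sketched as follows; this is our own minimal Python sketch (the database representation and names are ours, not from the cited implementations).</p>

```python
# Level-by-level (apriori) sketch: D_k is built from D_{k-1} by adding
# single items and keeping only the frequent candidates.

def apriori(db, minsup):
    items = sorted(set().union(*db)) if db else []

    def freq(s):
        return sum(1 for t in db if s.issubset(t))

    d_prev = [frozenset()] if len(db) >= minsup else []
    all_frequent = list(d_prev)
    while d_prev:
        # candidates of size k: every frequent itemset of size k-1 plus one item
        candidates = {p.union({e}) for p in d_prev for e in items if e not in p}
        d_prev = [s for s in candidates if freq(s) >= minsup]
        all_frequent.extend(d_prev)
    return sorted(tuple(sorted(s)) for s in all_frequent)
```

        <p>The loop stops automatically at the first empty Dk, exactly as described above.</p>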
        <p>
          The other group is the so-called backtracking
algorithms [
          <xref ref-type="bibr" rid="ref14 ref15">3, 18, 19</xref>
          ]. A backtracking algorithm is
based on recursive calls. An iteration of a
backtracking algorithm inputs a frequent itemset P, and
generates itemsets by adding every item to P. Then, for
each itemset among them that is frequent, the
iteration generates a recursive call with respect to it. To
avoid duplications, an iteration of a backtracking
algorithm adds only items with indices larger than the tail
of P. We describe the framework of backtracking
algorithms as follows.
        </p>
        <p>ALGORITHM BackTracking (P: current solution)
1. Output P
2. For each e ∈ I, e &gt; tail(P) do
3. If P ∪ {e} is frequent then</p>
        <p>call BackTracking (P ∪ {e})</p>
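        <p>The framework above can be written as a minimal Python sketch (our own code; the database representation and names are ours, not the paper's C implementation).</p>

```python
# Backtracking enumeration of frequent itemsets: extend only past
# tail(P) so that every frequent itemset is generated exactly once.

def backtracking(db, minsup):
    """Yield every frequent itemset exactly once, depth first."""
    items = sorted(set().union(*db)) if db else []

    def freq(s):
        return sum(1 for t in db if s.issubset(t))

    def recurse(p, tail):
        yield tuple(sorted(p))
        for e in items:
            if e > tail:                       # only extend past tail(P)
                candidate = p.union({e})
                if freq(candidate) >= minsup:  # prune infrequent branches
                    yield from recurse(candidate, e)

    if len(db) >= minsup:
        yield from recurse(frozenset(), -1)
```

        <p>Only the current solution is kept in memory, which is the property discussed below.</p>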
        <p>An execution of backtracking algorithms gives a
tree structure such that the vertices of the tree are
iterations, and edges connect two iterations if one of
the iteration calls the other. If an iteration I
recursively calls another iteration I 0, then we say that I
is the parent of I0, and I0 is a child of I. For an
iteration, the itemset received from the parent is called
the current solution.</p>
        <p>Apriori algorithms use much memory to store
Dk, while backtracking algorithms use
less memory since they keep only the current
solution. Backtracking algorithms need no computation
for maintaining previously obtained itemsets, so
their computation time is
generally short. However, apriori algorithms have
advantages in the frequency counting.</p>
        <p>LCM algorithms are based on backtracking
algorithms, and use efficient techniques for the
frequency counting: occurrence deliver
and anytime database reduction, described below.
Hence, LCM algorithms compute frequencies
efficiently without keeping previously obtained itemsets
in memory.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Maintaining Databases</title>
        <p>In the existing studies, database reduction is said to
be important for reducing the computation time. It
reduces the input database by the following rules:
1. remove each item included in fewer than σ transactions
2. remove each item included in all transactions
3. merge identical transactions into one.</p>
        <p>Database reduction performs well when the
minimum support is large, and many existing algorithms
use it. LCM algorithms also use database reduction.</p>
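        <p>The three reduction rules can be sketched as follows (our own Python; the representation with multiplicity counts is ours).</p>

```python
# Database reduction sketch: drop items with support below minsup,
# drop items occurring in every transaction, and merge identical
# transactions into one with a multiplicity count.
from collections import Counter

def reduce_database(db, minsup):
    """Return sorted (transaction, multiplicity) pairs."""
    n = len(db)
    support = Counter(e for t in db for e in t)
    keep = {e for e, c in support.items() if c >= minsup and n > c}
    merged = Counter(frozenset(t).intersection(keep) for t in db)
    return sorted((tuple(sorted(t)), m) for t, m in merged.items())
```

        <p>For example, with minimum support 2 on [{1,2,3}, {1,2,3}, {2,3,4}, {2,5}], item 2 (in all transactions) and the support-1 items 4 and 5 disappear, and the two identical transactions are merged into one pair with multiplicity 2.</p>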
        <p>
          In the existing studies, the input databases are
often stored and maintained using an FP-tree
(frequent pattern tree), which is a version of a prefix tree
(trie) [
          <xref ref-type="bibr" rid="ref5">9</xref>
          ]. By using an FP-tree, we can search for specified
transactions in the datasets efficiently. An FP-tree
compresses the common prefixes, so we can decrease
the memory use. In addition, an FP-tree can detect
identical transactions, so we can merge them into
one. This merge accelerates the frequency counting.
For these reasons, FP-trees are used in many
algorithms and implementations.
        </p>
        <p>Although the FP-tree has many advantages, we
do not use it in the implementation of LCM, but use
simple arrays. The main reason is that LCM does
not have to search for transactions in the database. The
main operation of LCM is tracing the transactions in
the denotation of the current solution. Thus, we
do not need sophisticated data structures for
searching.</p>
        <p>The other reason is the computation time for the
initialization. If we use a standard binary tree for
implementing the FP-tree, the initialization of the
input database takes O(||T|| + |T| log |T|) time for
constructing the FP-tree in memory. Compared to this,
LCM detects identical transactions and stores
the database in memory in time linear in the
database size. This is because LCM uses radix
sort for this task, which sorts the transactions into
lexicographic order in linear time. In general, the
datasets of data mining problems have many
transactions, and each transaction has few items. Thus,
||T|| is usually smaller than |T| log |T|, and LCM has
an advantage. Moreover, the constant factors of the
computation time of binary tree operations are
larger than those of array operations; LCM also has
an advantage on this point. Again, we recall that
LCM never searches for transactions, so each
operation required by LCM can be done in constant time.</p>
        <p>The FP-tree has an advantage in reducing the memory
use. This memory reduction can also reduce the
computation time of the frequency counting. To
check the efficiency of the reduction, we checked
the reduction ratio achieved by an FP-tree for some datasets
examined in FIMI'03. The result is shown in Table
1. Each cell shows the ratio of the number of items
needed to be stored by arrays and by an FP-tree. Usually,
the input database is reduced in each iteration,
hence we sum up the numbers over all iterations to
compute the ratio. In the results of our experiments,
the ratio was not greater than 3 on many instances. If
|I| is small, |T| is large, and the dataset has some
randomness, such as accidents, the ratio was up to 6.
Generally, a binary tree uses three times as
much memory as an array. Thus, the performance of the FP-tree
seems to be no better than that of an array, in
both memory use and computation time, for many
datasets.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Hypercube Decomposition</title>
        <p>
          LCM finds a number of frequent itemsets at once to
reduce the computation time [
          <xref ref-type="bibr" rid="ref14">18</xref>
          ]. Since the
itemsets obtained at once compose a hypercube in the
itemset lattice, we call the technique hypercube
decomposition. For a frequent itemset P, let H(P)
be the set of items e satisfying e &gt; tail(P) and
T(P) = T(P ∪ {e}). Then, for any Q ⊆ H(P),
T(P ∪ Q) = T(P) holds, and P ∪ Q is frequent.
LCMfreq uses this property. For two itemsets P and
P ∪ Q, we say that P′ is between P and P ∪ Q if
P ⊆ P′ ⊆ P ∪ Q. In the iteration with respect to
P, we output all P′ between P and P ∪ H(P). This
saves about 2<sup>|H(P)|</sup> executions of the frequency counting.
        </p>
        <p>To avoid duplications, we do not generate
recursive calls with respect to items included in H(P).
Instead of generating these recursive calls, we output
frequent itemsets including items of H(P) in the
recursive calls with respect to items not included in H(P).
When the algorithm generates a recursive call with
respect to e ∉ H(P), we pass H(P) to it. In the
recursive call, we output all itemsets between P ∪ {e}
and P ∪ {e} ∪ H(P) ∪ H(P ∪ {e}). Since any itemset
Q satisfies T(P ∪ Q ∪ H(P)) = T(P ∪ Q), the
itemsets output in the recursive calls are frequent. We
describe hypercube decomposition as follows.</p>
        <p>ALGORITHM HypercubeDecomposition
(P: current solution, S: itemset)
S′ := S ∪ H(P)
Output all itemsets including P
and included in P ∪ S′
For each item e ∈ I \ (P ∪ S′), e &gt; tail(P) do
If P ∪ {e} is frequent then
call HypercubeDecomposition (P ∪ {e}, S′)
End for</p>
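        <p>The core of hypercube decomposition, computing H(P) and outputting the whole hypercube at once, can be sketched as follows (our own Python, without the recursive bookkeeping of S).</p>

```python
# Every itemset between P and P ∪ H(P) has the same denotation as P,
# so all of them can be output with no further frequency counting.
from itertools import chain, combinations

def denotation(db, itemset):
    return frozenset(i for i, t in enumerate(db) if itemset.issubset(t))

def hypercube_outputs(db, p, tail):
    t_p = denotation(db, p)
    items = sorted(set().union(*db))
    h = [e for e in items
         if e > tail and e not in p
         and denotation(db, p.union({e})) == t_p]
    subsets = chain.from_iterable(combinations(h, k) for k in range(len(h) + 1))
    return sorted(tuple(sorted(p.union(q))) for q in subsets)
```

        <p>For example, on [{1,2,3,4}, {1,2,3}, {2,4}] with P = {1}, H(P) = {2, 3}, so the four itemsets {1}, {1,2}, {1,3}, {1,2,3} are output in one shot.</p>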
      </sec>
      <sec id="sec-3-4">
        <title>Frequency Counting</title>
        <p>Generally, the heaviest part of frequent
itemset mining is the frequency counting, which is to
count the number of transactions including a newly
generated itemset. To reduce the computation time,
existing algorithms use down project. For an itemset
P, down project computes its denotation T(P) by
using two subsets P1 and P2 of P. If P = P1 ∪ P2, then
T(P) = T(P1) ∩ T(P2). Under the condition that the
items of P1 and P2 are sorted by their indices, the
intersection can be computed in O(|T(P1)| + |T(P2)|)
time. Down project uses this property, and computes
the denotations quickly. Moreover, if |T(P1)| &lt; σ or
|T(P2)| &lt; σ holds, we can see that P can never be
frequent. This also helps to reduce the computation time.</p>
        <p>Apriori-type algorithms accelerate the frequency
counting by finding a good pair P1 and P2 of subsets
of P, such that |T(P1)| + |T(P2)| is small, or either
P1 or P2 is infrequent. A backtracking algorithm adds
an item e to the current solution P in each
iteration, and computes its denotation. By using T({e}),
the computation time for the frequency counting is
reduced to O(Σ<sub>e&gt;tail(P)</sub> (|T(P)| + |T({e})|)).</p>
        <p>
          The bitmap method [
          <xref ref-type="bibr" rid="ref1">5</xref>
          ] is a technique for
speeding up the computation of the intersection in
down project. It uses a bitmap image (the
characteristic vector) of the denotations. Taking the
intersection with a bitmap takes O(|T|) time.
However, a 32-bit CPU can intersect
32 bits at once, so roughly speaking the
computation time is reduced to 1/32. This method is at a
disadvantage on sparse datasets, and is not
orthogonal to the anytime database reduction described
below. From the results of the experiments in FIMI'03,
the bitmap method seems not to be good for sparse
large datasets.
        </p>
        <p>
          LCM algorithms use another method for the
frequency counting, called occurrence deliver [
          <xref ref-type="bibr" rid="ref14 ref15">18, 19</xref>
          ].
Occurrence deliver computes the denotations of P ∪
{e} for e = tail(P)+1, ..., |I| at once by tracing the
transactions in T(P). It uses a bucket for each e to
be added, and sets them to the empty set at the
beginning. Then, for each transaction t ∈ T(P),
occurrence deliver inserts t into the bucket of e for each
e ∈ t, e &gt; tail(P). After these insertions, the bucket
of e is equal to T(P ∪ {e}). For each transaction t,
occurrence deliver takes O(|t ∩ {tail(P)+1, ..., |I|}|)
time. Thus, the computation time is O(Σ<sub>t∈T(P)</sub> |t ∩
{tail(P)+1, ..., |I|}|) = O(|T(P)| + Σ<sub>e&gt;tail(P)</sub> |T(P ∪
{e})|). This time complexity is smaller than that of down
project. We describe the pseudo code of occurrence
deliver in the following.
        </p>
        <p>ALGORITHM OccurrenceDeliver (T: database, P: itemset)
1. Set Bucket[e] := ∅ for each item e &gt; tail(P)
2. For each transaction t ∈ T(P) do
3. For each item e ∈ t, e &gt; tail(P) do
4. Insert t into Bucket[e]
5. End for
6. End for</p>
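        <p>Occurrence deliver can be sketched as follows (our own Python; names and representation are ours): one pass over the occurrence list T(P) fills a bucket per item.</p>

```python
# One pass over T(P) produces every T(P ∪ {e}), e > tail(P), at once.
from collections import defaultdict

def occurrence_deliver(db, t_p, tail):
    """db: list of item sets; t_p: indices of the transactions in T(P)."""
    bucket = defaultdict(list)
    for i in t_p:                      # trace only the occurrences of P
        for e in sorted(db[i]):
            if e > tail:               # items that may extend P
                bucket[e].append(i)    # bucket[e] grows into T(P ∪ {e})
    return dict(bucket)
```

        <p>For example, occurrence_deliver([{1,2,3}, {1,3}, {2,3}], [0, 1], 1) for P = {1} returns the buckets T({1,2}) = [0] and T({1,3}) = [0, 1].</p>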
        <p>Let F(P) be the set of items e such that e &gt;
tail(P) and P ∪ {e} is frequent. Apriori algorithms
have the possibility to find out in a short time that P ∪ {e}
is infrequent; thus, in the best case, their
computation time can be reduced to O(Σ<sub>e∈F(P)</sub> |T(P ∪ {e})|).
If Σ<sub>e&gt;tail(P), e∉F(P)</sub> |T(P ∪ {e})| is large, occurrence
deliver will be slow.</p>
        <p>To decrease Σ<sub>e&gt;tail(P), e∉F(P)</sub> |T(P ∪ {e})|, LCM
algorithms sort the indices of items e in increasing
order of |T({e})|. As we can see in Table 1, this
sort reduces Σ<sub>e&gt;tail(P), e∉F(P)</sub> |T(P ∪ {e})| to 1/4 of
Σ<sub>e&gt;tail(P)</sub> |T(P ∪ {e})| in many cases. Since apriori
algorithms take much time to maintain previously
obtained itemsets, the possibility of a speed-up by
apriori algorithms is not so large.</p>
        <p>LCM algorithms further speed up the frequency
counting by iteratively reducing the database.
Suppose that an iteration I of a backtracking algorithm
receives a frequent itemset P from its parent. Then,
in any descendant iteration of I, no item with an index
smaller than tail(P) is added. Hence, any such item
can be removed from the database during the
execution of the descendant iterations. Similarly, the
transactions not including P never include the
current solution of any descendant iteration, thus such
transactions can be removed during the execution of
the descendant iterations. Indeed, infrequent items
can also be removed, and identical transactions can
be merged.</p>
        <p>According to this, LCM algorithms recursively
reduce the database while the execution of recursive
calls. Before the recursive call, LCM algorithms
generate a reduced database according to the above
discussion, and pass it to the recursive call. We call this
technique anytime database reduction.</p>
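        <p>Building the reduced database passed to a recursive call can be sketched as follows (our own Python; transactions carry multiplicity counts from earlier merges).</p>

```python
# Before recursing on P ∪ {e}: restrict to the occurrences of P ∪ {e},
# drop items with index not larger than e and items infrequent there,
# then merge identical transactions.
from collections import Counter

def reduced_database(db, occ, e, minsup):
    """db: list of (itemset, multiplicity); occ: indices of T(P ∪ {e})."""
    support = Counter()
    for i in occ:
        t, mult = db[i]
        for item in t:
            if item > e:
                support[item] += mult
    keep = {item for item, c in support.items() if c >= minsup}
    merged = Counter()
    for i in occ:
        t, mult = db[i]
        merged[frozenset(t).intersection(keep)] += mult
    return sorted((tuple(sorted(t)), m) for t, m in merged.items())
```

        <p>For example, recursing on P = {2} with occurrences 0, 1, 2 of [({1,2,3},1), ({2,3,4},1), ({2,3},1), ({1,4},1)] at minimum support 2 keeps only item 3 and merges the three occurrences into one weighted transaction.</p>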
        <p>Anytime database reduction reduces the
computation time of the iterations located at the lower
levels of the recursion tree. In the recursion tree,
many iterations are on the lower levels and few
iterations are on the upper levels. Thus, anytime
database reduction is expected to be efficient. In
our experiments, anytime database reduction works
quite well. The following table shows the efficiency
of anytime database reduction. We sum up, over all
iterations, the sizes of the databases received from
the parent, both with and without anytime database
reduction. Each cell shows the sum. The reduction ratio is large
especially if the dataset is dense and the minimum
support is large.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Prefix Preserving Closure Extension</title>
        <p>Many existing algorithms for mining closed itemsets
are based on frequent itemset mining. That is, the
algorithms enumerate frequent itemsets, and output
those that are closed. This approach is efficient when
the number of frequent itemsets and the number of
frequent closed itemsets do not differ much.
However, if the difference between them is large, the
algorithms generate many non-closed frequent itemsets,
so they will not be efficient. Many pruning
methods have been developed for speeding up, but
they are not complete. Thus, the computation time
is not bounded by a linear function in the number of
frequent closed itemsets. There is a possibility of a
super-linear increase of the computation time in the number of
outputs.</p>
        <p>
          LCM uses prefix preserving closure extension
(ppc extension for short) for generating closed itemsets [
          <xref ref-type="bibr" rid="ref14 ref15">18,
19</xref>
          ]. For a closed itemset P, we define the closure
tail clo_tail(P) as the item i of the minimum index
satisfying clo(P(i)) = P. clo_tail(P) is always
included in P. We say that P′ is a ppc extension of P
if P′ = clo(P ∪ {e}) and P′(e−1) = P(e−1) hold for
an item e &gt; clo_tail(P). Let P0 be the itemset
satisfying T(P0) = T. Any closed itemset P′ ≠ P0 is a
ppc extension of another closed itemset P, and such
P is unique for P′. Moreover, the frequency of P is
strictly larger than that of P′, hence ppc extension induces a
rooted tree on the frequent closed itemsets. LCM starts
from P0, and finds all frequent closed itemsets in a
depth-first manner by recursively generating ppc
extensions. The proofs for the ppc extension algorithms are
described in [
          <xref ref-type="bibr" rid="ref14 ref15">18, 19</xref>
          ].
        </p>
        <p>By ppc extension, the time complexity is bounded
by a linear function in the number of frequent closed
itemsets. Hence, the computation time of LCM is never
super-linear in the number of frequent closed
itemsets.</p>
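        <p>As a concrete (unoptimized) illustration, ppc extension can be written directly from the definitions; this is our own Python sketch, not the paper's array-based C implementation with database reduction.</p>

```python
# Closed itemset enumeration by ppc extension: extend a closed P with
# e > clo_tail(P) and accept clo(P ∪ {e}) only when it agrees with P
# below e, so each closed itemset has a unique parent.

def denotation(db, itemset):
    return [i for i, t in enumerate(db) if itemset.issubset(t)]

def closure(db, occ):
    """clo: items common to all transactions indexed by occ."""
    common = None
    for i in occ:
        common = set(db[i]) if common is None else common.intersection(db[i])
    return frozenset(common) if common is not None else frozenset()

def closure_tail(db, p):
    """Minimum index i with clo(P(i)) = P; 0 if clo(emptyset) = P."""
    prefix = set()
    if closure(db, denotation(db, prefix)) == p:
        return 0
    for i in sorted(p):
        prefix.add(i)
        if closure(db, denotation(db, prefix)) == p:
            return i
    return max(p)

def lcm_closed(db, minsup):
    items = sorted(set().union(*db)) if db else []
    out = []

    def recurse(p):
        out.append(tuple(sorted(p)))
        ct = closure_tail(db, p)
        for e in items:
            if e not in p and e > ct:
                occ = denotation(db, p.union({e}))
                if len(occ) >= minsup:
                    q = closure(db, occ)
                    # prefix preserving check: q and p agree below e
                    if {i for i in q if e > i} == {i for i in p if e > i}:
                        recurse(q)

    if len(db) >= minsup:
        recurse(closure(db, range(len(db))))   # start from P0 = clo(emptyset)
    return sorted(out)
```

        <p>Each closed itemset is reached exactly once and nothing is stored besides the current branch, which is the memory property stated above.</p>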
      </sec>
      <sec id="sec-3-6">
        <title>Closure Operation</title>
        <p>To enumerate closed itemsets, we have to check
whether the current solution P is a closed itemset
or not. In the existing studies, there are two
methods for this task. The first method is to store in
memory the previously obtained itemsets which are
currently maximal among the itemsets having the identical
denotation. In this method, we find frequent itemsets
one by one, and store them in memory,
removing itemsets included in another itemset having the
identical denotation. After finding all frequent
itemsets, only closed itemsets remain in memory. We call
this the storage method. The second method is to
generate the closure of P. By adding to P all items e such
that frq(P) = frq(P ∪ {e}), we can construct the
closure of P. We call the second the closure operation.</p>
        <p>LCM uses closure operations for generating ppc
extensions. Similarly to the frequency counting, we use
database reduction for the closure operation. Suppose
that the current solution is P, the reduced database
is composed of transactions S1, ..., Sh, and each Sl is
obtained from transactions T1l, ..., Tkl of the original
database. For each Sl, we define the interior
intersection In(Sl) by ∩<sub>T∈{T1l,...,Tkl}</sub> T. Here the closure
of P is equal to ∩<sub>S∈{S1,...,Sh}</sub> In(S). Thus, by using
interior intersections, we can efficiently construct the
closure of P.</p>
        <p>When we merge transactions to reduce the
database, the interior intersections can be updated
efficiently, by taking the intersection of their interior
intersections. In the same way as in the frequency
counting, we can remove infrequent items from the interior
intersections for more reduction. The computation
time for the closure operation in LCM depends on
the size of the database, but not on the number of
previously obtained itemsets. Thus, the storage method has
advantages if the number of frequent closed itemsets
is small. However, for the instances with a lot of
frequent closed itemsets, which take a long time to
solve, LCM has an advantage.</p>
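        <p>The interior-intersection idea can be sketched as follows (our own Python; the grouping of original transactions is assumed to come from the reduction step).</p>

```python
# When reduction merges original transactions into one reduced
# transaction S, keeping In(S) lets clo(P) be computed from the
# (much smaller) reduced database.

def interior_intersection(originals):
    """In(S): items common to the originals merged into S."""
    return frozenset.intersection(*map(frozenset, originals))

def closure_of_p(groups):
    """clo(P) = intersection of In(S) over all reduced transactions S."""
    return frozenset.intersection(*(interior_intersection(g) for g in groups))
```

        <p>For example, with the occurrences of P = {1, 2} grouped as [[{1,2,3}, {1,2,3}], [{1,2,4}]], the closure is the intersection of the two interior intersections, namely {1, 2}.</p>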
      </sec>
      <sec id="sec-3-7">
        <title>Enumerating Maximal Frequent Itemsets</title>
        <p>Many existing algorithms for maximal frequent
itemset enumeration are based on the enumeration of
frequent itemsets. In a breadth-first or
depth-first manner, they enumerate frequent itemsets and
output the maximal itemsets among them. To reduce the
computation time, the algorithms prune the
unnecessary itemsets and recursive calls.</p>
        <p>Similarly to these algorithms, LCMmax enumerates
closed itemsets by backtracking, and outputs the
maximal itemsets among them. It uses a pruning to cut
off unnecessary branches of the recursion. The
pruning is based on a re-ordering of the indices of items
in each iteration. We explain the re-ordering in the
following.</p>
        <p>Let us consider a backtracking algorithm for
enumerating frequent itemsets. Let P be the current
solution of an iteration of the algorithm. Suppose that
P′ is a maximal frequent itemset including P.
LCMmax puts new indices on the items with indices larger
than tail(P) so that any item in P′ has a larger index
than any item not in P′. Note that this
re-ordering of indices has no effect on the correctness of
the algorithm.</p>
        <p>Let e &gt; tail(P) be an item in P′, and consider the
recursive call with respect to P ∪ {e}. Any frequent
itemset P̂ found in the recursive call is included in
P′, since every item having an index larger than e
is included in P′, and the recursive call adds to P
only items with indices larger than e. From this, we
can see that by the re-ordering of indices, the recursive
calls with respect to items in P′ ∩ H generate no
maximal frequent itemset other than P′.</p>
        <p>According to this, an iteration of LCMmax chooses
an item e* ∈ H, and generates a recursive call with
respect to P ∪ {e*} to obtain a maximal frequent
itemset P′. Then, it re-orders the indices of the items other
than e* as above, and generates recursive calls
with respect to each e &gt; tail(P) not included in P′ ∪
{e*}. In this way, we save the computation time for
finding P′, and by finding a large itemset, increase
the efficiency of this approach. In the following, we
describe LCMmax.</p>
        <p>ALGORITHM LCMmax (P: itemset, H: items to
be added)
1. H′ := the set of items e in H s.t. P ∪ {e} is frequent
2. If H′ = ∅ then
3. If P ∪ {e} is infrequent for any e then
output P; return
4. End if
5. End if
6. Choose an item e* ∈ H′; H′ := H′ \ {e*}
7. LCMmax (P ∪ {e*}, H′)
8. P′ := the frequent itemset of the maximum size
found in the recursive call in 7
9. For each item e ∈ H \ P′ do
10. H′ := H′ \ {e}
11. LCMmax (P ∪ {e}, H′)
12. End for</p>
      </sec>
      <sec id="sec-3-10">
        <title>Checking Maximality</title>
        <p>When LCMmax finds a frequent itemset P, it checks
whether the current solution is maximal or not. We call
this operation the maximality check. The maximality check
is a heavy task, so many existing algorithms avoid
it. They store in memory the maximal itemsets among
the previously obtained frequent itemsets, and update
them when they find a new itemset. When the
algorithms terminate and have obtained all frequent itemsets,
only the maximal frequent itemsets remain in memory.
We call this the storage method. If the number of
maximal frequent itemsets is small, the storage method is
efficient. However, if the number is large, the storage
method needs much memory. When a frequent
itemset is newly found, the storage method checks whether
the itemset is included in some itemset in
memory or not. If the number of frequent itemsets is large,
this operation takes a long time.</p>
        <p>To avoid the disadvantage of the storage method,
LCMmax operates the maximality check. LCMmax
checks the maximality by finding an item e such
that P ∪ {e} is frequent. If and only if such an e
exists, P is not maximal. To operate this efficiently,
we reduce the database. Let us consider an
iteration of LCMmax with respect to a frequent itemset
P. LCM algorithms reduce the database by anytime
database reduction for the frequency counting.
Suppose that the reduced database is composed of
transactions S1, ..., Sh, and each Sl is obtained by
merging transactions T1l, ..., Tkl of the original database.
Let H be the set of items to be added in the
iteration. Suppose that we remove all items e from H
such that P ∪ {e} is infrequent. Then, for any l,
T1l ∩ H = T2l ∩ H = ... = Tkl ∩ H holds. For an item e
and a transaction Sl, we define the weight w(e, Sl) as
the number of transactions in T1l, ..., Tkl including e.
Here the frequency of P ∪ {e} is Σ<sub>S∈{S1,...,Sh}</sub> w(e, S).
Thus, by using the weights, we can efficiently check
the maximality, in time linear in the size of the
reduced database.</p>
        <p>When we merge transactions to reduce the database,
the weights can be updated easily: for each item e, we take
the sum of w(e, S) over all transactions S being merged. In
the same way as in frequency counting, we can also remove
infrequent items from the database for the maximality
check, for further reduction.</p>
        <p>The computation time for the maximality check in
LCMmax depends on the size of the database, but not on the
number of previously obtained itemsets. Thus, the storage
method has an advantage if the number of maximal frequent
itemsets is small, but on instances with many maximal
frequent itemsets, which take a long time to solve, LCMmax
has the advantage.
</p>
        <p>[Figure: CPU time (sec) versus minimum support for BMS-WebView-2, all and closed itemset mining, comparing afopt, fpg, lcm, mafia, dci, and patriciamine.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Computational Experiments</title>
      <p>
        In this section, we show the results of our
computational experiments. We implemented our three
algorithms LCM, LCMfreq, and LCMmax. They are
coded in ANSI C and compiled with gcc. The
experiments were executed on a notebook PC with an
AMD Athlon XP 1600+ and 224MB of memory. The
performance of the LCM algorithms is compared with
the algorithms that scored well at FIMI '03:
fpgrowth[
        <xref ref-type="bibr" rid="ref4">8</xref>
        ], afopt[
        <xref ref-type="bibr" rid="ref7">11</xref>
        ], MAFIA[
        <xref ref-type="bibr" rid="ref1 ref2">5, 6</xref>
        ], kDCI[
        <xref ref-type="bibr" rid="ref8">12</xref>
        ], and
PATRICIAMINE[
        <xref ref-type="bibr" rid="ref12">16</xref>
        ]. We note that kDCI and
PATRICIAMINE are only for all frequent itemset
mining. To reduce the time for the experiments, we
stopped an execution when the algorithm took more
than 10 minutes. We do not plot a result if the
computation time exceeded 10 minutes or the execution
terminated abnormally. The results are displayed in
Figures 1 and 2. In each graph, the horizontal axis is
the minimum support, and the vertical axis is the CPU
time on a log scale.
      </p>
      <p>Based on the performance of the implementations, the
instances fell into three groups with similar results.
Due to space limitations, we show one instance as a
representative of each group.</p>
      <p>The first group is composed of BMS-WebView1,
BMS-WebView2, BMS-POS, T10I4D100K, kosarak,
and retail. These datasets have many items and
transactions but are sparse. We call these datasets
sparse datasets. We chose BMS-WebView2 as the
representative.</p>
      <p>The second group is composed of datasets taken
from the UCI Machine Learning Repository1: connect,
chess, mushroom, pumsb, and pumsb-star. These datasets
have many transactions but few items. We call these
datasets middle-density datasets. As the representative,
we show the result for chess.</p>
      <p>[Figures: CPU time (sec) versus minimum support (%) for chess (maximal) and accidents (all, closed, and maximal itemset mining), comparing afopt, fpg, lcm, mafia, dci, and patriciamine.]</p>
      <p>The third group is accidents. It is different from
the other datasets: it has a huge number of transactions
but few items, and each transaction includes many items,
so the dataset is very dense. We call it a very dense
dataset.</p>
      <p>For almost all instances and minimum supports, the
LCM algorithms perform well. When the minimum support
is large, the LCM algorithms are the fastest on all
instances, because of their fast initialization. On all
instances, at any minimum support, LCM outperforms the
other closed itemset mining algorithms. This shows the
efficiency of the ppc extension.</p>
      <p>For the sparse datasets, the LCM algorithms are the
fastest at any minimum support. The benefit of the FP-tree
is small here, and occurrence deliver works efficiently.
The performances of afopt and fp-growth are quite similar
on these problems. They are the second best, 2 to 10 times
slower than the LCM algorithms. For enumerating frequent
closed itemsets, they take much time when the number of
closed itemsets is large. Although PATRICIAMINE is as fast
as fp-growth and afopt, it terminated abnormally on some
instances. kDCI is slow when the number of frequent
itemsets is large. MAFIA was the slowest on these
instances, at any minimum support.</p>
      <p>For the middle-density datasets, LCM is the fastest
on all instances for closed itemset mining. For all and
maximal frequent itemset mining, LCMfreq and LCMmax are
the fastest at large minimum supports on every dataset. At
small minimum supports, LCMfreq and LCMmax are the fastest
on half of the instances. On the other instances, the
results are mixed: each algorithm won in some cases.</p>
      <p>For accidents, the LCM algorithms are the fastest
when the minimum support is large. At small supports,
LCM (closed) is the fastest, but LCMfreq and LCMmax are
slower than fp-growth. On this dataset, the benefit of
the FP-tree is large, with a compression ratio of up to 6,
and the bitmap is also efficient because of the density.
Hence, the computation time for frequency counting is
short in the executions of the existing implementations.
However, thanks to the ppc extension, LCM has an advantage
for closed itemset mining; hence LCM (closed) is the
fastest.</p>
      <p>1http://www.ics.uci.edu/~mlearn/MLRepository.html</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we proposed a fast implementation of
LCM for enumerating frequent closed itemsets, which is
based on prefix-preserving closure extension. We further
gave the implementations LCMfreq and LCMmax for
enumerating all frequent itemsets and maximal frequent
itemsets by modifying LCM. We showed by computational
experiments that our implementations of LCM, LCMfreq, and
LCMmax outperform the other algorithms on many datasets,
especially sparse datasets. There is a possibility of
speeding up the LCM algorithms by developing more
efficient maximality checking algorithms, or by developing
a hybrid of array and FP-tree-like data structures.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This research was supported by joint-research funds of
the National Institute of Informatics.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[1] R. Agrawal and R. Srikant, "Fast Algorithms for
Mining Association Rules in Large Databases,"
In Proceedings of VLDB '94, pp. 487-499, 1994.
[2] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen
and A. I. Verkamo, "Fast Discovery of Association
Rules," In Advances in Knowledge Discovery and Data
Mining, MIT Press, pp. 307-328, 1996.
[3] R. J. Bayardo Jr., "Efficiently Mining Long
Patterns from Databases," In Proc. SIGMOD'98,
pp. 85-93, 1998.
[4] E. Boros, V. Gurvich, L. Khachiyan, and
K. Makino, "On the Complexity of Generating Maximal
Frequent and Minimal Infrequent Sets," STACS 2002,
pp. 133-141, 2002.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calimlim</surname>
          </string-name>
          , J. Gehrke, "
          <article-title>MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases,"</article-title>
          <source>In Proc. ICDE</source>
          <year>2001</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>452</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calimlim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Flannick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          , and T. Yiu, "
          <article-title>MAFIA: A Performance Study of Mining Maximal Frequent Itemsets,"</article-title>
          <source>In Proc. IEEE ICDM'03 Workshop FIMI'03</source>
          ,
          <year>2003</year>
          . (Available
          <source>as CEUR Workshop Proc. series</source>
          , Vol.
          <volume>90</volume>
          , http://ceur-ws.org/vol-90)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          ,
          <source>the FIMI'03 Homepage</source>
          , http://fimi.cs.helsinki.fi/,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Grahne</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , "
          <article-title>Efficiently Using Prefix-trees in Mining Frequent Itemsets,"</article-title>
          <source>In Proc. IEEE ICDM'03 Workshop FIMI'03</source>
          ,
          <year>2003</year>
          . (Available
          <source>as CEUR Workshop Proc. series</source>
          , Vol.
          <volume>90</volume>
          , http://ceur-ws.org/vol-90)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          , J. Pei,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          , "
          <article-title>Mining Frequent Patterns without Candidate Generation,"</article-title>
          <source>SIGMOD Conference</source>
          <year>2000</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kohavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Brodley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Frasca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mason</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , "KDD-Cup
          <source>2000 Organizers' Report: Peeling the Onion," SIGKDD Explorations</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>86</fpage>
          -
          <lpage>98</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Guimei</given-names>
            <surname>Liu</surname>
          </string-name>
          , Hongjun Lu,
          <article-title>Jeffrey Xu Yu, Wang Wei, and Xiangye Xiao, "AFOPT: An Efficient Implementation of Pattern Growth Approach,"</article-title>
          <source>In Proc. IEEE ICDM'03 Workshop FIMI'03</source>
          ,
          <year>2003</year>
          . (Available
          <source>as CEUR Workshop Proc. series</source>
          , Vol.
          <volume>90</volume>
          , http://ceur-ws.org/vol-90)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Orlando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lucchese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Palmerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          , "
          <article-title>kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets,"</article-title>
          <source>In Proc. IEEE ICDM'03 Workshop FIMI'03</source>
          ,
          <year>2003</year>
          . (Available
          <source>as CEUR Workshop Proc. series</source>
          , Vol.
          <volume>90</volume>
          , http://ceur-ws.org/vol-90)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bastide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taouil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lakhal</surname>
          </string-name>
          ,
          <article-title>Efficient Mining of Association Rules Using Closed Itemset Lattices</article-title>
          , Inform. Syst.,
          <volume>24</volume>
          (
          <issue>1</issue>
          ),
          <fpage>25</fpage>
          -
          <lpage>46</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bastide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taouil</surname>
          </string-name>
          , L. Lakhal,
          <article-title>Discovering Frequent Closed Itemsets for Association Rules</article-title>
          ,
          <source>In Proc. ICDT</source>
          '
          <volume>99</volume>
          ,
          <fpage>398</fpage>
          -
          <lpage>416</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          , J. Han, R. Mao, "
          <article-title>CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets,"</article-title>
          <source>ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery</source>
          <year>2000</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pietracaprina</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zandolin</surname>
          </string-name>
          , "
          <article-title>Mining Frequent Itemsets using Patricia Tries,"</article-title>
          <source>In Proc. IEEE ICDM'03 Workshop FIMI'03</source>
          ,
          <year>2003</year>
          . (Available
          <source>as CEUR Workshop Proc. series</source>
          , Vol.
          <volume>90</volume>
          , http://ceur-ws.org/vol-90)
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsukiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ariyoshi</surname>
          </string-name>
          and
          <string-name>
            <surname>I. Shirakawa</surname>
          </string-name>
          , "
          <article-title>A New Algorithm for Generating All the Maximum Independent Sets,"</article-title>
          <source>SIAM Journal on Computing</source>
          , Vol.
          <volume>6</volume>
          , pp.
          <fpage>505</fpage>
          -
          <lpage>517</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Uno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Asai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Uchida</surname>
          </string-name>
          , H. Arimura, "
          <article-title>LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets,"</article-title>
          <source>In Proc. IEEE ICDM'03 Workshop FIMI'03</source>
          ,
          <year>2003</year>
          . (Available
          <source>as CEUR Workshop Proc. series</source>
          , Vol.
          <volume>90</volume>
          , http://ceur-ws.org/vol-90)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Uno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Asai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Uchida</surname>
          </string-name>
          , H. Arimura, "
          <article-title>An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases,"</article-title>
          to appear
          <source>in Proc. of Discovery Science</source>
          <year>2004</year>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [20]
          <string-name>
            <surname>M. J. Zaki</surname>
          </string-name>
          , C. Hsiao, "
          <article-title>CHARM: An Efficient Algorithm for Closed Itemset Mining,"</article-title>
          <source>2nd SIAM International Conference on Data Mining (SDM'02)</source>
          , pp.
          <fpage>457</fpage>
          -
          <lpage>473</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kohavi</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Mason</surname>
          </string-name>
          , "
          <article-title>Real World Performance of Association Rule Algorithms,"</article-title>
          <source>KDD</source>
          <year>2001</year>
          , pp.
          <fpage>401</fpage>
          -
          <lpage>406</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>