<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficiently Using Prefix-trees in Mining Frequent Itemsets</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Go ̈ sta Grahne and Jianfei Zhu Concordia University Montreal</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Efficient algorithms for mining frequent itemsets are crucial for mining association rules. Methods for mining frequent itemsets and for iceberg data cube computation have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper we present a novel array-based technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FPtree based algorithms. Our technique works especially well for sparse datasets. Furthermore, we present new algorithms for a number of common data mining problems. Our algorithms use the FP-tree data structure in combination with our array technique efficiently, and incorporates various optimization techniques. We also present experimental results which show that our methods outperform not only the existing methods that use the FP-tree structure, but also all existing available algorithms in all the common data mining problems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A fundamental problem for mining association rules is
to mine frequent itemsets (FI’s). In a transaction database,
if we know the support of all frequent itemsets, the
association rules generation is straightforward. However, when
a transaction database contains large number of large
frequent itemsets, mining all frequent itemsets might not be a
good idea. As an example, if there is a frequent itemset with
size `, then all 2` nonempty subsets of the itemset have to
be generated. Thus, a lot of work is focused on
discovering only all the maximal frequent itemsets (MFI’s).
Unfortunately, mining only MFI’s has the following deficiency.
From an MFI and its support s, we know that all its subsets
are frequent and the support of any of its subset is not less
than s, but we do not know the exact value of the support.
To solve this problem, another type of a frequent itemset,
the Closed Frequent Itemset (CFI), has been proposed. In
most cases, though, the number of CFI’s is greater than the
number of MFI’s, but still far less than the number of FI’s.</p>
      <p>
        In this work we mine FI’s, MFI’s and CFI’s by efficiently
using the FP-tree, the data structure that was first introduced
in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The FP-tree has been shown to be one of the most
efficient data structures for mining frequent patterns and for
“iceberg” data cube computations [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 9, 8</xref>
        ].
      </p>
      <p>The most important contribution of our work is a novel
technique that uses an array to greatly improve the
performance of the algorithms operating on FP-trees. We first
demonstrate that the use of our array-based technique
drastically speeds up the FP-growth method, since it now needs
to scan each FP-tree only once for each recursive call
emanating from it. We then use this technique and give a new
algorithm FPmax*, which extends our previous algorithm
FPmax, for mining maximal frequent itemsets. In FPmax*,
we use a variant of the FP-tree structure for subset testing,
and give number of optimizations that further reduce the
running time. We also present an algorithm, FPclose, for
mining closed frequent itemsets. FPclose uses yet another
variation of the FP-tree structure for checking the
closedness of frequent itemsets.</p>
      <p>Finally, we present experimental results that demonstrate
the fact that all of our FP-algorithms outperform previously
known algorithms practically always.</p>
      <p>The remaining of the paper is organized as follows. In
Section 2, we briefly review the FP-growth method, and
present our novel array technique that results in the greatly
improved method FPgrowth*. Section 3 gives algorithm
FPmax*, which is an extension of our previous algorithm
FPmax, for mining MFI’s. Here we also introduce our
approach of subset testing needed in mining MFI’s and CFI’s.
In Section 4 we give algorithm FPclose, for mining CFI’s.
Experimental results are given in Section 5. Section 6
concludes, and outlines directions of future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Discovering FI’s</title>
      <p>2.1. The FP-tree and FP-growth method</p>
      <p>
        The FP-growth method by Han et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] uses a data
structure called the FP-tree (Frequent Pattern tree). The
FPtree is a compact representation of all relevant frequency
information in a database. Every branch of the FP-tree
represents a frequent itemset, and the nodes along the branches
are stored in decreasing order of frequency of the
corresponding items, with leaves representing the least frequent
items. Compression is achieved by building the tree in such
a way that overlapping itemsets share prefixes of the
corresponding branches.
      </p>
      <p>The FP-tree has a header table associated with it. Single
items and their counts are stored in the header table in
decreasing order of their frequency. The entry for an item also
contains the head of a list that links all the corresponding
nodes of the FP-tree.</p>
      <p>
        Compared with Apriori [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and its variants which need
several database scans, the FP-growth method only needs
two database scans when mining all frequent itemsets. The
first scan counts the number of occurrences of each item.
The second scan constructs the initial FP-tree which
contains all frequency information of the original dataset.
Mining the database then becomes mining the FP-tree.
a b c e f o
a c g
e i
a c d e g
a c e g l
e j
a b c e f p
a c d
a c e g m
a c e g n
(a)
      </p>
      <p>Header table</p>
      <p>Head of
item node−links
e:8
c:8
a:8
g:5
b:2
f:2
d:2
b:2
f:2
e:8
c:6
a:6
(b)
root
g:4
d:1
c:2
a:2
g:1
d:1
To construct the FP-tree, first find all frequent items by
an initial scan of the database. Then insert these items in the
header table, in decreasing order of their count. In the next
(and last) scan, as each transaction is scanned, the set of
frequent items in it are inserted into the FP-tree as a branch.
If an itemset shares a prefix with an itemset already in the
tree, the new itemset will share a prefix of the branch
representing that itemset. In addition, a counter is associated
with each node in the tree. The counter stores the number of
transactions containing the itemset represented by the path
from the root to the node in question. This counter is
updated during the second scan, when a transaction causes the
insertion of a new branch. Figure 1 (a) shows an example
of a database and Figure 1 (b) the FP-tree for that database.
Note that there may be more than one node corresponding
to an item in the FP-tree. The frequency of any one item
i is the sum of the count associated with all nodes
representing i, and the frequency of an itemset equals the sum
of the counts of the least frequent item in it, restricted to
those branches that contain the itemset. For instance, from
Figure 1 (b) we can see that the frequency of the itemset
fc; a; gg is 5.</p>
      <p>Thus the constructed FP-tree contains all frequency
information of the database. Mining the database becomes
mining the FP-tree. The FP-growth method relies on the
following principle: if X and Y are two itemsets, the count
of itemset X [ Y in the database is exactly that of Y in
the restriction of the database to those transactions
containing X. This restriction of the database is called the
conditional pattern base of X, and the FP-tree constructed from
the conditional pattern base is called X’s conditional
FPtree, which we denote by TX . We can view the FP-tree
constructed from the initial database as T;, the conditional
FP-tree for ;. Note that for any itemset Y that is frequent in
the conditional pattern base of X, the set X [Y is a frequent
itemset for the original database.</p>
      <p>Given an item i in the header table of an FP-tree TX ,
by following the linked list starting at i in the header table
of TX , all branches that contain item i are visited. These
branches form the conditional pattern base of X [ fig, so
the traversal obtains all frequent items in this conditional
pattern base. The FP-growth method then constructs the
conditional FP-tree TX[fig, by first initializing its header
table based on the found frequent items, and then visiting
the branches of TX along the linked list of i one more time
and inserting the corresponding itemsets in TX[fig. Note
that the order of items can be different in TX and TX[fig.
The above procedure is applied recursively, and it stops
when the resulting new FP-tree contains only one single
path. The complete set of frequent itemsets is generated
from all single-path FP-trees.</p>
    </sec>
    <sec id="sec-3">
      <title>2.2. An array technique</title>
      <p>The main work done in the FP-growth method is
traversing FP-trees and constructing new conditional FP-trees after
the first FP-tree is constructed from the original database.
From numerous experiments we found out that about 80%
of the CPU time was used for traversing FP-trees. Thus,
the question is, can we reduce the traversal time so that the
method can be sped up?</p>
      <p>The answer is yes, by using a simple additional data
structure. Recall that for each item i in the header of a
conditional FP-tree TX , two traversals of TX are needed for
constructing the new conditional FP-tree TX[fig. The first
traversal finds all frequent items in the conditional pattern
base of X [ fig, and initializes the FP-tree TX[fig by
constructing its header table. The second traversal constructs
the new tree TX[fig. We can omit the first scan of TX by
constructing an array AX while building TX . The
following example will explain the idea. In Figure 1 (a), supposing
that the minimum support is 20%, after the first scan of the
original database, we sort the frequent items as e:8, c:8, a:8,
g:5, b:2, f :2, d:2. This order is also the order of items in the
header table of T;. During the second scan of the database
we will construct T;, and an array A;. This array will store
the counts of all 2-itemsets. All cells in the array are
initialized as 0.</p>
      <p>In A;, each cell is a counter of a 2-itemset, cell
A;[d; e] is the counter for itemset fd; eg, cell A;[d; c]
is the counter for itemset fd; cg, and so forth.
During the second scan for constructing T;, for each
transaction, first all frequent items in the transaction are
extracted. Suppose these items form itemset I. To insert
I into T;, the items in I are sorted according to the
order in header table of T;. When we insert I into T;,
at the same time A;[i; j] is incremented by 1 if fi; jg
is contained in I. For example, for the first transaction,
fa; b; c; e; f g is extracted (item o is infrequent) and sorted
as e; c; a; b; f . This itemset is inserted into T; as usual,
and at the same time, A;[f; e]; A;[f; c], A;[f; a]; A;[f; b],
A;[b; a]; A;[b; c], A;[b; e]; A;[a; e], A;[a; c]; A;[c; e] are all
incremented by 1. After the second scan, array A; keeps the
counts of all pairs of frequent items, as shown in table (a)
of Figure 2.</p>
      <p>Next, the FP-growth method is recursively called to mine
frequent itemsets for each item in header table of T;.
However, now for each item i, instead of traversing T; along
the linked list starting at i to get all frequent items in i’s
conditional pattern base, A; gives all frequent items for i.
For example, by checking the third line in the table for A;,
frequent items e; c; a for the conditional pattern base of g
can be obtained. Sorting them according to their counts, we
get a; c; e. Therefore, for each item i in T; the array A;
makes the first traversal of T; unnecessary, and Tfig can be
initialized directly from A;.</p>
      <p>For the same reason, from a conditional FP-tree TX ,
when we construct a new conditional FP-tree for X [ fig,
for an item i, a new array AX[fig is calculated.
During the construction of the new FP-tree TX[fig, the array
AX[fig is filled. For instance, in Figure 1, the cells of
array Afgg is shown in table (b) of Figure 2. This array
is constructed as follows. From the array A;, we know
that the frequent items in the conditional pattern base of
fgg are, in order, a; c; e. By following the linked list of
g, from the first node we get fe; c; ag : 4, so it is inserted as
(a : 4; c : 4; e : 4) into the new FP-tree Tfgg. At the same
time, Afgg[e; c]; Afgg[e; a] and Afgg[c; a] are incremented
by 4. From the second node in the linked list, fc; ag : 1 is
extracted, and it is inserted as (a : 1; c : 1) into Tfgg. At the
same time, Afgg[c; a] is incremented by 1. Since there are
no other nodes in the linked list, the construction of Tfgg is
finished, and array Afgg is ready to be used for construction
of FP-trees in next level of recursion. The construction of
arrays and FP-trees continues until the FP-growth method
terminates.</p>
      <p>
        Based on above discussion, we define a variation of the
FP-tree structure in which besides all attributes given in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
an FP-tree also has an attribute, array, which contains the
corresponding array.
      </p>
      <p>Now let us analyze the size of an array. Suppose the
number of frequent items in the first FP-tree is n. Then
the size of the associated array is Pin=11 i = n(n 1)=2.
We can expect that FP-trees constructed from the first
FPtree have fewer frequent items, so the sizes of the associated
arrays decrease. At any time, since an array is an attribute
of an FP-tree, when the space for the FP-tree is freed, the
space for the array is also freed.</p>
    </sec>
    <sec id="sec-4">
      <title>2.3. Discussion</title>
      <p>The array technique works very well especially when the
dataset is sparse. The FP-tree for a sparse dataset and the
recursively constructed FP-trees will be big and bushy, due to
the fact that they do not have many shared common
prefixes. The arrays save traversal time for all items and the
next level FP-trees can be initialized directly. In this case,
the time saved by omitting the first traversals is far greater
than the time needed for accumulating counts in the
associated array.</p>
      <p>However, when a dataset is dense, the FP-trees are more
compact. For each item in a compact FP-tree, the traversal
is fairly rapid, while accumulating counts in the associated
array may take more time. In this case, accumulating counts
may not be a good idea.</p>
      <p>Even for the FP-trees of sparse datasets, the first levels of
recursively constructed FP-trees are always conditional
FPtrees for the most common prefixes. We can therefore expect
the traversal times for the first items in a header table to be
fairly short, so the cells for these first items are unnecessary
in the array. As an example, in Figure 2 table (a), since
e; c; and a are the first 3 items in the header table, the first
two lines do not have to be calculated, thus saving counting
time.</p>
      <p>Note that the datasets (the conditional pattern bases)
change during the different depths of the recursion. In order
to estimate whether a dataset is sparse or dense, during the
construction of each FP-tree we count the number of nodes
in each level of the tree. Based on experiments, we found
that if the upper quarter of the tree contains less than 15% of
the total number of nodes, we are most likely dealing with
a dense dataset. Otherwise the dataset is likely to be sparse.</p>
      <p>If the dataset appears to be dense, we do not calculate
the array for the next level of the FP-tree. Otherwise, we
calculate array for each FP-tree in the next level, but the
cells for the first several (say 5) items in its header table are
not set.
2.4. FPgrowth* : an improved FP-growth method</p>
      <p>Figure 3 contains the pseudocode for our new method
FPgrowth*. The procedure has an FP-tree T as parameter.
The tree has attributes: base, header and array. T:base
contains the itemset X, for which T is a conditional FP-tree,
the attribute header contains the head table, and T:array
contains the array AX .</p>
      <sec id="sec-4-1">
        <title>Procedure FPgrowth*(T )</title>
      </sec>
      <sec id="sec-4-2">
        <title>Input: A conditional FP-tree T</title>
        <sec id="sec-4-2-1">
          <title>Output: The complete set of FI’s</title>
          <p>corresponding to T .</p>
          <p>Method:
1. if T only contains a single path P
2. then for each subpath Y of P
3. output pattern Y [ T:base with
count = smallest count of nodes
in Y
4. else for each i in T:header
5. output Y = T:base [ fig with i:count</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>6. if T:array is not NULL</title>
        <sec id="sec-4-3-1">
          <title>7. construct a new header table</title>
          <p>for Y ’s FP-tree from T:array</p>
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>8. else construct a new header table</title>
        <p>from T ;</p>
      </sec>
      <sec id="sec-4-5">
        <title>9. construct Y ’s conditional</title>
      </sec>
      <sec id="sec-4-6">
        <title>FP-tree TY and its array AY ;</title>
        <p>10. if TY 6= ;
11. call FPgrowth*(TY );</p>
        <p>
          In FPgrowth*, line 6 tests if the array of the current
FPtree is NULL. If the FP-tree corresponds to a sparse dataset,
its array is not NULL, and line 7 will be used to construct
the header table of the new conditional FP-tree from the
array directly. One FP-tree traversal is saved for this item
compared with the FP-growth method in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In line 9,
during the construction, we also count the nodes in the different
levels of the tree, in order to estimate whether we shall
really calculate the array, or just set TY :array = N U LL.
        </p>
        <p>From our experimental results we found that an FP-tree
could have millions of nodes, thus, allocating and
deallocating those nodes takes plenty of time. In our
implementation, we used our own main memory management for
allocating and deallocating nodes. Since all memory for nodes
in an FP-tree is deallocated after the current recursion ends,
a chunk of memory is allocated for each FP-tree when we
create the tree. The chunk size is changeable. After
generating all frequent itemsets from the FP-tree, the chunk is
discarded. Thus we successfully avoid freeing nodes in the
FP-tree one by one, which is more time-consuming.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. FPmax*: Mining MFI’s</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] we developed FPmax, a variation of the FP-growth
method, for mining maximal frequent itemsets. Since the
array technique speeds up the FP-growth method for sparse
datasets, we can expect that it will be useful in FPmax too.
This gives us an improved method, FPmax*. Compared to
FPmax, the improved method FPmax* also has a more
efficient subset test, as well as some other optimizations. It
turns out that FPmax* outperforms GenMax[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and MAFIA
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for all cases we discussed in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.1. The MFI-Tree</title>
      <p>
        Since FPmax is a depth-first algorithm, a frequent
itemset can be a subset only of an already discovered MFI. In
FPmax we introduced a global data structure, the
Maximal Frequent Itemset tree (MFI-tree), to keep the track of
MFI’s. A newly discovered frequent itemset is inserted into
the MFI-tree, unless it is a subset of an itemset already in
the tree. However, for large datasets, the MFI-tree will be
quite large, and sometimes one itemset needs thousands of
comparisons for subset testing. Inspired by the way subset
checking is done in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], in FPmax*, we still use the
MFItree structure, but for each conditional FP-tree TX , a small
MFI-tree MX is created. The tree MX will contain all
maximal itemsets in the conditional pattern base of X. To see if
a local MFI Y generated from a conditional FP-tree TX is
maximal, we only need to compare Y with the itemsets in
MX . This achieves a significant speedup of FPmax.
      </p>
      <p>Each MFI-tree is associated with a particular FP-tree.
Children of the root of the MFI-tree are item prefix
subtrees. In an MFI-tree, each node in the subtree has three
fields: item-name, level and node-link. The level-field will
be useful for subset testing. All nodes with same item-name
are linked together, as in an FP-tree. The MFI-tree also
has a header table. However, unlike the header table in an
FP-tree, which is constructed from traversing the previous
FP-tree or using the associated array, the header table of an
MFI-tree is constructed based on the item order in the
table of the FP-tree it is associated with. Each entry in the
header table consists of two fields, item-name and head of a
linked list. The head points to the first node with the same
item-name in the MFI-tree.</p>
      <p>The insertion of an MFI into an MFI-tree is similar to the
insertion of a frequent set into an FP-tree. Figure 4 shows
the insertions of three MFI’s into an MFI-tree associated
with the FP-tree in Figure 1 (b). In Figure 4, a node x : `
means that the node is for item x and its level is `. Figure 4
(a) shows the tree after (c; a; d) and (e; c; a; b; f ) have been
inserted. In Figure 4 (b), since new MFI (e; c; a; b; g) shares
prefix (e; c; a) with (e; c; a; b; f ), only one new node for g
is inserted.
3.2. FPmax*</p>
      <p>Figure 5 gives algorithm FPmax*. The first call will be
for the FP-tree constructed from the original database, and
it will have an empty MFI-tree. Before a recursive call
FPmax*(T,M), we already know from line 10 that the set
containing T:base and the items in the current FP-tree is not a
subset of any existing MFI. During the recursion, if there
is only one single path in T , this single path together with
T:base is an MFI of the database. In line 2, the MFI is
inserted into M . If the FP-tree is not a single-path tree, then
for each item i in the header table, we start preparing for the
recursive call FPmax*(TY ; MY ), for Y = T:base [ fig.
The items in the header table of T are processed in
increasing order of frequency, so that maximal frequent itemsets
will be found before any of their frequent subsets. Lines
5 to 8 use the array technique, and line 10 calls function
subset checking to check if Y together with all frequent
items in Y ’s conditional pattern base is a subset of any
existing MFI in M (thus we do superset pruning here). If
subset checking return false, FPmax* will be called
recursively, with (TY ; MY ). The implementation of function
subset checking will be explained shortly.</p>
      <p>Note that before and after calling subset checking, if Y [
T ail is not subset of any MFI, we still do not know whether
Y [ T ail is frequent. If, by constructing Y ’s conditional</p>
      <sec id="sec-6-1">
        <title>Procedure FPmax*(T; M )</title>
      </sec>
      <sec id="sec-6-2">
        <title>Input: T , an FP-tree</title>
      </sec>
      <sec id="sec-6-3">
        <title>M , the MFI-tree for T:base</title>
      </sec>
      <sec id="sec-6-4">
        <title>Updated M</title>
        <sec id="sec-6-4-1">
          <title>Output:</title>
          <p>Method:</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>1. if T only contains a single path P</title>
      </sec>
      <sec id="sec-6-6">
        <title>2. insert P into M</title>
        <p>3. else for each i in T:header
4. set Y = T:base [ fig;</p>
      </sec>
      <sec id="sec-6-7">
        <title>5. if T:array is not NULL</title>
        <p>6. Tail=ffrequent items for i in</p>
        <p>T:arrayg
7. else
8. Tail=ffrequent items in i’s
conditional pattern baseg</p>
        <sec id="sec-6-7-1">
          <title>9. sort Tail in decreasing order of the items’ counts</title>
          <p>10. if not subset checking(Y [ T ail; M )
11. construct Y ’s conditional</p>
        </sec>
      </sec>
      <sec id="sec-6-8">
        <title>FP-tree TY and its array AY ;</title>
        <p>12. initialize Y ’s conditional</p>
      </sec>
      <sec id="sec-6-9">
        <title>MFI-tree MY ; 13. call FPmax*(TY ; MY ); 14. merge MY with M</title>
        <p>FP-tree TY , we find out that TY only has a single path, we
can conclude that Y [ T ail is frequent. Since Y [ T ail was
not a subset of any previously discovered MFI, it is a new
MFI and will be inserted into MY .</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.3. Implementation of subset testing</title>
      <p>The function subset checking works as follows. Suppose
Tail = i1i2; : : : ik, in decreasing order of frequency
according to the header table of M . By following the linked list of
i, for each node n in the list, we test if Tail is a subset of the
ancestors of n. Here, the level of n can be used for saving
comparison time. First we test if the level of n is smaller
than k. If it is, the comparison stops because there are not
enough ancestors of n for matching the rest of Tail. This
pruning technique is also applied as we move up the branch
and towards the front of Tail.</p>
      <p>Unlike an FP-tree, which is not changed during the
execution of the algorithm, an MFI-tree is dynamic. At line
12, for each Y , a new MFI-tree MY is initialized from the
predecessor MFI-tree M . Then after the recursive call, M
is updated on line 14 to contain all newly found frequent
itemsets. In the actual implementation, we however found
that it was more efficient to update all MFI-trees along the
recursive path, instead of merging only at the current level.
In other words, we omitted line 14, and instead on line 2, P
is inserted into the current M , and also into all predecessor
MFI-trees that the implementation of the recursion needs to
keep in main memory in any case.</p>
      <p>Since FPmax* is a depth-first algorithm, it is
straightforward to show that the above subset checking is correct.
Based on the correctness of the FP-growth method, we can
conclude that FPmax* returns all and only the maximal
frequent itemsets in a given dataset.</p>
    </sec>
    <sec id="sec-8">
      <title>3.4. Optimizations</title>
      <p>In the method FPmax*, one more optimization is used.
Suppose, that at some level of the recursion, the header table
of the current FP-tree is i1; i2; : : : ; im. Then starting from
im, for each item in the header table, we may need to do
the work from line 4 to line 14. If for any item, say ik,
where k m, its maximal frequent itemset contains items
i1; i2; : : : ; ik 1, i.e., all the items that have not yet called
FPmax* recursively, these recursive calls can be omitted.
This is because for those items, their tails must be subsets
of fi1; i2; : : : ; ik 1g, so subset checking(Y [T ail) would
always return true.</p>
      <p>FPmax* also uses the memory management described in
Section 2.4, for allocating and deallocating space for
FPtrees and MFI-trees.</p>
    </sec>
    <sec id="sec-9">
      <title>3.5. Discussion</title>
      <p>
        One may wonder if the space required for all the
MFItrees of a recursive branch is too large. Actually, before
the first call of FPmax*, the first FP-tree has to fit in main
memory. This is also required by the FP-growth method.
The corresponding MFI-tree is initialized as empty.
During recursive calls of FPmax*, new conditional FP-trees are
constructed from the first FP-tree or from an ancestors
FPtree. From the experience of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we know the recursively
constructed FP-trees are relatively small. We can expect
that the total size of these FP-trees is not greater than the
final size of the MFI-tree for ;. Similarly, the MFI-trees
constructed from ancestors are also small. All MFI-trees
grow gradually. Thus we can conclude that the total main
memory requirement for running FPmax* on a dataset is
proportional to the sum of the size of the FP-tree and the
MFI-tree for ;.
      </p>
    </sec>
    <sec id="sec-10">
      <title>4. FPclose: Mining CFI’s</title>
      <p>For mining frequent closed itemsets, FPclose works
similarly to FPmax*. They both mine frequent patterns from
FP-trees. Whereas FPmax* needs to check that a newly
found frequent itemset is maximal, FPclose needs to verify
that the new frequent itemset is closed. For this we use a
CFI-tree, which is another variation of an FP-tree.</p>
      <p>
        One of the first attempts to use FP-trees in CFI mining
was the algorithm CLOSET+ [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This algorithm uses one
global prefix-tree for keeping track of all closed itemsets.
As we pointed out before, one global tree will be quite big,
and thus slows down searches. In FPclose we will therefore
use multiple, conditional CFI-trees for checking closedness
of itemsets. We can thus expect that FPclose outperforms
CLOSET+.
4.1. The CFI-tree and algorithm FPclose
      </p>
      <p>Similar to an MFI-tree, a CFI-tree is related to an FP-tree
and an itemset X, and we will denote the CFI-tree as CX .
The CFI-tree CX always stores all already found CFI’s
containing itemset X, and their counts. A newly found frequent
itemset Y that contains X only needs to be compared with
the CFI’s in CX . If in CX , there is no superset of Y with
same count as Y , Y is closed.</p>
      <p>In a CFI-tree, each node in the subtree has four fields:
item-name, count, node-link and level. Here, the count field
is needed because when comparing a Y with a set Z in the
tree, we are trying to verify that it is not the case that Y
Z, and Y and Z have the same count. The order of the items
in a CFI-tree’s header table is same as the order of items in
header table of its corresponding FP-tree.
root
b:4:2
f:5:2
(b)
Header table</p>
      <p>Head of
item node−links c:1:8
e
ca a:2:8
g
b
f
d
e:1:8
c:2:6
g:4:4</p>
      <p>The insertion of a CFI into a CFI-tree is similar to the
insertion of a transaction into an FP-tree, except now the
count of a node is not incremented, it is always replaced by
the maximal count up-to-date. Figure 6 shows some
snapshots of the construction of a CFI-tree with respect to the
FP-tree in Figure 1 (b). The item order in two trees are
same because they are both for base ;. Note that insertions
of CFI’s into the top level CFI-tree will occur only after
recursive calls have been made. In the following example, the
insertions would in actuality be performed during various
stages of the execution, not in bulk as the example might
suggest. In Figure 6, a node x : ` : c means that the node is
for item x, its level is ` and its count is c. In Figure 6 (a),
after inserting (c; a; d) and (e; c; a; b; f ) with count 2, then we
insert (c; a; g) with count 5. Since (c; a; g) shares the
prefix (c; a) with (c; a; d), only node g is appended, and at the
same time, the counts for nodes c and a are both changed
to be 5. In part (b) of Figure 6, the CFI’s (e; c; a; g) : 4,
(c; a) : 8, (c; a; e) : 6 and (e) : 8 are inserted. At this stage
the tree contains all CFI’s for the dataset in Figure 1 (a).</p>
      <sec id="sec-10-1">
        <title>Procedure FPclose(T; C)</title>
      </sec>
      <sec id="sec-10-2">
        <title>Input: T , an FP-tree</title>
      </sec>
      <sec id="sec-10-3">
        <title>C, the CFI-tree for T:base</title>
      </sec>
      <sec id="sec-10-4">
        <title>Updated C</title>
        <sec id="sec-10-4-1">
          <title>Output:</title>
          <p>Method:
1. if T only contains a single path P
2. generate all CFI’s from P
3. for each CFI X generated
4. if not closed checking(X; C)
5. insert X into C
6. else for each i in T:header
7. set Y = T:base [ fig;
8. if not closed checking(Y; C)
9. if T:array is not NULL
10. Tail = ffrequent items for
i in T:arrayg
11.
12.
13.
14.
15.
16.
17.</p>
          <p>else</p>
          <p>Tail=ffrequent items in i’s</p>
          <p>conditional pattern baseg
sort Tail in decreasing order
of items’ counts
construct the FP-tree TY and</p>
          <p>its array AY ;
initialize Y ’s conditional</p>
        </sec>
      </sec>
      <sec id="sec-10-5">
        <title>CFI-tree CY ;</title>
        <p>call FPclose(TY ; CY );
merge CY with C</p>
        <p>Figure 7 gives algorithm FPclose. Before calling
FPclose with some (T; C), we already know from line 8 that
there is no existing CFI X such that T:base X, and
T:base and X have the same count. If there is only one
single path in T , the nodes and their counts in this single path
can be easily used to list the T:base-local closed frequent
itemsets. These itemsets will be compared with the CFI’s
in C. If an itemset is closed, it is inserted into C. If the
FP-tree T is not a single-path tree, we execute line 6. Lines
9 to 12 use the array technique. Lines 4 and 8 call function
closed checking(Y; C) to check if a frequent itemset Y is
closed. If it is, the function returns true, otherwise, false is
returned. Lines 14 and 15 construct Y ’s conditional FP-tree
and CFI-tree. Then FPclose is called recursively for TY and
CY .</p>
        <p>Note that line 17 is not implemented as such. As in
algorithm FPmax*, we found it more efficient to do the insertion
of lines 3–5 into all CFI-trees currently in main memory.</p>
        <p>CFI-trees are initialized similarly to MFI-trees,
described in Section 3.3. The implementation of function
closed checking is almost the same as the
implementation of function subset checking, except now we also
consider the count of an itemset. Given an itemset Y =
fi1; i2; : : : ; ikg with count c, suppose the order of the items
in header table of the current CFI-tree is i1; i2; : : : ; ik.
Following the linked list of ik, for each node in the list, first we
check if its count is equal to or greater than c. If it is, we
then test if Y is a subset of the ancestors of that node. The
function closed checking returns true only when there is no
existing CFI Z in the CFI-tree such that Z is a superset of
Y and the count of Y is equal to or greater than the count
of Z.</p>
        <p>Memory management allocating and deallocating space
for FP-trees and CFI-trees is similar to the memory
management of FPgrowth* and FPmax*.</p>
        <p>By a similar reasoning as in Section 3.5, we conclude
that the total main memory requirement for running
FPclose on a dataset is approximately sum of the size of the
first FP-tree and its CFI-tree.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>5. Experimental Evaluation</title>
      <p>
        We now present a performance comparison of our
FPalgorithms with algorithms dEclat, GenMax, CHARM and
MAFIA. Algorithm dEclat is a depth-first search algorithm
proposed by Zaki and Gouda in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. dEclat uses a linked
list to organize frequent patterns, however, each itemset
now corresponds to an array of transaction IDs (the
“TIDarray”). Each element in the array corresponds to a
transaction that contains the itemset. Frequent itemset mining
and candidate frequent itemset generation are done by
TIDarray intersections. A technique called diffset, is used for
reducing the memory requirement of TID-arrays. The
diffset technique only keeps track of differences in the TID’s of
a candidate itemsets when it is generating frequent itemsets.
GenMax, also proposed by Gouda and Zaki [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], takes an
approach called progressive focusing to do maximality
testing. CHARM is proposed by Zaki and Hsiao [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for CFI
mining. In all three algorithms, the main operation is the
intersection of TID-arrays. Each of them has been shown as
one of the best algorithms for mining FI’s, MFI’s or CFI’s.
MAFIA is introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] by Burdick et al. for mining
maximal frequent itemsets. It also has options for mining
FI’s and CFI’s. We give the results of three different sets
of experiments, one set for FI’s, one for MFI’s and one for
CFI’s.
      </p>
      <p>
        The source codes for dEclat, CHARM, GenMax and
MAFIA were provided by their authors. We ran all
algorithms on many synthetic and real datasets. Due to the lack
of space, only the results for two synthetic datasets and two
real datasets are shown here. These datasets should be
representative, as recent research papers [
        <xref ref-type="bibr" rid="ref10 ref11 ref2 ref3 ref4 ref8 ref9">2, 3, 4, 11, 10, 8, 9</xref>
        ],
use these or similar datasets.
      </p>
      <p>The two synthetic datasets, T40I10D100K and
T100I20D100K, were generated from the application
on the website of IBM 1. They both use 100,000
transactions and 1000 items. The two real datasets, pumsb* and
connect-4, were also downloaded from the IBM website 2.
Dataset connect-4 is compiled from game state information.
Dataset pumsb* is produced from census data of Public Use
Microdata Sample (PUMS). These two real datasets are
both quite dense, so a large number of frequent itemsets can
be mined even for very high values of minimum support.</p>
      <p>All experiments were performed on a 1Ghz Pentium III
with 512 MB of memory running RedHat Linux 7.3. All
times in the figures refer to CPU time.</p>
    </sec>
    <sec id="sec-12">
      <title>5.1. FI Mining</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the original FPgrowth method has been shown
to be an efficient and scalable algorithm for mining
frequent itemsets. FPgrowth is about an order of magnitude
faster than the Apriori. Subsequently, it was shown in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
that the algorithm dEclat outperforms FPgrowth on most
datasets. Thus, in the first set of experiments, FP-growth*
is compared with the original FP-growth method and with
dEclat. The original FP-growth method is implemented on
the basis of the paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In this set of experiments we also
included with MAFIA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which has an option for mining
all FI’s. The results of the first set of experiments are shown
in Figure 8.
      </p>
      <p>Figure 8 (a) shows the CPU time of the four algorithms
running on dataset T40I10D100K. We see that FPgrowth*
is the best algorithm for this dataset. It outperforms dEclat
and MAFIA at least by a factor of two. Main memory is
used up by dEclat when the minimum support goes down to
0.25%, while FPgrowth* can still run for even smaller levels
of minimum support. MAFIA is the slowest algorithm for
this dataset and its CPU time increases rapidly.</p>
      <p>Due to the use of the array technique, and the fact that
T40I10D100K is a sparse dataset, FPgrowth* turns out to
be faster than FPgrowth. However, when the minimum
support is very low, we can expect the FP-tree to achieve a good
compactification, starting at the initial recursion level. Thus
the array technique does not offer a big gain. Consequently,
as verified in Figure 8 (a), for very low levels minimum
support, FPgrowth* and FPgrowth have almost the same
running time.</p>
      <p>Figure 8 (b) shows the CPU time for running the four
algorithms on dataset T100I20D100K. The result is similar to
the result in Figure 8 (a). FPgrowth* is again the best. Since
the dataset T100I20D100K is sparser than T40I10D100K,
the speedup from FPgrowth to FPgrowth* is increased.</p>
      <p>From Figure 8 (c) and (d), we can see that the
FPmethods are faster than dEclat by an order of magnitude
1http://www.almaden.ibm.com/cs/quest/syndata.html
2http://www.almaden.ibm.com/cs/people/bayardo/resources.html
in both experiments. Since pumsb* and connect-4 are both
very dense datasets, FPgrowth* and FPgrowth have almost
same running time, as the array technique does not achieve
a significant speedup for dense datasets.</p>
      <p>In Figure 8 (c), the CPU time increases drastically when
the minimum support goes down below 25%. However, this
is not a problem for FPgrowth and FPgrowth*, which still
are able to produce results. The main reason for the
nevertheless steeply increased CPU time is that a long time has
to be spent listing frequent itemsets. Recall, that if there is
a frequent “long” itemset of size `, then we have to generate
2` frequent sets from it.</p>
      <p>We also ran the four algorithms on many other datasets,
and we found that FPgrowth* was always the fastest.</p>
      <p>To see why FPgrowth* is the fastest, let us consider the
main operations in the algorithms. As discussed before,
FPgrowth* spends most of its time on constructing and
traversing FP-trees. The main operation in dEclat is to generate
new candidate FI’s by TID-array intersections. In MAFIA,
generating new candidate FI’s by bitvector and-operations
is the main work. Since FPgrowth* uses the compact
FPtree, further boosted by the array technique, the time it
spends constructing and traversing the trees, is less than the
time needed for TID-array intersections and bitvector
andoperations. Moreover, the main memory space needed for
storing FP-trees is far less than that for storing diffsets or
bitvectors. Thus FPgrowth* runs faster than the other two
algorithms, and it scales to very low levels of minimum
support.</p>
      <p>Figure 11 (a) shows the main memory consumption of
three algorithms by running them on dataset connect-4. We
can see that FP-growth* always use the least main memory.
And even for very low minimum support, it still uses a small
amount of main memory.</p>
    </sec>
    <sec id="sec-13">
      <title>5.2. MFI Mining</title>
      <p>
        In our paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we analyzed and verified the
performance of algorithm FPmax. We learned that FPmax
outperformed GenMax and MAFIA in some, but not all cases.
To see the impact of the new array technique and the new
subset checking function that we are using in FPmax*, in
the second set of experiments, we compared FPmax* with
FPmax, GenMax, and MAFIA.
      </p>
      <p>Figure 9 (a) gives the result for running these algorithms
on the sparse dataset T40I10D100K. We can see that
FPmax is slower than GenMax for all levels of minimum
support, while FPmax* outperforms GenMax by a factor of at
least two. Figure 9 (b) shows the results for the very sparse
dataset T100I20D100K, FPmax is the slowest algorithm,
while FPmax* is the fastest algorithm. Figure 9 (c) shows
that FPmax* is the fastest algorithm for the dense dataset
pumsb*, even though FPmax is the slowest algorithm on
this dataset for very low levels of minimum support. In
12.25
2</p>
      <p>Mining</p>
      <p>FI’s</p>
      <p>Mining MFI’s
1.5 1.25 1 0.75
Minimum Support (%)</p>
      <p>(a)</p>
      <p>T40I10D100K
FPMAX*
GenMax
MAFIA
FPMAX
1.5 1.25 1 0.75
Minimum Support (%)</p>
      <p>(a)</p>
      <p>T40I10D100K
FPclose
MAFIA
Charm
1000
100
10
1
1000
100
10
1
1000
100
10
1
0.1
Figure 9 (d), FPmax outperforms GenMax and MAFIA for
high levels of minimum support, but it is slow for very low
levels. FPmax*, on the other hand is about one to two
orders of magnitude faster than GenMax and MAFIA for all
levels of minimum support.</p>
      <p>All experiments in this second set show that the array
technique and the new subset checking function are indeed
very effective. Figure 11 (b) shows the main memory used
by three algorithms when running them on dataset
connect4. From the figure, we can see that FPmax* uses less main
memory than the other algorithms. Figure 11 (c) shows the
main memory used by FP-trees, MFI-trees and the whole
algorithm when running FPmax* on dataset connect-4. The
minimum support was set as 10%. In the figure, the last
point of the line for FP-tree is for the main memory of the
first FP-tree (T; ), since at this point the space for all
conditional FP-trees has been freed. The last point of the line for
MFI-tree is for the main memory of the MFI-tree that
contains whole set of MFI’s, i.e., M . The figure confirms our
;
analysis of main memory used by FPmax* in Section 3.5.</p>
      <p>We also run these four algorithms on many other
datasets, and we found that FPmax* always was the fastest
algorithm.</p>
    </sec>
    <sec id="sec-14">
      <title>5.3. CFI Mining</title>
      <p>In the third set of experiments, the performances of
FPclose, CHARM and MAFIA, with the option of mining
closed frequent itemset, were compared.</p>
      <p>Figure 10 shows the results of running FPclose, CHARM
and MAFIA on datasets T40I10D100K, T100I20D100K,
pumsb* and connect-4. FPclose shows good performance
on all datasets, due to the fact that it uses the compact
FPtree and the array technique. However, for very low
levels of minimum support FPclose has performance similar to
CHARM and MAFIA. By analyzing the three algorithms,
we found that FPclose generates more non-closed frequent
itemsets than the other algorithms. For each of the
generated frequent itemsets, the function closed checking must
be called. Although the closed checking function is very
efficient, the increased number of calls to it means higher
total running time. For high levels of minimum support,
the time saved by using the compact FP-tree and the
array technique compensates for the time FPclose spends on
closed checking. In all cases, FPclose uses less main
memory for mining CFI’s than CHARM and MAFIA. Figure 11
(d) shows the memory used by three algorithms by
running them on dataset connnect-4. We can see that for very
low levels of minimum support, CHARM and MAFIA were
aborted because they ran out of memory, while FPclose was
still able to run and produce output.</p>
    </sec>
    <sec id="sec-15">
      <title>6. Conclusions</title>
      <p>We have introduced a novel array-based technique that
allows using FP-trees more efficiently when mining
frequent itemsets. Our technique greatly reduces the time
spent traversing FP-trees, and works especially well for
sparse datasets. Furthermore, we presented new algorithms
for mining maximal and closed frequent itemsets.</p>
      <p>The FPgrowth* algorithm, which extends original
FPgrowth method, also uses the novel array technique to mine
all frequent itemsets.</p>
      <p>For mining maximal frequent itemsets, we extended our
earlier algorithm FPmax to FPmax*. FPmax* not only uses
the array technique, but also a new subset-testing algorithm.
For the subset testing, a variation of the FP-tree, an
MFItree, is used for storing all already discovered MFI’s. In
FPmax*, a newly found FI is always compared with a small set
of MFI’s that are kept in an MFI-tree, thus making
subsettesting much more efficient.</p>
      <p>For mining closed frequent itemsets we give the FPclose
algorithm. In the algorithm, a CFI-tree —another variation
of a FP-tree— is used for testing the closedness of frequent
itemsets.</p>
      <p>For all of our algorithms we have presented several
optimizations that further reduce their running time.</p>
      <p>Our experimental results showed that FPgrowth* and
FPmax* always outperforms existing algorithms. FPclose
also demonstrates extremely good performance. All of the
algorithms need less main memory because of the compact
FP-trees, MFI-trees, and CFI-trees.</p>
      <p>Though the experimental results given in this paper show
the success of our algorithms, in the future we will test them
on more applications to further study their performance. We
are also planning to explore ways to improve the FPclose
algorithm by reducing the number of closedness-tests needed.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Fast algorithms for mining association rules</article-title>
          .
          <source>In Proceedings of VLDB'94</source>
          , pages
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Bayardo</surname>
          </string-name>
          , Jr.
          <article-title>Efficiently mining long patterns from databases</article-title>
          .
          <source>In Proceedings of ACM SIGMOD'98</source>
          , pages
          <fpage>85</fpage>
          -
          <lpage>93</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calimlim</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Gehrke.</surname>
          </string-name>
          <article-title>MAFIA: A maximal frequent itemset algorithm for transactional databases</article-title>
          .
          <source>In Proceedings of ICDE'01</source>
          , pages
          <fpage>443</fpage>
          -
          <lpage>452</lpage>
          , Apr.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gouda</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>Efficiently mining maximal frequent itemsets</article-title>
          .
          <source>In Proceedings of ICDM'01</source>
          , San Jose, CA,
          <year>Nov</year>
          .
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Grahne</surname>
          </string-name>
          and
          <string-name>
            <surname>J. Zhu.</surname>
          </string-name>
          <article-title>High performance mining of maximal frequent itemsets</article-title>
          .
          <source>In SIAM'03 Workshop on High Performance Data Mining: Pervasive and Data Stream Mining</source>
          , May
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . Pei, and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          .
          <article-title>Mining frequent patterns without candidate generation</article-title>
          .
          <source>In Proceedings of ACM SIGMOD'00</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          , May
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            .
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tzvetkov. Mining</surname>
          </string-name>
          top
          <article-title>-k frequent closed patterns without minimum support</article-title>
          .
          <source>In Proceedings of ICDM'02</source>
          , pages
          <fpage>211</fpage>
          -
          <lpage>218</lpage>
          , Dec.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          , J. Han, and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mao</surname>
          </string-name>
          . CLOSET:
          <article-title>An efficient algorithm for mining frequent closed itemsets</article-title>
          .
          <source>In ACM SIGMOD'00 Workshop on Research Issues in Data Mining and Knowledge Discovery</source>
          , pages
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          , J. Han, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          . Closet+:
          <article-title>Searching for the best strategies for mining frequent closed itemsets</article-title>
          .
          <source>In Proceedings of ACM SIGKDD'03</source>
          , Washington, DC,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gouda</surname>
          </string-name>
          .
          <article-title>Fast vertical mining using diffsets</article-title>
          .
          <source>In Proceedings of ACM SIGKDD'03</source>
          , Washington, DC, Aug.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsiao</surname>
          </string-name>
          . Charm:
          <article-title>An efficient algorithm for closed itemset mining</article-title>
          .
          <source>In Proceedings of SIAM'02</source>
          ,
          <string-name>
            <surname>Arlington</surname>
          </string-name>
          , Apr.
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>