=Paper= {{Paper |id=Vol-90/paper-4 |storemode=property |title=Detailed Description of an Algorithm for Enumeration of Maximal Frequent Sets with Irredundant Dualization |pdfUrl=https://ceur-ws.org/Vol-90/satoh.pdf |volume=Vol-90 |dblpUrl=https://dblp.org/rec/conf/fimi/UnoS03 }} ==Detailed Description of an Algorithm for Enumeration of Maximal Frequent Sets with Irredundant Dualization== https://ceur-ws.org/Vol-90/satoh.pdf
 Detailed Description of an Algorithm for Enumeration of Maximal
           Frequent Sets with Irredundant Dualization

                                      Takeaki Uno, Ken Satoh
                                   National Institute of Informatics
                       2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
                                    Email: uno, ksatoh@nii.ac.jp


                     Abstract                                 avoids checking all frequent item sets. However, this
We describe an implementation of an algorithm for             algorithm solves dualization problems many times,
enumerating all maximal frequent sets using irredun-          hence it is not fast for practical purpose. Moreover,
dant dualization, which is an improved version of that        the algorithm uses the dualization algorithm of Fred-
of Gunopulos et al. The algorithm of Gunopulos et             man and Khachiyan [Fredman96] which is said to be
al. solves many dualization problems, and takes long          slow in practice.
computation time. We interleaves dualization with                We improved the algorithm in [SatohUno03] by
the main algorithm, and reduce the computation time           using incremental dualization algorithms proposed
for dualization by as long as one dualization. This           by Kavvadias and Stavropoulos [Kavvadias99], and
also reduces the space complexity. Moreover, we ac-           Uno [Uno02]. We developed an algorithm by in-
celerate the computation by using sparseness.                 terleaving dualization with ﬁnding maximal frequent
                                                              sets. Roughly speaking, our algorithm solves one du-
1. Introduction                                               alization problem with the size |Bd+ |, in which Bd+
                                                              is the set of maximal frequent sets, while the algo-
Let E be an item set and T be a set of transactions           rithm of Gunopulos et al. solves |Bd+ | dualization
deﬁned on E. For an item set S ⊆ E, we denote the             problems with sizes from 1 through |Bd+ |. This re-
set of transactions including S by X(S). We deﬁne             duces the computation time by a factor of 1/|Bd+|.
the frequency of S by |X(S)|. For a given constant α,            To reduce the computation time more, we used
if an item set S satisﬁes |X(S)| ≥ α, then S is said to       Uno’s dualization algorithm [Uno02]. The experi-
be frequent. A frequent item set included in no other         mental computation time of Uno’s algorithm is lin-
frequent item set is said to be maximal. An item set          ear in the number of outputs, and O(|E|) per out-
not frequent is called infrequent. An infrequent item         put, while that of Kavvadias and Stavropoulos seems
set including no other infrequent item set is said to         to be O(|E|2 ). This reduces the computation time
be minimal.                                                   by a factor of 1/|E|. Moreover, we add an improve-
   This paper describes an implementation of an algo-         ment based on sparseness of input. By this, the
rithm for enumerating all maximal frequent sets us-           experimental computation time per output is re-
ing dualization in detail presented at [SatohUno03].          duced to O(ave(Bd+ )) where ave(Bd+ ) is the av-
The algorithm is an improved version of that of               erage size of maximal frequent sets. In summary,
Gunopulos et al. [Gunopulos97a, Gunopulos97b].                we reduced the computation time by a factor of
The algorithm computes maximal frequent sets based            ave(Bd+ ) / (|Bd+ | × |E|2 ) by using the combina-
on computing minimal transversals of a hyper-                 tion of the algorithm of Gunopulos et al. and the
graph, computing minimal hitting set, or, in                  algorithm of Kavvadias and Stavropoulos.
other words, computing a dualization of a mono-                  In the following sections, we describe our algorithm
tone function [Fredman96]. The algorithm ﬁnds all             and the computational result. Section 2 describes
minimal item sets not included in any current ob-             the algorithm of Gunopulos et al. and Section 3 de-
tained maximal frequent set by dualization. If a fre-         scribes our algorithm and Uno’s algorithm. Section 4
quent item set is in those minimal item sets, then the        explains our improvement using sparseness. Compu-
algorithm ﬁnds a new maximal frequent set includ-             tational experiments for FIMI’03 instances are shown
ing the frequent item set. In this way, the algorithm         in Section 5, and we conclude the paper in Section 6.

                                                          1
Dualize and Advance[Gunopulos97a]
1 Bd+ := {go up(∅)}
2 Compute M HS(Bd+ ).
3 If no set in M HS(Bd+ ) is frequent, output M HS(Bd+ ).
4 If there exists a frequent set S in M HS(Bd+ ), Bd+ := Bd+ ∪ {go up(S)} and go to 2.

                                     Figure 1: Dualize and Advance Algorithm


2. Enumerating maximal frequent sets                              Basically, the algorithm of Gunopulos et al. solves
     by dualization                                            dualization problems with sizes from 1 through
                                                               |Bd+ |. Although we can terminate dualization when
In this section, we describe the algorithm of Gunop-           we ﬁnd a new maximal frequent set, we may check
ulos et al. Explanations are also in [Gunopulos97a,            each minimal infrequent item set again and again.
Gunopulos97b, SatohUno03], however, those are                  This is one of the reasons that the algorithm of
written with general terms. In this section, we ex-            Gunopulos et al. is not fast in practice. In the next
plain in terms of frequent set mining.                         section, we propose a new algorithm obtained by in-
   Let Bd− be the set of minimal infrequent sets. For          terleaving gp up into a dualization algorithm. The
a subset family H of E, a hitting set HS of H is a set         algorithm basically solves one dualization problem of
such that for every S ∈ H, S ∩ HS = ∅. If a hitting            size |Bd+ |.
set includes no other hitting set, then it is said to be
minimal. We denote the set of all minimal hitting              3. Description of our algorithm
sets of H by M HS(H). We denote the complement
of a subset S w.r.t. E by S. For a subset family H,            The key lemma of our algorithm is the following.
we denote {S|S ∈ H} by H.
   There is a strong connection between the maximal            Lemma 1 [SatohUno03] Let Bd+        +
                                                                                           1 and Bd2 be sub-
                                                                         +       +     +
frequent sets and the minimal infrequent sets by the           sets of Bd . If Bd1 ⊆ Bd2 ,
minimal hitting set operation.
                                                                    M HS(Bd+       −         +
                                                                           1 ) ∩ Bd ⊆ M HS(Bd2 ) ∩ Bd
                                                                                                     −
Proposition 1 [Mannila96] Bd− = M HS(Bd+ )
                                                                  Suppose that we have already found minimal hit-
  Using the following proposition, Gunopulos et al.
                                                               ting sets corresponding to Bd+ of a subset Bd+ of
proposed an algorithm called Dualize and Advance               the maximal frequent sets. The above lemma means
shown in Fig. 1 to compute the maximal frequent
                                                               that if we add a maximal frequent set to Bd+ , any
sets [Gunopulos97a].
                                                               minimal hitting set we found which corresponds to a
Proposition 2 [Gunopulos97a] Let Bd+ ⊆ Bd+ .                   minimal infrequent set is still a minimal infrequent
Then, for every S ∈ M HS(Bd+ ), either S ∈ Bd−                 set. Therefore, if we can use an algorithm to visit
or S is frequent (but not both).                               each minimal hitting set based on an incremental ad-
                                                               dition of maximal frequent sets one by one, we no
   In the above algorithm, go up(S) for a subset S of          longer have to check the same minimal hitting set
E is a maximal frequent set which is computed as               again even if maximal frequent sets are newly found.
follows.                                                       The dualization algorithms proposed by Kavvadias
 1. Select one element e from S and check the fre-             and Stavropoulos [Kavvadias99] and Uno[Uno02] are
    quency of S ∪ {e}.                                         such kinds of algorithms. Using these algorithms, we
                                                               reduce the number of checks.
 2. If it is frequent, S := S ∪ {e} and go to 1.                  Let us show Uno’s algorithm [Uno02]. This is
 3. Otherwise, if there is no element e in S such that         an improved version of Kavvadias and Stavropou-
    S ∪ {e} is frequent, then return S.                        los’s algorithm [Kavvadias99]. Here we introduce
                                                               some notation. A set S ∈ H is called critical for
Proposition 3 [Gunopulos97a] The number of fre-                e ∈ hs, if S ∩ hs = {e}. We denote a family of
quency checks in the “Dualize and Advance” algo-               critical sets for e w.r.t. hs and H as crit(e, hs).
rithm to compute Bd+ is at most |Bd+ | · |Bd− | +              Note that mhs is a minimal hitting set of H if and
|Bd+ | · |E|2 .                                                only if for every e ∈ mhs, crit(e, mhs) is not empty.

                                                           2
global S0 , ..., Sm;
compute mhs(i, mhs) /* mhs is a minimal hitting set of S0 , ..., Si */
begin
1 if i == m then output mhs and return;
2 else if Si+1 ∩ mhs = ∅ then compute mhs(i + 1, mhs);
    else
    begin
3 for every e ∈ Si+1 do
4     if for every e ∈ mhs, there exists Sj ∈ crit(e , mhs), j ≤ i
         s.t. Sj does not contain e then
5        comupute mhs(i + 1, mhs ∪ {e});
    end
    return;
end

                              Figure 2: Algorithm to Enumerate Minimal Hitting Sets


Suppose that H = {S1 , ..., Sm}, and let M HSi be                following conditions.
M HS({S0 , ..., Si})(1 ≤ i ≤ n). We simply denote                (1) mhs = mhs
M HS(H) by M HS. A hitting set hs for {S1 , ..., Si}             (2) mhs = mhs ∪ {e}
is minimal if and only if crit(e, hs) ∩ {S1 , ..., Si} = ∅       In particular, no mhs has a child satisfying (1) and
for any e ∈ hs.                                                  a child satisfying (2).

Lemma 2 [Uno02] For any mhs ∈ M HSi (1 ≤ i ≤                        If mhs ∩ Si+1 = ∅ then mhs ∈ M HSi+1 , and (1)
n), there exists just one minimal hitting set mhs ∈             holds. If mhs ∩ Si+1 = ∅, then mhs ∈ M HSi+1 , and
M HSi−1 satisfying either of the following conditions            (2) can hold for some e ∈ Si+1 . If mhs = mhs ∪ {e}
(but not both),                                                  is a child of mhs, then for any e ∈ mhs, there is
                                                                 Sj ∈ crit(e , mhs), j ≤ i such that e ∈ Sj . From these
  • mhs = mhs.                                                  observations, we obtain the algorithm described in
                                                                 Fig. 2.
  • mhs = mhs \ {e} where crit(e, mhs) ∩
                                                                    An iteration of the algorithm in Fig. 2 takes:
    {S0 , ..., Si} = {Si }.
                                                                   • O(|mhs|) time for line 1.
   We call mhs the parent of mhs, and mhs a child of
mhs . Since the parent-child relationship is not cyclic,          • O(|Si+1 ∪ mhs|) time for line 2.
its graphic representation forms a forest in which each                                                          
                                                                   • O((|E| − |mhs|) ×             e ∈mhs |crit(e , mhs) ∩
of its connected components is a tree rooted at a min-               {S0 , ..., Si}|) time for lines 3 to 5, except for the
imal hitting set of M HS1 . We consider the trees as                 computation of crit.
traversal routes deﬁned for all minimal hitting sets
of all M HSi . These transversal routes can be traced               To compute crit quickly, we store crit(e, mhs) in
in a depth-ﬁrst manner by generating children of the             memory, and update them when we generate a recur-
current visiting minimal hitting set, hence we can               sive call. Note that this takes O(m) memory. Since
enumerate                                                        crit(e , mhs ∪ {e}) is obtained from crit(e , mhs) by
          all minimal hitting  sets of M HS in linear
time of i |M HSi |. Although i |M HSi | can be ex-               removing sets including e (i.e., crit(e , mhs ∪ {e}) =
ponential to |M HS|, such cases are expected       to be         {S|S ∈ crit(e , mhs), e ∈ Si+1 }), crit(e , mhs ∪ {e})
                                            
exceptional in practice. Experimentally, i |M HSi |              for all e can be computed in O(m) time. Hence
is linear in |M HS|.                                             the computation time of an iteration is bounded by
   To ﬁnd children of a minimal hitting set, we use the          O(|E| × m).
following proposition that immediately leads from                   Based on this dualization algorithm, we devel-
the above lemma.                                                 oped a maximal frequent sets enumeration algorithm.
                                                                 First, the algorithm sets the input H of the dual-
Proposition 4 [Uno02]                                            ization problem to the empty set. Then, the algo-
Any child mhs of mhs ∈ M HSi satisfies one of the               rithm solves the dualization in the same way as the

                                                             3
Irredundant Border Enumerator
global integer bdpnum; sets bd+       +
                                0 , bd1 ....;
main()
begin
   bdpnum := 0;
   construct bdp(0, ∅);
   output all the bd+
                    j (0 ≤ j ≤ bdpnum);
end
construct bdp(i, mhs)
begin
  if i == bdpnum /* minimal hitting set for ∪bdpnum
                                             j:=0   bd+
                                                      j is found */
     then goto 1 else goto 2
  1.   if mhs is not frequent, return; /* new Bd− element is found */
       bd+                               +
         bdpnum := go up2(mhs); /* new Bd element is found */
       bdpnum := bdpnum + 1; /* proceed to 2 */

  2.   if bd+
            i ∩ mhs = ∅ then construct bdp(i + 1, mhs);
       else
       begin
       for every e ∈ bd+
                       i do

          if bd+                                      +     +        +
               i ∪ {e} is a minimal hitting set of {bd0 , bd1 ..., bdi−1}) then construct bdp(i + 1, mhs ∪ {e});
       return;
end

                       Figure 3: Algorithm to Check Each Minimal Hitting Set Only Once


above algorithm. When a minimal hitting set mhs              Theorem 1 The computation time of IBE is
is found, the algorithm checks its frequency. If mhs         O(Dual(Bd+ ) + |Bd+ |g), where Dual(Bd   ¯ + ) is the
is frequent, the algorithm ﬁnds a maximal frequent           computation time of Uno’s algorithm for dualizing
set S including it, and adds S to H as a new element         Bd+ , and g is the computation time for go up.
of H. Now mhs is not a minimal hitting set since
S ∩ mhs = ∅. The algorithm continues generating a               Note also that, the space complexity of the IBE
recursive call to ﬁnd a minimal hitting set of the up-       algorithm is O(ΣS∈Bd+ |S|) since all we need to mem-
dated H. In the case that mhs is not frequent, from          orize is Bd+ and once a set in Bd− is checked, it is
Lemma 1, mhs continues to be a minimal hitting set           no longer need to be recorded. On the other hand,
even when H is updated. Hence, we backtrack and              Gunopulos et al. [Gunopulos97a] suggests a usage
ﬁnd other minimal hitting sets.                              of Fredman and Khachiyan’s algorithm [Fredman96]
   When the algorithm terminates, H is the set of            which needs a space of O(ΣS∈(Bd+ ∪Bd− ) |S|) since the
maximal frequent sets, and the set of all minimal            algorithm needs both Bd+ and Bd− at the last stage.
hitting sets the algorithm found is the set of minimal
infrequent sets. The recursive tree the algorithm gen-       4. Using sparseness
erated is a subtree of the recursive tree obtained by
Uno’s dualization algorithm inputting Bd+ , which is         In this section, we speed up the dualization phase
the set of the complement of maximal frequent sets.          of our algorithm by using a sparseness of H. In real
   This algorithm is described in Fig. 3. We call the        data, the sizes of maximal frequent sets are usually
algorithm Irredundant Border Enumerator (IBE al-             small. They are often bounded by a constant. We use
gorithm, for short).                                         this sparse structure for accelerating the algorithm.

                                                         4
global S0 , ..., Sm;
compute mhs(i, mhs) /* mhs is a minimal hitting set of S0 , ..., Si */
begin
1 if uncov(mhs) == ∅ then output mhs and return;
2 i := minimum index of uncov(mhs) ;
3 for every e ∈ mhs do
4     increase the counter of items in ∪S∈crit(mhs,e) S by one
    end
5 for every e ∈ mhs s.t. counter is increased by |mhs| do /* items included in all ∪S∈crit(mhs,e) S */
6     compute mhs(i + 1, mhs ∪ {e});
    return;
end

                          Figure 4: Improved Dualization Algorithm Using Sparseness


   First, we consider a way to reduce the computa-            sive call, we allocate memory for each variable used
tion time of iterations. Let us see the algorithm             in the recursive call. Hence, the memory required
described in Fig. 2. The bottle neck part of                  by the algorithm can be up to O(|E| × m). However,
the computation of an iteration of the algorithm is           experimentally the required memory is always linear
lines 3 to 5, which check the existence of a criti-           in the input size. Note that we can reduce the worst
cal set Sj ∈ crit(mhs, e ), j < i such that e ∈ Sj .         case memory complexity by some sophisticated algo-
To check   this condition for an item e ∈ mhs, we            rithms.
spend O( e ∈mhs |crit(mhs, e )|) time, hence this
check
       for all e ∈ mhs takes O((|E| − |mhs|) ×
                       
   e ∈mhs |crit(mhs, e )|) time.
                                                              5. Experiments
                                   
   Instead of this, we compute S∈crit(mhs,e) S for            In this section, we show some results of the com-
                                      
each e ∈ mhs. If and only if e ∈ S∈crit(mhs,e) S             putational experiments of our algorithms. We im-
for all e ∈ mhs, e satisﬁes
                              the condition of “if” at       plement our algorithm using C programming lan-
line 4. To compute S∈crit(mhs,e) S for all e ∈ mhs,           guage, and examined instances of FIMI2003. For
                     
we take O( e∈mhs S∈crit(mhs,e) |S|) time. In the              instances of KDD-cup 2000[KDDcup00], we com-
case of IBE algorithm, S is a maximal frequent set,           pared the results to the computational experiments of
                                                              CHARM[Zaki02], closed[Pei00], FP-growth[Han00],
hence the average size of |S| is expected to be small.
                                                              and Apriori[Agrawal96] shown in [Zheng01]. The ex-
The sizes of minimal infrequent sets are not greater
                                                              periments in [Zheng01] were done on a PC with a
than the maximum size of maximal frequent sets, and
they are usually smaller than the average size of the         Duron 550MHz CPU and 1GB RAM memory. Our
maximal frequent sets. Hence, |mhs| is also expected          experiments were done on a PC with a Pentium III
                                                              500MHz CPU and 256MB RAM memory, which is
to be small.
                                                              little slower than a Duron 550MHz CPU. The re-
   Second, we reduce the number of iterations. For            sults are shown in Figs. 4 – 14. Note that our algo-
mhs ⊆ E, we deﬁne uncov(mhs) by the set of S ∈ H              rithm uses at most 170MB for any following instance.
satisfying S ∩ mhs = ∅. If mhs ∩ Si = ∅, the iteration        We also show the number of frequent sets, frequent
inputting mhs and i does nothing but generates a              closed/maximal item sets, and minimal frequent sets.
recursive call with increasing i by one. This type of            In our experiments, IBE algorithm takes approx-
iterations should be skipped. Only iterations execut-         imately O(|Bd− | × ave(Bd+ )) time, while the com-
ing lines 3 to 5 are crucial. Hence, in each iteration,       putation time of other algorithms deeply depends on
we set i to the minimum index among uncov(mhs).               the number of frequent sets, the number of frequent
As a result of this, we need not execute  line 2, and        closed item sets, and the minimum support. We re-
the number
            of iterations is reduced from   i |M HSi |       call that ave(Bd+ ) is the average size of maximal
to | i M HSi |. We describe the improved algorithm            frequent sets. In some instances, our IBE algorithm
in Fig. 4.                                                    performs rather well compared to other algorithms.
  In our implementation, when we generate a recur-            In these cases, the number of maximal frequent item

                                                          5
sets is smaller than number of frequent item sets.          [KDDcup00] Kohavi, R., Brodley, C. E., Frasca, B.,
IBE algorithm seems to give a good performance for             Mason, L., and Zheng, Z., “KDD-Cup 2000 Or-
diﬃcult problems such that the number of maximal               ganizers’ Report: Peeling the Onion,” SIGKDD
frequent sets is very small rather than those of fre-          Explorations, 2(2), pp. 86-98, 2000.
quent item sets and frequent closed item sets.
                                                            [Mannila96] Mannila, H. and Toivonen, T., “On
                                                                an Algorithm for Finding All Interesting Sen-
6. Conclusion                                                   tences”, Cybernetics and Systems, Vol II, The
                                                                Thirteen European Meeting on Cybernetics and
In this paper, we describe the detailed im-                     Systems Research, pp. 973 – 978 (1996).
plementation method of our algorithm proposed
                                                            [Pei00] Pei, J., Han, J., Mao, R., “CLOSET: An Ef-
in [SatohUno03] and we give some experimental re-
                                                                 ﬁcient Algorithm for Mining Frequent Closed
sults on test data.
                                                                 Itemsets,” ACM SIGMOD Workshop on Re-
                                                                 search Issues in Data Mining and Knowledge
Acknowledgments                                                  Discovery 2000, pp. 21-30, 2000.
We are grateful to Heikki Mannila for participating
useful discussions about this research.                     [SatohUno03] Satoh, K., Uno, T., “Enumerating
                                                                 Maximal Frequent Sets using Irredundant Dual-
                                                                 ization”, Lecture Notes in Artificial Intelligence
References                                                       (Proc. of Discovery Science 2003), Springer-
                                                                 Varlag, pp. 192-201, 2003.
[Agrawal96] Agrawal, R., Mannila, H., Srikant, R.,
                                                            [Uno02] Uno, T., “A Practical Fast Algorithm
    Toivonen, H., and Verkamo, A. I., “Fast Dis-
                                                                for Enumerating Minimal Set Coverings”, SI-
    covery of Association Rules”, U. M. Fayyad,
                                                                GAL83, Information Processing Society of
    G. Piatetsky-Shapiro, P. Smyth, and R. Uthu-
                                                                Japan, pp. 9 – 16 (in Japanese) (2002).
    rusamy, (eds), Advances in Knowledge Discov-
    ery and Data Mining, chapter 12, pp. 307–328            [Zaki02] Zaki, M. J., Hsiao, C., “CHARM: An Eﬃ-
    (1996).                                                     cient Algorithm for Closed Itemset Mining,” 2nd
                                                                SIAM International Conference on Data Mining
[Fredman96] Fredman, M. L. and Khachiyan, L.,                   (SDM’02), pp. 457-473, 2002.
     “On the Complexity of Dualization of Mono-
     tone Disjunctive Normal Forms”, Journal of Al-         [Zheng01] Zheng, Z., Kohavi, R., and Mason, L.,
     gorithms 21(3), pp. 618 – 628 (1996)                       “Real World Performance of Association Rule
                                                                Algorithms,” KDD 2001, pp. 401-406, 2001.
[Gunopulos97a] Gunopulos, D., Khardon, R., Man-
    nila, H. and Toivonen, H., “Data mining, Hy-
    pergraph Transversals, and Machine Learning”,
    Proc. of PODS’97, pp. 209 – 216 (1997).

[Gunopulos97b] Gunopulos, D., Mannila, H., and
    Saluja, S., “Discovering All Most Speciﬁc Sen-
    tences using Randomized Algorithms”, Proc. of
    ICDT’97, pp. 215 – 229 (1997).

[Han00] Han, J., Pei, J., Yin, Y., “Mining Frequent
    Patterns without Candidate Generation,” SIG-
    MOD Conference 2000, pp. 1-12, 2000

[Kavvadias99] Kavvadias, D. J., and Stavropoulos,
    E. C., “Evaluation of an Algorithm for the
    Transversal Hypergraph Problem”, Algorithm
    Engineering, pp 72 – 84 (1999).

                                                        6
time(sec)                                                                                        time(sec)
10000
                                  BMS-WebView1                                                                                T10I4D100K
                                                                                               10000




 1000                                                                                          1000


                                                                                                                                                                      Apriori
                                                                               Apriori
  100                                                                                           100                                                                   FP-growth
                                                                               FP-growth
                                                                                                                                                                      closet
                                                                               closet
                                                                                                                                                                      CHARM
                                                                               CHARM
                                                                                                 10                                                                   IBE
   10                                                                          IBE


                                                                                                  1
    1




                                                                                                          0.1



                                                                                                                    0.08



                                                                                                                              0.06



                                                                                                                                       0.04



                                                                                                                                              0.02



                                                                                                                                                           0.01
                                                                                                                                                                     support
                                                                             support
         60



                       48



                                  36



                                             24



                                                         12



                                                                       6




  time(sec)
                                  BMS-WebView2                                                 time(sec                       T40I10D100K
100000
                                                                                               10000)


 10000



  1000                                                                       Apriori                                                                                  IBE
                                                                             FP-growth          1000
                                                                             closet
   100
                                                                             CHARM
                                                                             IBE
    10



     1                                                                                           100
                                                                            support                                                                               support
            77



                       62



                                  46



                                             31



                                                        15



                                                                   7




                                                                                                             1600          1300       1000           700




time(sec)                                                                                       time(sec)
                                  BMS-POS                                                                                            pumsb
100000                                                                                             10000



 10000

                                                                               Apiori                  1000
                                                                               FP-growth
  1000
                                                                               CHARM
                                                                               IBE
                                                                                                       100
   100
                                                                                                                                                                      IBE

    10
                                                                                                        10
                 517


                            413


                                       310


                                                  206


                                                             103


                                                                       51




                                                                            support                             45000 44000 43000 42000 41000 40000                    support




                                                                                           7
time(sec)                                                                                time(sec
10000                     pumsb_star                                                     10000)
                                                                                                                             connect


1000


                                                                                                                                                          IBE
                                                                                         1000
 100



                                                                            IBE
  10


                                                                                          100




                                                                                                0


                                                                                                         0


                                                                                                                 0


                                                                                                                         0


                                                                                                                                 0


                                                                                                                                         0


                                                                                                                                                 0
   1




                                                                                              00


                                                                                                       00


                                                                                                               00


                                                                                                                       00


                                                                                                                               00


                                                                                                                                       00


                                                                                                                                               00
                                                                           support                                                                    support




                                                                                            63


                                                                                                     60


                                                                                                             57


                                                                                                                     54


                                                                                                                             51


                                                                                                                                     48


                                                                                                                                             45
        30000   25000     20000          15000       10000




                                                                                         time(sec                             chess
                                                                                         10000)




time(sec)
10000                        kosarak                                                     1000
                                                                                                                                                          IBE


                                                                                          100


1000


                                                                                           10
                                                                                                    2200      1900      1600         1300      1000   support
                                                                            IBE


 100
                                                                          support
         3000      2500           2000           1500            1000




time(sec                          mushroom
1000 )




                                                                            IBE
 100




  10
          30       20         10                 5           2          support




                                                                                     8
 BMS-Web-View1: #item 497, #transactions, 59602, ave. size of transaction 2.51
          support     60    48     36     24     12       6
           Apriori   1.1   3.6    113      -      -       -
       FP-growth     1.2   1.8     51      -      -       -
            Closet    33    74      -      -      -       -
           Charm     2.2   2.7    7.9    133    422       -
              IBE    5.8   9.6     45     42    333    2982
      #freq. sets 3992 10287 461522        -      -       -
     #closed sets 3974 9391 64762 155651 422692 1240701
 #max. freq. sets 2067 4028 15179 12956 84833 129754
#min. infreq. sets 66629 81393 150278 212073 579508 4320003
maximum use of memory: 45MB
BMS-Web-View2: #items 3340, #transactions 77512, ave. size of transaction 4.62
          support      77     62      46      31      15       7
           Apriori   13.1     15    29.6    58.2     444       -
       Fp-growth     7.03     10    17.2    29.6     131     763
            Closet 1500 2250        3890    6840 25800         -
           Charm     5.82   6.66    7.63    13.8    27.2      76
              IBE      25     32      46      98     355    1426
     #closed sets 22976 37099 60352 116540 343818 754924
      #freq. sets 24143 42762 84334 180386 1599210 9897303
 #max. freq. sets 3901 5230         7841 16298 43837 118022
#min. infreq. sets 657461 958953 1440057 2222510 3674692 5506524
maximal use of memory: 100MB
 BMS-POS: #items 1657, #transactions 517255, ave. size of transaction 6.5
          support     517    413    310     206     103       51
           Apriori    251    341    541    1000    2371   10000
       Fp-growth      196    293    398     671    1778     6494
            Closet      -      -      -       -       -        -
           Charm      100    117    158     215     541     3162
              IBE 1714 2564 4409           9951 44328          -
     #closed sets 121879 200030 378217 840544 1742055 21885050
      #freq. sets 121956 200595 382663 984531 5301939 33399782
 #max. freq. sets 30564 48015 86175 201306 891763 4280416
#min. infreq. sets 236274 337309 530946 1047496 3518003        -
 maximum use of memory: 110MB
T10I4D100K: #items 1000, #transactions 100000, ave. size of transaction 10
          support     100     80     60      40      20    10
           Apriori     33     39     45      62     117   256
       Fp-growth      7.3    7.7    8.1     9.0      12    20
            Closet     13     16     18      23      41   130
           Charm       11     13     16      24      45    85
              IBE      96    147    263     567    1705      -
      #freq. sets 15010 28059 46646 84669 187679 335183
     #closed sets 13774 22944 38437 67537 131342 229029
 #max. freq. sets 7853 11311 16848 25937 50232 114114
#min. infreq. sets 392889 490203 736589 1462121 4776165      -
maximum use of memory: 60MB
T40I10D100K: #items 1000, #transactions 100000, ave. size of transaction 39.6
          support 1600 1300 1000            700
              IBE     378    552 1122      2238
      #freq. sets 4591 10110 65236 550126
     #closed sets 4591 10110 65236 548349
 #max. freq. sets 4003 6944 21692 41473
#min. infreq. sets 245719 326716 521417 1079237


                                                        9
maximum memory use: 74MB
pumsb: #items 7117, #transactions 49046, ave. size of transaction 74
          support 45000 44000 43000 42000
              IBE 301 582 1069 1840
      #freq. sets 1163 2993 7044 15757
     #closed sets 685 1655 3582 7013
 #max. freq. sets 144 288 541 932
#min. infreq. sets 7482 7737 8402 9468
  maximum use of memory: 70MB
pumsb star: #items 7117, #transactions 49046, ave. size of transaction 50
          support 30000 25000 20000 15000 10000 5000
              IBE      8    19     59     161     556 2947
      #freq. sets 165 627 21334 356945 >2G                -
     #closed sets     66 221 2314 14274 111849            -
 #max. freq. sets      4    17     81     315 1666 15683
#min. infreq. sets 7143 7355 8020 9635 19087 98938
  maximum use of memory: 44MB
kosarak: #items 41217, #transactions 990002, ave. size of transaction 8
          support 3000 2500 2000 1500 1000
              IBE 226       294     528     759 2101
      #freq. sets 4894 8561 34483 219725 711424
     #closed sets 4865 8503 31604 157393 496675
 #max. freq. sets 792 1146 2858 4204 16231
#min. infreq. sets 87974 120591 200195 406287 875391
  maximum use of memory: 170MB
mushroom: #items 120, #transactions 8124, ave. size of transaction 23
          support         30          20          10      5       2
              IBE        132        231          365    475     433
      #freq. sets 505205917 781458545 1662769667 >2G >2G
     #closed sets      91122     109304       145482 181243 230585
 #max. freq. sets      15232      21396        30809 34131 27299
#min. infreq. sets     66085      79481        81746 69945 31880
  maximum use of memory 47MB
connect: #items 130, #transactions 67577, ave. size of transaction 43
           support 63000 60000 57000 54000 51000 48000 45000
               IBE 229 391          640     893     1154 1381 1643
        #freq. sets 6327 41143 171239 541911 1436863          -     -
      #closed sets 1566 4372 9041 15210 23329                 -     -
  #max. freq. sets 152 269          464     671      913 1166 1466
 #min. infreq. sets 297 486         703     980     1291 1622 1969
  maximum use of memory: 60MB
chess: #items 76, #transactions 3196, ave. size of transaction 37
           support 2200 1900          1600     1300     1000
               IBE     19      61      176      555     2191
        #freq. sets 59181 278734 1261227 5764922 29442848
      #closed sets 28358 106125 366529 1247700 4445373
  #max. freq. sets 1047 3673 11209 35417 114382
 #min. infreq. sets 1725 5202 14969 46727 152317
  maximum use of memory: 50MB




                                                        10