=Paper=
{{Paper
|id=Vol-90/paper-4
|storemode=property
|title=Detailed Description of an Algorithm for Enumeration of Maximal Frequent Sets with Irredundant Dualization
|pdfUrl=https://ceur-ws.org/Vol-90/satoh.pdf
|volume=Vol-90
|dblpUrl=https://dblp.org/rec/conf/fimi/UnoS03
}}
==Detailed Description of an Algorithm for Enumeration of Maximal Frequent Sets with Irredundant Dualization==
Detailed Description of an Algorithm for Enumeration of Maximal
Frequent Sets with Irredundant Dualization
Takeaki Uno, Ken Satoh
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Email: uno, ksatoh@nii.ac.jp
Abstract

We describe an implementation of an algorithm for enumerating all maximal frequent sets using irredundant dualization, which is an improved version of that of Gunopulos et al. The algorithm of Gunopulos et al. solves many dualization problems and takes a long computation time. We interleave dualization with the main algorithm and reduce the computation time for dualization to roughly that of a single dualization. This also reduces the space complexity. Moreover, we accelerate the computation by using sparseness.

1. Introduction

Let E be an item set and T be a set of transactions defined on E. For an item set S ⊆ E, we denote the set of transactions including S by X(S). We define the frequency of S by |X(S)|. For a given constant α, if an item set S satisfies |X(S)| ≥ α, then S is said to be frequent. A frequent item set included in no other frequent item set is said to be maximal. An item set that is not frequent is called infrequent. An infrequent item set including no other infrequent item set is said to be minimal.

This paper describes in detail an implementation of the algorithm presented in [SatohUno03] for enumerating all maximal frequent sets using dualization. The algorithm is an improved version of that of Gunopulos et al. [Gunopulos97a, Gunopulos97b]. That algorithm computes maximal frequent sets based on computing minimal transversals of a hypergraph, i.e., computing minimal hitting sets, or, in other words, computing a dualization of a monotone function [Fredman96]. The algorithm finds, by dualization, all minimal item sets not included in any currently obtained maximal frequent set. If a frequent item set is among those minimal item sets, then the algorithm finds a new maximal frequent set including that frequent item set. In this way, the algorithm avoids checking all frequent item sets. However, this algorithm solves dualization problems many times, hence it is not fast for practical purposes. Moreover, it uses the dualization algorithm of Fredman and Khachiyan [Fredman96], which is said to be slow in practice.

We improved the algorithm in [SatohUno03] by using the incremental dualization algorithms proposed by Kavvadias and Stavropoulos [Kavvadias99] and Uno [Uno02]. We developed an algorithm by interleaving dualization with finding maximal frequent sets. Roughly speaking, our algorithm solves one dualization problem of size |Bd+|, where Bd+ is the set of maximal frequent sets, while the algorithm of Gunopulos et al. solves |Bd+| dualization problems with sizes from 1 through |Bd+|. This reduces the computation time by a factor of 1/|Bd+|. To reduce the computation time further, we used Uno's dualization algorithm [Uno02]. The experimental computation time of Uno's algorithm is linear in the number of outputs, and O(|E|) per output, while that of Kavvadias and Stavropoulos seems to be O(|E|²). This reduces the computation time by a factor of 1/|E|. Moreover, we add an improvement based on the sparseness of the input. By this, the experimental computation time per output is reduced to O(ave(Bd+)), where ave(Bd+) is the average size of the maximal frequent sets. In summary, we reduced the computation time by a factor of ave(Bd+) / (|Bd+| × |E|²) compared to the combination of the algorithm of Gunopulos et al. and the algorithm of Kavvadias and Stavropoulos.

In the following sections, we describe our algorithm and the computational results. Section 2 describes the algorithm of Gunopulos et al., and Section 3 describes our algorithm and Uno's algorithm. Section 4 explains our improvement using sparseness. Computational experiments on FIMI'03 instances are shown in Section 5, and we conclude the paper in Section 6.
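To make the definitions of Section 1 concrete, here is a small Python sketch; the toy transactions and the threshold α = 2 are illustrative only and are not taken from the paper.

from itertools import combinations

E = {'a', 'b', 'c', 'd'}                                        # items
T = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c', 'd'}]  # transactions
alpha = 2                                                       # frequency threshold

def X(S):
    """Transactions including S."""
    return [t for t in T if set(S) <= t]

def frequent(S):
    return len(X(S)) >= alpha

all_sets = [frozenset(c) for r in range(len(E) + 1)
            for c in combinations(sorted(E), r)]
freq = [S for S in all_sets if frequent(S)]
infreq = [S for S in all_sets if not frequent(S)]

# maximal frequent: frequent sets included in no other frequent set
maximal = [S for S in freq if not any(S < G for G in freq)]
# minimal infrequent: infrequent sets including no other infrequent set
minimal_infreq = [S for S in infreq if not any(G < S for G in infreq)]

print(sorted(map(sorted, maximal)))         # [['a', 'b'], ['a', 'c'], ['b', 'c']]
print(sorted(map(sorted, minimal_infreq)))  # [['a', 'b', 'c'], ['d']]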
Dualize and Advance [Gunopulos97a]
1. Bd+ := {go_up(∅)}.
2. Compute MHS({F̄ | F ∈ Bd+}).
3. If no set in MHS({F̄ | F ∈ Bd+}) is frequent, output MHS({F̄ | F ∈ Bd+}).
4. If there exists a frequent set S in MHS({F̄ | F ∈ Bd+}), set Bd+ := Bd+ ∪ {go_up(S)} and go to 2.

Figure 1: Dualize and Advance Algorithm
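The following brute-force Python sketch mirrors this loop: the minimal hitting sets are computed naively over the complements of the current Bd+, and go_up is the greedy extension procedure described in Section 2. The toy data and the naive MHS computation are illustrative only and are not the paper's method.

from itertools import combinations

E = {'a', 'b', 'c', 'd'}
T = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c', 'd'}]
alpha = 2

def frequent(S):
    return sum(1 for t in T if set(S) <= t) >= alpha

def go_up(S):
    """Greedily extend a frequent set S to a maximal frequent set (Section 2)."""
    S = set(S)
    extended = True
    while extended:
        extended = False
        for e in E - S:
            if frequent(S | {e}):
                S.add(e)
                extended = True
    return frozenset(S)

def mhs(family):
    """All minimal hitting sets of a family of subsets of E (brute force)."""
    hitting = [frozenset(c) for r in range(len(E) + 1)
               for c in combinations(sorted(E), r)
               if all(set(c) & S for S in family)]
    return [h for h in hitting if not any(g < h for g in hitting)]

def dualize_and_advance():
    bd_pos = {go_up(set())}                        # step 1
    while True:
        candidates = mhs([E - S for S in bd_pos])  # step 2: MHS of the complements
        new = [S for S in candidates if frequent(S)]
        if not new:                                # step 3: candidates equal Bd-
            return bd_pos, candidates
        bd_pos.add(go_up(new[0]))                  # step 4

maximal, minimal_infrequent = dualize_and_advance()
print(sorted(map(sorted, maximal)), sorted(map(sorted, minimal_infrequent)))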
2. Enumerating maximal frequent sets by dualization

In this section, we describe the algorithm of Gunopulos et al. Explanations are also given in [Gunopulos97a, Gunopulos97b, SatohUno03]; however, those are written in general terms. In this section, we explain it in terms of frequent set mining.

Let Bd− be the set of minimal infrequent sets. For a subset family H of E, a hitting set HS of H is a set such that S ∩ HS ≠ ∅ for every S ∈ H. If a hitting set includes no other hitting set, then it is said to be minimal. We denote the set of all minimal hitting sets of H by MHS(H). We denote the complement of a subset S w.r.t. E by S̄. For a subset family H, we denote {S̄ | S ∈ H} by H̄.

There is a strong connection between the maximal frequent sets and the minimal infrequent sets via the minimal hitting set operation.

Proposition 1 [Mannila96] Bd− = MHS({S̄ | S ∈ Bd+}).

Using the following proposition, Gunopulos et al. proposed an algorithm called Dualize and Advance, shown in Fig. 1, to compute the maximal frequent sets [Gunopulos97a].

Proposition 2 [Gunopulos97a] Let Bd+1 be a subset of Bd+. Then, for every S ∈ MHS({F̄ | F ∈ Bd+1}), either S ∈ Bd− or S is frequent (but not both).

In the above algorithm, go_up(S) for a subset S of E is a maximal frequent set which is computed as follows.

1. Select one element e ∉ S and check the frequency of S ∪ {e}.
2. If it is frequent, set S := S ∪ {e} and go to 1.
3. Otherwise, if there is no element e ∉ S such that S ∪ {e} is frequent, then return S.

Proposition 3 [Gunopulos97a] The number of frequency checks in the Dualize and Advance algorithm to compute Bd+ is at most |Bd+| · |Bd−| + |Bd+| · |E|².

Basically, the algorithm of Gunopulos et al. solves dualization problems with sizes from 1 through |Bd+|. Although we can terminate a dualization when we find a new maximal frequent set, we may check each minimal infrequent item set again and again. This is one of the reasons that the algorithm of Gunopulos et al. is not fast in practice. In the next section, we propose a new algorithm obtained by interleaving go_up into a dualization algorithm. The algorithm basically solves one dualization problem of size |Bd+|.

3. Description of our algorithm

The key lemma of our algorithm is the following.

Lemma 1 [SatohUno03] Let Bd+1 and Bd+2 be subsets of Bd+. If Bd+1 ⊆ Bd+2, then

MHS({F̄ | F ∈ Bd+1}) ∩ Bd− ⊆ MHS({F̄ | F ∈ Bd+2}) ∩ Bd−.

Suppose that we have already found the minimal hitting sets corresponding to a subset Bd+1 of the maximal frequent sets, that is, the minimal hitting sets of {F̄ | F ∈ Bd+1}. The above lemma means that if we add a maximal frequent set to Bd+1, any minimal hitting set we have found which corresponds to a minimal infrequent set is still a minimal hitting set of the enlarged family. Therefore, if we can use an algorithm that visits each minimal hitting set based on an incremental addition of maximal frequent sets one by one, we no longer have to check the same minimal hitting set again even if maximal frequent sets are newly found. The dualization algorithms proposed by Kavvadias and Stavropoulos [Kavvadias99] and Uno [Uno02] are such kinds of algorithms. Using these algorithms, we reduce the number of checks.

Let us show Uno's algorithm [Uno02]. This is an improved version of Kavvadias and Stavropoulos's algorithm [Kavvadias99]. Here we introduce some notation. A set S ∈ H is called critical for e ∈ hs if S ∩ hs = {e}. We denote the family of critical sets for e w.r.t. hs and H by crit(e, hs). Note that mhs is a minimal hitting set of H if and only if crit(e, mhs) is not empty for every e ∈ mhs.
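As a small illustration of this notation, here is a Python sketch; the family H and the two example hitting sets are made up for illustration.

def crit(e, hs, family):
    """Sets of `family` that the hitting set hs hits only through e."""
    return [S for S in family if S & hs == {e}]

def is_minimal_hitting_set(hs, family):
    # hs must hit every set, and every e in hs must have a critical set
    return all(S & hs for S in family) and all(crit(e, hs, family) for e in hs)

H = [{1, 2}, {2, 3}, {3, 4}]
print(is_minimal_hitting_set({2, 3}, H))     # True: {1,2} is critical for 2, {3,4} for 3
print(is_minimal_hitting_set({1, 2, 3}, H))  # False: a hitting set, but no set is critical for 1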
global S0, ..., Sm;

compute_mhs(i, mhs)  /* mhs is a minimal hitting set of S0, ..., Si */
begin
1   if i == m then output mhs and return;
2   else if Si+1 ∩ mhs ≠ ∅ then compute_mhs(i + 1, mhs);
    else begin
3       for every e ∈ Si+1 do
4           if for every e′ ∈ mhs there exists Sj ∈ crit(e′, mhs), j ≤ i,
            s.t. Sj does not contain e then
5               compute_mhs(i + 1, mhs ∪ {e});
    end
    return;
end

Figure 2: Algorithm to Enumerate Minimal Hitting Sets
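A direct Python transcription of Fig. 2 (crit is recomputed from scratch here for clarity; the incremental update that makes the real implementation fast is described below). The family S and the driver that seeds the recursion with the singletons of S0 are illustrative.

def crit(e, hs, sets):
    return [S for S in sets if S & hs == {e}]

def compute_mhs(i, mhs, S):
    """mhs is a minimal hitting set of S[0..i]; enumerate MHS(S) below it."""
    m = len(S) - 1
    if i == m:                                        # line 1: output
        print(sorted(mhs))
        return
    if S[i + 1] & mhs:                                # line 2: S[i+1] already hit
        compute_mhs(i + 1, mhs, S)
        return
    for e in S[i + 1]:                                # line 3
        # line 4: every e' in mhs keeps a critical set that avoids e
        if all(any(e not in Sj for Sj in crit(e2, mhs, S[:i + 1])) for e2 in mhs):
            compute_mhs(i + 1, mhs | {e}, S)          # line 5

S = [{1, 2}, {2, 3}, {3, 4}]
for e in S[0]:                    # the roots are the minimal hitting sets of S[0]
    compute_mhs(0, frozenset({e}), S)
# prints the three minimal hitting sets: [1, 3], [2, 3], [2, 4]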
Suppose that H = {S0, ..., Sm}, and let MHSi be MHS({S0, ..., Si}) (0 ≤ i ≤ m). We simply denote MHS(H) by MHS. A hitting set hs of {S0, ..., Si} is minimal if and only if crit(e, hs) ∩ {S0, ..., Si} ≠ ∅ for every e ∈ hs.

Lemma 2 [Uno02] For any mhs ∈ MHSi (1 ≤ i ≤ m), there exists exactly one minimal hitting set mhs′ ∈ MHSi−1 satisfying either of the following conditions (but not both):

• mhs′ = mhs.
• mhs′ = mhs \ {e} where crit(e, mhs) ∩ {S0, ..., Si} = {Si}.

We call mhs′ the parent of mhs, and mhs a child of mhs′. Since the parent-child relationship is not cyclic, its graph representation forms a forest in which each connected component is a tree rooted at a minimal hitting set of MHS0. We consider the trees as traversal routes defined over the minimal hitting sets of all MHSi. These traversal routes can be traced in a depth-first manner by generating the children of the currently visited minimal hitting set, hence we can enumerate all minimal hitting sets of MHS in time linear in Σi |MHSi|. Although Σi |MHSi| can be exponential in |MHS|, such cases are expected to be exceptional in practice. Experimentally, Σi |MHSi| is linear in |MHS|.

To find the children of a minimal hitting set, we use the following proposition, which follows immediately from the above lemma.

Proposition 4 [Uno02] Any child mhs′ of mhs ∈ MHSi satisfies one of the following conditions:
(1) mhs′ = mhs
(2) mhs′ = mhs ∪ {e}
In particular, no mhs has both a child satisfying (1) and a child satisfying (2).

If mhs ∩ Si+1 ≠ ∅, then mhs ∈ MHSi+1 and (1) holds. If mhs ∩ Si+1 = ∅, then mhs ∉ MHSi+1 and (2) can hold for some e ∈ Si+1. If mhs′ = mhs ∪ {e} is a child of mhs, then for any e′ ∈ mhs there is Sj ∈ crit(e′, mhs), j ≤ i, such that e ∉ Sj. From these observations, we obtain the algorithm described in Fig. 2.

An iteration of the algorithm in Fig. 2 takes:

• O(|mhs|) time for line 1,
• O(|Si+1 ∪ mhs|) time for line 2,
• O((|E| − |mhs|) × Σe′∈mhs |crit(e′, mhs) ∩ {S0, ..., Si}|) time for lines 3 to 5, except for the computation of crit.

To compute crit quickly, we store crit(e, mhs) in memory and update it when we generate a recursive call. Note that this takes O(m) memory. Since crit(e′, mhs ∪ {e}) is obtained from crit(e′, mhs) by removing the sets including e (i.e., crit(e′, mhs ∪ {e}) = {S ∈ crit(e′, mhs) | e ∉ S}), crit(e′, mhs ∪ {e}) for all e′ can be computed in O(m) time. Hence the computation time of an iteration is bounded by O(|E| × m).

Based on this dualization algorithm, we developed an algorithm for enumerating the maximal frequent sets. First, the algorithm sets the input H of the dualization problem to the empty set. Then, the algorithm solves the dualization in the same way as the above algorithm.
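Before turning to the enumeration algorithm in Fig. 3, here is a minimal sketch of the crit update described above, assuming crit is kept as a map from each element of mhs to its list of critical sets; the function name and representation are illustrative, not the paper's C implementation.

def updated_crit(crit_map, e, S_new):
    """Update crit when e is added to mhs and S_new is the newly covered set.

    A set stays critical for e' exactly when it does not contain e; the newly
    covered set S_new is hit only through e, so it becomes critical for e."""
    new_map = {e2: [S for S in sets if e not in S] for e2, sets in crit_map.items()}
    new_map[e] = [S_new]
    return new_map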
Irredundant Border Enumerator

global integer bdpnum; sets bd+0, bd+1, ...;

main()
begin
    bdpnum := 0;
    construct_bdp(0, ∅);
    output all the bd+j (0 ≤ j < bdpnum);
end

construct_bdp(i, mhs)
begin
    if i == bdpnum  /* mhs is a minimal hitting set of the complements of bd+0, ..., bd+(bdpnum−1) */
    then go to 1 else go to 2
1.  if mhs is not frequent, return;          /* new Bd− element is found */
    bd+bdpnum := go_up2(mhs);                /* new Bd+ element is found */
    bdpnum := bdpnum + 1;                    /* proceed to 2 */
2.  if the complement of bd+i intersects mhs then construct_bdp(i + 1, mhs);
    else begin
        for every e in the complement of bd+i do
            if mhs ∪ {e} is a minimal hitting set of the complements of bd+0, bd+1, ..., bd+i then construct_bdp(i + 1, mhs ∪ {e});
    end
    return;
end

Figure 3: Algorithm to Check Each Minimal Hitting Set Only Once
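A compact Python rendering of the IBE recursion in Fig. 3; the steps are explained in the text that follows. The toy data, go_up, and the naive minimality test are illustrative: the paper's implementation is in C and maintains crit incrementally as described in Section 3.

E = {'a', 'b', 'c', 'd'}
T = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c', 'd'}]
alpha = 2

def frequent(S):
    return sum(1 for t in T if set(S) <= t) >= alpha

def go_up(S):
    """Greedily extend a frequent set to a maximal frequent set."""
    S = set(S)
    extended = True
    while extended:
        extended = False
        for e in E - S:
            if frequent(S | {e}):
                S.add(e)
                extended = True
    return frozenset(S)

def is_min_hitting_set(hs, sets):
    return all(S & hs for S in sets) and \
           all(any(S & hs == {e} for S in sets) for e in hs)

bd_pos = []   # maximal frequent sets found so far (bd+0, bd+1, ...)
bd_neg = []   # minimal infrequent sets found so far
H = []        # complements of the sets in bd_pos: the dualization input

def construct_bdp(i, mhs):
    if i == len(bd_pos):          # mhs hits the complement of every found set
        if not frequent(mhs):     # new Bd- element
            bd_neg.append(frozenset(mhs))
            return
        M = go_up(mhs)            # new Bd+ element
        bd_pos.append(M)
        H.append(frozenset(E - M))
        # proceed: continue the dualization against the enlarged H
    if H[i] & mhs:                # H[i] already hit: skip to the next set
        construct_bdp(i + 1, mhs)
        return
    for e in H[i]:
        if is_min_hitting_set(mhs | {e}, H[:i + 1]):
            construct_bdp(i + 1, mhs | {e})

construct_bdp(0, frozenset())
print(sorted(map(sorted, bd_pos)))   # the maximal frequent sets
print(sorted(map(sorted, bd_neg)))   # the minimal infrequent sets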
When a minimal hitting set mhs is found, the algorithm checks its frequency. If mhs is frequent, the algorithm finds a maximal frequent set S including it, and adds S̄ to H as a new element. Now mhs is no longer a minimal hitting set, since S̄ ∩ mhs = ∅. The algorithm continues by generating a recursive call to find minimal hitting sets of the updated H. In the case that mhs is not frequent, by Lemma 1, mhs continues to be a minimal hitting set even when H is updated. Hence, we backtrack and find other minimal hitting sets.

When the algorithm terminates, H̄ is the set of maximal frequent sets, and the set of all minimal hitting sets the algorithm found is the set of minimal infrequent sets. The recursion tree the algorithm generates is a subtree of the recursion tree obtained by Uno's dualization algorithm when its input is {S̄ | S ∈ Bd+}, the family of complements of the maximal frequent sets. This algorithm is described in Fig. 3. We call the algorithm the Irredundant Border Enumerator (IBE algorithm, for short).

Theorem 1 The computation time of IBE is O(Dual({S̄ | S ∈ Bd+}) + |Bd+| · g), where Dual({S̄ | S ∈ Bd+}) is the computation time of Uno's algorithm for dualizing {S̄ | S ∈ Bd+}, and g is the computation time of go_up.

Note also that the space complexity of the IBE algorithm is O(ΣS∈Bd+ |S|), since all we need to memorize is Bd+; once a set in Bd− has been checked, it no longer needs to be recorded. On the other hand, Gunopulos et al. [Gunopulos97a] suggest using Fredman and Khachiyan's algorithm [Fredman96], which needs O(ΣS∈(Bd+ ∪ Bd−) |S|) space, since that algorithm needs both Bd+ and Bd− at the last stage.

4. Using sparseness

In this section, we speed up the dualization phase of our algorithm by using the sparseness of the maximal frequent sets. In real data, the sizes of maximal frequent sets are usually small; they are often bounded by a constant. We use this sparse structure to accelerate the algorithm.
global S0, ..., Sm;

compute_mhs(i, mhs)  /* mhs is a minimal hitting set of the sets it covers */
begin
1   if uncov(mhs) == ∅ then output mhs and return;
2   i := minimum index of uncov(mhs);
3   for every e′ ∈ mhs do
4       increase the counter of the items in ∪S∈crit(e′,mhs) S̄ by one;
5   for every e ∈ Si whose counter was increased |mhs| times do
        /* such e are included in ∪S∈crit(e′,mhs) S̄ for every e′ ∈ mhs */
6       compute_mhs(i + 1, mhs ∪ {e});
    return;
end

Figure 4: Improved Dualization Algorithm Using Sparseness
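A small Python sketch of the counter test in Fig. 4: for one call it returns the items of the first uncovered set that may extend mhs, assuming mhs is a minimal hitting set of the sets in `covered` and S_i is disjoint from mhs. crit is recomputed naively here, whereas the paper's implementation maintains it incrementally; the data in the usage line is illustrative.

def crit(e2, mhs, sets):
    return [S for S in sets if S & mhs == {e2}]

def extension_candidates(mhs, covered, S_i, E):
    """Items e of the first uncovered set S_i such that mhs ∪ {e} stays a
    minimal hitting set of covered + [S_i] (the counter test of Fig. 4)."""
    counter = {e: 0 for e in E}
    for e2 in mhs:
        union = set()
        for S in crit(e2, mhs, covered):
            union |= E - S                # complement of each critical set
        for e in union:
            counter[e] += 1
    # e qualifies iff counted once per element of mhs, i.e. some critical set
    # of every e2 in mhs avoids e; restrict the candidates to S_i
    return {e for e in S_i if counter[e] == len(mhs)}

H = [{1, 2}, {2, 3}, {3, 4}]
print(extension_candidates(frozenset({2}), H[:2], H[2], {1, 2, 3, 4}))  # {3, 4}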
First, we consider a way to reduce the computation time of an iteration. Let us look at the algorithm described in Fig. 2. The bottleneck of an iteration is lines 3 to 5, which check the existence of a critical set Sj ∈ crit(e′, mhs), j ≤ i, such that e ∉ Sj. To check this condition for one item e ∉ mhs, we spend O(Σe′∈mhs |crit(e′, mhs)|) time, hence this check for all e ∉ mhs takes O((|E| − |mhs|) × Σe′∈mhs |crit(e′, mhs)|) time.

Instead of this, we compute ∪S∈crit(e′,mhs) S̄ for each e′ ∈ mhs. An item e satisfies the condition of the "if" at line 4 if and only if e ∈ ∪S∈crit(e′,mhs) S̄ for all e′ ∈ mhs. To compute ∪S∈crit(e′,mhs) S̄ for all e′ ∈ mhs, we take O(Σe′∈mhs ΣS∈crit(e′,mhs) |S̄|) time. In the case of the IBE algorithm, each S̄ is a maximal frequent set, hence the average size of S̄ is expected to be small. The sizes of the minimal infrequent sets are not greater than the maximum size of the maximal frequent sets, and they are usually smaller than the average size of the maximal frequent sets. Hence, |mhs| is also expected to be small.

Second, we reduce the number of iterations. For mhs ⊆ E, we define uncov(mhs) as the set of S ∈ H satisfying S ∩ mhs = ∅. If mhs ∩ Si+1 ≠ ∅, the iteration inputting mhs and i does nothing but generate a recursive call with i increased by one. This type of iteration should be skipped; only the iterations executing lines 3 to 5 are crucial. Hence, in each iteration, we set i to the minimum index among uncov(mhs). As a result, we need not execute line 2, and the number of iterations is reduced from Σi |MHSi| to |∪i MHSi|. We describe the improved algorithm in Fig. 4.

In our implementation, when we generate a recursive call, we allocate memory for each variable used in the recursive call. Hence, the memory required by the algorithm can be up to O(|E| × m). However, experimentally the required memory is always linear in the input size. Note that we could reduce the worst-case memory complexity by using more sophisticated algorithms.

5. Experiments

In this section, we show the results of computational experiments with our algorithm. We implemented the algorithm in the C programming language and examined instances from FIMI'03. For the instances of KDD Cup 2000 [KDDcup00], we compared the results with the computational experiments on CHARM [Zaki02], CLOSET [Pei00], FP-growth [Han00], and Apriori [Agrawal96] reported in [Zheng01]. The experiments in [Zheng01] were done on a PC with a Duron 550MHz CPU and 1GB of RAM. Our experiments were done on a PC with a Pentium III 500MHz CPU and 256MB of RAM, which is a little slower than a Duron 550MHz. The results are shown in the plots and tables at the end of the paper. Note that our algorithm uses at most 170MB on any of the following instances. We also show the numbers of frequent sets, frequent closed/maximal item sets, and minimal infrequent sets.

In our experiments, the IBE algorithm takes approximately O(|Bd−| × ave(Bd+)) time, while the computation time of the other algorithms depends heavily on the number of frequent sets, the number of frequent closed item sets, and the minimum support. We recall that ave(Bd+) is the average size of the maximal frequent sets. On some instances, our IBE algorithm performs rather well compared to the other algorithms. In these cases, the number of maximal frequent item sets is smaller than the number of frequent item sets.
The IBE algorithm seems to perform well on difficult problems in which the number of maximal frequent sets is much smaller than the numbers of frequent item sets and frequent closed item sets.

6. Conclusion

In this paper, we described the detailed implementation of the algorithm proposed in [SatohUno03], and we gave some experimental results on test data.

Acknowledgments

We are grateful to Heikki Mannila for useful discussions about this research.

References

[Agrawal96] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., and Verkamo, A. I., "Fast Discovery of Association Rules", in U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining, chapter 12, pp. 307–328 (1996).

[Fredman96] Fredman, M. L. and Khachiyan, L., "On the Complexity of Dualization of Monotone Disjunctive Normal Forms", Journal of Algorithms 21(3), pp. 618–628 (1996).

[Gunopulos97a] Gunopulos, D., Khardon, R., Mannila, H., and Toivonen, H., "Data Mining, Hypergraph Transversals, and Machine Learning", Proc. of PODS'97, pp. 209–216 (1997).

[Gunopulos97b] Gunopulos, D., Mannila, H., and Saluja, S., "Discovering All Most Specific Sentences using Randomized Algorithms", Proc. of ICDT'97, pp. 215–229 (1997).

[Han00] Han, J., Pei, J., and Yin, Y., "Mining Frequent Patterns without Candidate Generation", SIGMOD Conference 2000, pp. 1–12 (2000).

[Kavvadias99] Kavvadias, D. J. and Stavropoulos, E. C., "Evaluation of an Algorithm for the Transversal Hypergraph Problem", Algorithm Engineering, pp. 72–84 (1999).

[KDDcup00] Kohavi, R., Brodley, C. E., Frasca, B., Mason, L., and Zheng, Z., "KDD-Cup 2000 Organizers' Report: Peeling the Onion", SIGKDD Explorations, 2(2), pp. 86–98 (2000).

[Mannila96] Mannila, H. and Toivonen, H., "On an Algorithm for Finding All Interesting Sentences", Cybernetics and Systems, Vol. II, The Thirteenth European Meeting on Cybernetics and Systems Research, pp. 973–978 (1996).

[Pei00] Pei, J., Han, J., and Mao, R., "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2000, pp. 21–30 (2000).

[SatohUno03] Satoh, K. and Uno, T., "Enumerating Maximal Frequent Sets using Irredundant Dualization", Lecture Notes in Artificial Intelligence (Proc. of Discovery Science 2003), Springer-Verlag, pp. 192–201 (2003).

[Uno02] Uno, T., "A Practical Fast Algorithm for Enumerating Minimal Set Coverings", SIGAL 83, Information Processing Society of Japan, pp. 9–16 (in Japanese) (2002).

[Zaki02] Zaki, M. J. and Hsiao, C., "CHARM: An Efficient Algorithm for Closed Itemset Mining", 2nd SIAM International Conference on Data Mining (SDM'02), pp. 457–473 (2002).

[Zheng01] Zheng, Z., Kohavi, R., and Mason, L., "Real World Performance of Association Rule Algorithms", KDD 2001, pp. 401–406 (2001).
[Plots (running time in seconds versus minimum support): BMS-WebView1, BMS-WebView2, BMS-POS, and T10I4D100K compare IBE with Apriori, FP-growth, CLOSET, and CHARM where results are available; T40I10D100K, pumsb, pumsb_star, connect, kosarak, chess, and mushroom show IBE only. The corresponding numbers appear in the tables below.]
BMS-Web-View1: #items 497, #transactions 59602, ave. size of transaction 2.51
support 60 48 36 24 12 6
Apriori 1.1 3.6 113 - - -
FP-growth 1.2 1.8 51 - - -
Closet 33 74 - - - -
Charm 2.2 2.7 7.9 133 422 -
IBE 5.8 9.6 45 42 333 2982
#freq. sets 3992 10287 461522 - - -
#closed sets 3974 9391 64762 155651 422692 1240701
#max. freq. sets 2067 4028 15179 12956 84833 129754
#min. infreq. sets 66629 81393 150278 212073 579508 4320003
maximum use of memory: 45MB
BMS-Web-View2: #items 3340, #transactions 77512, ave. size of transaction 4.62
support 77 62 46 31 15 7
Apriori 13.1 15 29.6 58.2 444 -
Fp-growth 7.03 10 17.2 29.6 131 763
Closet 1500 2250 3890 6840 25800 -
Charm 5.82 6.66 7.63 13.8 27.2 76
IBE 25 32 46 98 355 1426
#closed sets 22976 37099 60352 116540 343818 754924
#freq. sets 24143 42762 84334 180386 1599210 9897303
#max. freq. sets 3901 5230 7841 16298 43837 118022
#min. infreq. sets 657461 958953 1440057 2222510 3674692 5506524
maximum use of memory: 100MB
BMS-POS: #items 1657, #transactions 517255, ave. size of transaction 6.5
support 517 413 310 206 103 51
Apriori 251 341 541 1000 2371 10000
Fp-growth 196 293 398 671 1778 6494
Closet - - - - - -
Charm 100 117 158 215 541 3162
IBE 1714 2564 4409 9951 44328 -
#closed sets 121879 200030 378217 840544 1742055 21885050
#freq. sets 121956 200595 382663 984531 5301939 33399782
#max. freq. sets 30564 48015 86175 201306 891763 4280416
#min. infreq. sets 236274 337309 530946 1047496 3518003 -
maximum use of memory: 110MB
T10I4D100K: #items 1000, #transactions 100000, ave. size of transaction 10
support 100 80 60 40 20 10
Apriori 33 39 45 62 117 256
Fp-growth 7.3 7.7 8.1 9.0 12 20
Closet 13 16 18 23 41 130
Charm 11 13 16 24 45 85
IBE 96 147 263 567 1705 -
#freq. sets 15010 28059 46646 84669 187679 335183
#closed sets 13774 22944 38437 67537 131342 229029
#max. freq. sets 7853 11311 16848 25937 50232 114114
#min. infreq. sets 392889 490203 736589 1462121 4776165 -
maximum use of memory: 60MB
T40I10D100K: #items 1000, #transactions 100000, ave. size of transaction 39.6
support 1600 1300 1000 700
IBE 378 552 1122 2238
#freq. sets 4591 10110 65236 550126
#closed sets 4591 10110 65236 548349
#max. freq. sets 4003 6944 21692 41473
#min. infreq. sets 245719 326716 521417 1079237
maximum use of memory: 74MB
pumsb: #items 7117, #transactions 49046, ave. size of transaction 74
support 45000 44000 43000 42000
IBE 301 582 1069 1840
#freq. sets 1163 2993 7044 15757
#closed sets 685 1655 3582 7013
#max. freq. sets 144 288 541 932
#min. infreq. sets 7482 7737 8402 9468
maximum use of memory: 70MB
pumsb star: #items 7117, #transactions 49046, ave. size of transaction 50
support 30000 25000 20000 15000 10000 5000
IBE 8 19 59 161 556 2947
#freq. sets 165 627 21334 356945 >2G -
#closed sets 66 221 2314 14274 111849 -
#max. freq. sets 4 17 81 315 1666 15683
#min. infreq. sets 7143 7355 8020 9635 19087 98938
maximum use of memory: 44MB
kosarak: #items 41217, #transactions 990002, ave. size of transaction 8
support 3000 2500 2000 1500 1000
IBE 226 294 528 759 2101
#freq. sets 4894 8561 34483 219725 711424
#closed sets 4865 8503 31604 157393 496675
#max. freq. sets 792 1146 2858 4204 16231
#min. infreq. sets 87974 120591 200195 406287 875391
maximum use of memory: 170MB
mushroom: #items 120, #transactions 8124, ave. size of transaction 23
support 30 20 10 5 2
IBE 132 231 365 475 433
#freq. sets 505205917 781458545 1662769667 >2G >2G
#closed sets 91122 109304 145482 181243 230585
#max. freq. sets 15232 21396 30809 34131 27299
#min. infreq. sets 66085 79481 81746 69945 31880
maximum use of memory: 47MB
connect: #items 130, #transactions 67577, ave. size of transaction 43
support 63000 60000 57000 54000 51000 48000 45000
IBE 229 391 640 893 1154 1381 1643
#freq. sets 6327 41143 171239 541911 1436863 - -
#closed sets 1566 4372 9041 15210 23329 - -
#max. freq. sets 152 269 464 671 913 1166 1466
#min. infreq. sets 297 486 703 980 1291 1622 1969
maximum use of memory: 60MB
chess: #items 76, #transactions 3196, ave. size of transaction 37
support 2200 1900 1600 1300 1000
IBE 19 61 176 555 2191
#freq. sets 59181 278734 1261227 5764922 29442848
#closed sets 28358 106125 366529 1247700 4445373
#max. freq. sets 1047 3673 11209 35417 114382
#min. infreq. sets 1725 5202 14969 46727 152317
maximum use of memory: 50MB