<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Detailed Description of an Algorithm for Enumeration of Maximal Frequent Sets with Irredundant Dualization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Takeaki Uno</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ken Satoh</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <addr-line>2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2000</year>
      </pub-date>
      <volume>978</volume>
      <issue>1996</issue>
      <fpage>21</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>We describe an implementation of an algorithm for enumerating all maximal frequent sets using irredundant dualization, which is an improved version of the algorithm of Gunopulos et al. The algorithm of Gunopulos et al. solves many dualization problems and takes a long computation time. We interleave dualization with the main algorithm, and reduce the computation time for dualization to that of a single dualization. This also reduces the space complexity. Moreover, we accelerate the computation by using the sparseness of the input.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Let E be an item set and T be a set of transactions defined on E. For an item set S ⊆ E, we denote the set of transactions including S by X(S), and define the frequency of S as |X(S)|. For a given constant α, an item set S is said to be frequent if |X(S)| ≥ α. A frequent item set included in no other frequent item set is said to be maximal. An item set that is not frequent is called infrequent, and an infrequent item set including no other infrequent item set is said to be minimal.</p>
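      <p>As a toy illustration of these definitions, the frequent, maximal frequent, and minimal infrequent sets of a small example can be enumerated by brute force. The data and helper names below are hypothetical, not from the paper; this is only a sketch of the definitions, not the paper's algorithm.</p>

```python
from itertools import combinations

# Hypothetical toy data: item set E and transactions T.
E = {1, 2, 3, 4}
T = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 2, 3}]
ALPHA = 2  # frequency threshold α

def frequency(S):
    """|X(S)|: number of transactions including S."""
    return sum(1 for t in T if S.issubset(t))

def is_frequent(S):
    return frequency(S) >= ALPHA

# Brute force over all subsets of E (fine for |E| = 4).
subsets = [frozenset(c) for r in range(len(E) + 1)
           for c in combinations(sorted(E), r)]
frequent = [S for S in subsets if is_frequent(S)]
infrequent = [S for S in subsets if not is_frequent(S)]

# Maximal frequent sets: included in no other frequent set.
maximal = [S for S in frequent
           if not any(S != F and S.issubset(F) for F in frequent)]
# Minimal infrequent sets: including no other infrequent set.
minimal_infreq = [S for S in infrequent
                  if not any(I != S and I.issubset(S) for I in infrequent)]
```

      <p>On this toy data, Bd+ = {{1, 2, 3}} and Bd− = {{4}}.</p>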
      <p>This paper describes in detail an implementation of the algorithm for enumerating all maximal frequent sets using dualization presented in [SatohUno03].</p>
      <p>The algorithm is an improved version of that of
Gunopulos et al. [Gunopulos97a, Gunopulos97b].</p>
      <p>The algorithm computes maximal frequent sets by computing minimal transversals of a hypergraph, that is, minimal hitting sets, or, in other words, by the dualization of a monotone function [Fredman96]. By dualization, the algorithm finds all minimal item sets not included in any currently obtained maximal frequent set. If one of these minimal item sets is frequent, the algorithm finds a new maximal frequent set including it. In this way, the algorithm avoids checking all frequent item sets. However, the algorithm solves dualization problems many times, hence it is not fast for practical purposes. Moreover, it uses the dualization algorithm of Fredman and Khachiyan [Fredman96], which is said to be slow in practice.</p>
      <p>We improved the algorithm in [SatohUno03] by using the incremental dualization algorithms proposed by Kavvadias and Stavropoulos [Kavvadias99] and by Uno [Uno02]. We developed an algorithm that interleaves dualization with finding maximal frequent sets. Roughly speaking, our algorithm solves one dualization problem of size |Bd+|, where Bd+ is the set of maximal frequent sets, while the algorithm of Gunopulos et al. solves |Bd+| dualization problems of sizes from 1 through |Bd+|. This reduces the computation time to roughly 1/|Bd+| of the original.</p>
      <p>To reduce the computation time further, we used Uno’s dualization algorithm [Uno02]. The experimental computation time of Uno’s algorithm is linear in the number of outputs, and O(|E|) per output, while that of Kavvadias and Stavropoulos seems to be O(|E|²) per output. This reduces the computation time to roughly 1/|E| of the original. Moreover, we add an improvement based on the sparseness of the input. By this, the experimental computation time per output is reduced to O(ave(Bd+)), where ave(Bd+) is the average size of the maximal frequent sets. In summary, we reduced the computation time to roughly ave(Bd+) / (|Bd+| × |E|²) of that of the combination of the algorithm of Gunopulos et al. and the algorithm of Kavvadias and Stavropoulos.</p>
      <p>In the following sections, we describe our algorithm and the computational results. Section 2 describes the algorithm of Gunopulos et al., and Section 3 describes our algorithm and Uno’s algorithm. Section 4 explains our improvement using sparseness. Computational experiments on the FIMI’03 instances are shown in Section 5, and we conclude the paper in Section 6.</p>
      <p>Figure 1: Dualize and Advance [Gunopulos97a]
1. Bd+ := {go_up(∅)}
2. Compute MHS(B̄d+).
3. If no set in MHS(B̄d+) is frequent, output MHS(B̄d+).
4. If there exists a frequent set S in MHS(B̄d+), set Bd+ := Bd+ ∪ {go_up(S)} and go to 2.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Enumerating maximal frequent sets by dualization</title>
      <p>In this section, we describe the algorithm of Gunopulos et al. Explanations can also be found in [Gunopulos97a, Gunopulos97b, SatohUno03]; however, those are written in general terms. Here we explain the algorithm in terms of frequent set mining.</p>
      <p>Let Bd− be the set of minimal infrequent sets. For a family H of subsets of E, a hitting set HS of H is a set such that S ∩ HS ≠ ∅ for every S ∈ H. If a hitting set includes no other hitting set, it is said to be minimal. We denote the set of all minimal hitting sets of H by MHS(H). We denote the complement of a subset S w.r.t. E by S̄, and for a family H we denote {S̄ | S ∈ H} by H̄.</p>
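      <p>These hitting-set notions can be sketched the same way by brute force. The helper names below are hypothetical; this is illustration only, not an efficient dualization method.</p>

```python
from itertools import combinations

def is_hitting_set(hs, H):
    """hs hits every S in H, i.e. S ∩ hs ≠ ∅ for all S."""
    return all(S.intersection(hs) for S in H)

def MHS(H, E):
    """All minimal hitting sets of H, by brute force over subsets of E."""
    hitting = [frozenset(c) for r in range(len(E) + 1)
               for c in combinations(sorted(E), r)
               if is_hitting_set(set(c), H)]
    return [hs for hs in hitting
            if not any(h != hs and h.issubset(hs) for h in hitting)]

def complement_family(H, E):
    """H̄ = {S̄ | S ∈ H}, complements taken w.r.t. E."""
    return [frozenset(E.difference(S)) for S in H]
```

      <p>For example, MHS([{1, 2}, {2, 3}, {3, 4}], {1, 2, 3, 4}) consists of {1, 3}, {2, 3}, and {2, 4}.</p>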
      <p>There is a strong connection between the maximal
frequent sets and the minimal infrequent sets by the
minimal hitting set operation.</p>
      <p>Proposition 1 [Mannila96] Bd− = MHS(B̄d+)</p>
      <p>Using the following proposition, Gunopulos et al.
proposed an algorithm called Dualize and Advance
shown in Fig. 1 to compute the maximal frequent
sets [Gunopulos97a].</p>
      <p>Proposition 2 [Gunopulos97a] Let Bd′+ ⊆ Bd+. Then, for every S ∈ MHS(B̄d′+), either S ∈ Bd− or S is frequent (but not both).</p>
      <p>In the above algorithm, go_up(S) for a subset S of E is a maximal frequent set, computed as follows.</p>
      <p>1. Select an element e from S̄ and check the frequency of S ∪ {e}.
2. If it is frequent, set S := S ∪ {e} and go to 1.
3. Otherwise, if there is no element e in S̄ such that S ∪ {e} is frequent, return S.</p>
      <p>Proposition 3 [Gunopulos97a] The number of frequency checks in the “Dualize and Advance” algorithm to compute Bd+ is at most |Bd+| · |Bd−| + |Bd+| · |E|².</p>
      <p>Basically, the algorithm of Gunopulos et al. solves dualization problems of sizes from 1 through |Bd+|. Although we can terminate a dualization when we find a new maximal frequent set, we may check each minimal infrequent item set again and again. This is one of the reasons that the algorithm of Gunopulos et al. is not fast in practice. In the next section, we propose a new algorithm obtained by interleaving go_up into a dualization algorithm. The algorithm basically solves one dualization problem of size |Bd+|.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Description of our algorithm</title>
      <p>The key lemma of our algorithm is the following.</p>
      <p>Lemma 1 [SatohUno03] Let Bd1+ and Bd2+ be subsets of Bd+. If Bd1+ ⊆ Bd2+, then MHS(B̄d1+) ∩ Bd− ⊆ MHS(B̄d2+) ∩ Bd−.</p>
      <p>Suppose that we have already found the minimal hitting sets of B̄d′+ for a subset Bd′+ of the maximal frequent sets. The above lemma means that if we add a maximal frequent set to Bd′+, any minimal hitting set we have found that corresponds to a minimal infrequent set is still a minimal infrequent set. Therefore, if we can use an algorithm that visits each minimal hitting set based on an incremental addition of the maximal frequent sets one by one, we no longer have to check the same minimal hitting set again even when maximal frequent sets are newly found. The dualization algorithms proposed by Kavvadias and Stavropoulos [Kavvadias99] and Uno [Uno02] are such algorithms. Using these algorithms, we reduce the number of checks.</p>
      <p>Let us describe Uno’s algorithm [Uno02], an improved version of Kavvadias and Stavropoulos’s algorithm [Kavvadias99]. First we introduce some notation. A set S ∈ H is called critical for e ∈ hs if S ∩ hs = {e}. We denote the family of critical sets for e w.r.t. hs and H by crit(e, hs). Note that mhs is a minimal hitting set of H if and only if crit(e, mhs) is non-empty for every e ∈ mhs. Suppose that H = {S1, ..., Sm}, and let MHSi be MHS({S1, ..., Si}) (1 ≤ i ≤ m). We simply denote MHS(H) by MHS. A hitting set hs for {S1, ..., Si} is minimal if and only if crit(e, hs) ∩ {S1, ..., Si} ≠ ∅ for every e ∈ hs.</p>
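      <p>A small sketch of crit and this minimality test (brute force, hypothetical helper names, illustration only):</p>

```python
def crit(e, hs, edges):
    """Sets of `edges` critical for e w.r.t. hs: those S with S ∩ hs = {e}."""
    return [S for S in edges if S.intersection(hs) == {e}]

def is_minimal_hitting_set(hs, edges):
    """hs hits every edge, and every e in hs has a critical set."""
    hits_all = all(S.intersection(hs) for S in edges)
    return hits_all and all(crit(e, hs, edges) for e in hs)
```

      <p>For the edges {1, 2}, {2, 3}, {3, 4}: the set {2, 3} is a minimal hitting set, while {2, 3, 4} is a hitting set that is not minimal (the element 4 has no critical set).</p>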
      <p>Lemma 2 [Uno02] For any mhs ∈ MHSi (1 ≤ i ≤ m), there exists exactly one minimal hitting set mhs′ ∈ MHSi−1 satisfying one of the following conditions (but not both):
• mhs′ = mhs,
• mhs′ = mhs \ {e} where crit(e, mhs) ∩ {S1, ..., Si} = {Si}.</p>
      <p>We call mhs′ the parent of mhs, and mhs a child of mhs′. Since the parent-child relationship is not cyclic, its graph representation forms a forest in which each connected component is a tree rooted at a minimal hitting set of MHS1. We consider the trees as traversal routes defined over all minimal hitting sets of all MHSi. These traversal routes can be traced in a depth-first manner by generating the children of the currently visited minimal hitting set, hence we can enumerate all minimal hitting sets of MHS in time linear in Σi |MHSi|. Although Σi |MHSi| can be exponential in |MHS|, such cases are expected to be exceptional in practice. Experimentally, Σi |MHSi| is linear in |MHS|.</p>
      <p>To find the children of a minimal hitting set, we use the following proposition, which follows immediately from the above lemma.</p>
      <sec id="sec-3-1">
        <title>Proposition 4 [Uno02]</title>
        <p>Any child mhs′ of mhs ∈ MHSi satisfies one of the following conditions:
(1) mhs′ = mhs,
(2) mhs′ = mhs ∪ {e}.
In particular, no mhs has both a child satisfying (1) and a child satisfying (2).</p>
        <p>If mhs ∩ Si+1 ≠ ∅ then mhs ∈ MHSi+1 and (1) holds. If mhs ∩ Si+1 = ∅, then mhs ∉ MHSi+1, and (2) can hold for some e ∈ Si+1. If mhs′ = mhs ∪ {e} is a child of mhs, then for every e′ ∈ mhs there is an Sj ∈ crit(e′, mhs), j ≤ i, such that e ∉ Sj. From these observations, we obtain the algorithm described in Fig. 2.</p>
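      <p>The parent-child traversal can be sketched as a recursive enumerator. This is a simplified reading of the scheme above, not the paper’s optimized implementation: crit is recomputed from scratch instead of being updated incrementally.</p>

```python
def enumerate_mhs(edges):
    """Enumerate all minimal hitting sets of `edges` depth-first along the
    parent-child tree: at level i, a child either equals its parent (when
    the parent already hits S_{i+1}) or adds one element of S_{i+1}."""
    results = []

    def rec(i, mhs):
        if i == len(edges):
            results.append(frozenset(mhs))
            return
        if edges[i].intersection(mhs):      # condition (1): mhs stays minimal
            rec(i + 1, mhs)
            return
        for e in sorted(edges[i]):          # condition (2): try mhs ∪ {e}
            # every e2 in mhs must keep a critical set S (S ∩ mhs = {e2})
            # with e not in S, so that mhs ∪ {e} remains minimal
            ok = all(any(e not in S
                         for S in edges[:i] if S.intersection(mhs) == {e2})
                     for e2 in mhs)
            if ok:
                rec(i + 1, mhs.union({e}))

    rec(0, set())
    return results
```

      <p>By Lemma 2 each minimal hitting set has exactly one parent, so the traversal visits each one exactly once.</p>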
        <p>An iteration of the algorithm in Fig. 2 takes:
• O(|mhs|) time for line 1,
• O(|Si+1 ∪ mhs|) time for line 2,
• O((|E| − |mhs|) × Σe′∈mhs |crit(e′, mhs) ∩ {S1, ..., Si}|) time for lines 3 to 5, except for the computation of crit.</p>
        <p>To compute crit quickly, we store crit(e, mhs) in memory and update it when we generate a recursive call. Note that this takes O(m) memory. Since crit(e′, mhs ∪ {e}) is obtained from crit(e′, mhs) by removing the sets including e (i.e., crit(e′, mhs ∪ {e}) = {S | S ∈ crit(e′, mhs), e ∉ S}), crit(e′, mhs ∪ {e}) for all e′ can be computed in O(m) time. Hence the computation time of an iteration is bounded by O(|E| × m).</p>
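      <p>This incremental update can be sketched with a plain dict (a hypothetical shape: crit_map maps each e′ ∈ mhs to its list of critical sets):</p>

```python
def update_crit(crit_map, e):
    """crit(e2, mhs ∪ {e}) = {S in crit(e2, mhs) : e not in S}:
    adding e to mhs removes, for each e2, the critical sets containing e."""
    return {e2: [S for S in sets_ if e not in S]
            for e2, sets_ in crit_map.items()}
```

      <p>The total work is proportional to the number of stored sets, i.e. O(m), matching the bound above.</p>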
        <p>Based on this dualization algorithm, we developed an enumeration algorithm for the maximal frequent sets.</p>
        <p>First, the algorithm sets the input H of the dualization problem to the empty set. Then the algorithm solves the dualization in the same way as the above algorithm. When a minimal hitting set mhs is found, the algorithm checks its frequency. If mhs is frequent, the algorithm finds a maximal frequent set S including it, and adds S̄ to H as a new element. Now mhs is no longer a minimal hitting set, since S̄ ∩ mhs = ∅. The algorithm then continues generating recursive calls to find the minimal hitting sets of the updated H. In the case that mhs is infrequent, by Lemma 1, mhs continues to be a minimal hitting set even when H is updated. Hence, we backtrack and find other minimal hitting sets.</p>
        <p>Figure 3: Irredundant Border Enumerator
global integer bdpnum; sets bd0+, bd1+, ...;
main()
begin
  bdpnum := 0;
  construct bdp(0, ∅);
  output all the bdj+ (0 ≤ j &lt; bdpnum);
end
construct bdp(i, mhs)
begin
  if i == bdpnum then begin /* mhs is a minimal hitting set of {b̄d0+, ..., b̄d(bdpnum−1)+} */
    if mhs is not frequent, return; /* new Bd− element is found */
    bd(bdpnum)+ := go_up(mhs); /* new Bd+ element is found */
    bdpnum := bdpnum + 1;
  end
  if b̄di+ ∩ mhs ≠ ∅ then construct bdp(i + 1, mhs)
  else
    for every e ∈ b̄di+ do
      if mhs ∪ {e} is a minimal hitting set of {b̄d0+, ..., b̄di+} then construct bdp(i + 1, mhs ∪ {e});
  return;
end</p>
        <p>When the algorithm terminates, H̄ is the set of maximal frequent sets, and the set of all minimal hitting sets the algorithm found is the set of minimal infrequent sets. The recursion tree the algorithm generates is a subtree of the recursion tree obtained by Uno’s dualization algorithm on input B̄d+, the set of complements of the maximal frequent sets.</p>
        <p>This algorithm is described in Fig. 3. We call the
algorithm Irredundant Border Enumerator (IBE
algorithm, for short).</p>
        <p>Theorem 1 The computation time of IBE is O(Dual(B̄d+) + |Bd+| · g), where Dual(B̄d+) is the computation time of Uno’s algorithm for dualizing B̄d+, and g is the computation time of go_up.</p>
        <p>Note also that the space complexity of the IBE algorithm is O(ΣS∈Bd+ |S|), since all we need to memorize is Bd+; once a set in Bd− is checked, it no longer needs to be recorded. On the other hand, Gunopulos et al. [Gunopulos97a] suggest the use of Fredman and Khachiyan’s algorithm [Fredman96], which needs O(ΣS∈(Bd+∪Bd−) |S|) space, since that algorithm needs both Bd+ and Bd− at the last stage.</p>
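        <p>Putting the pieces together, a compact sketch of the IBE idea on toy data (hypothetical helpers, in Python rather than the paper’s C): H holds the complements of the maximal frequent sets found so far, and go_up is invoked whenever a frequent minimal hitting set appears.</p>

```python
# Hypothetical toy data.
E = {1, 2, 3, 4}
T = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 2, 3}]
ALPHA = 2

def is_frequent(S):
    return sum(1 for t in T if set(S).issubset(t)) >= ALPHA

def go_up(S):
    S = set(S)
    while True:
        for e in sorted(E.difference(S)):
            if is_frequent(S.union({e})):
                S.add(e)
                break
        else:
            return frozenset(S)

def ibe():
    H = []                      # complements of maximal frequent sets found so far
    bd_pos, bd_neg = [], []     # Bd+ and Bd−

    def rec(i, mhs):
        if i == len(H):                       # mhs is an MHS of the current H
            if not is_frequent(mhs):
                bd_neg.append(frozenset(mhs)) # new Bd− element: backtrack
                return
            bd_pos.append(go_up(mhs))         # new Bd+ element
            H.append(frozenset(E.difference(bd_pos[-1])))
            # fall through: keep searching against the enlarged H
        if H[i].intersection(mhs):
            rec(i + 1, mhs)
            return
        for e in sorted(H[i]):
            if all(any(e not in S
                       for S in H[:i] if S.intersection(mhs) == {e2})
                   for e2 in mhs):
                rec(i + 1, mhs.union({e}))

    rec(0, set())
    return bd_pos, bd_neg
```

        <p>On this toy data, the run finds Bd+ = {{1, 2, 3}} and Bd− = {{4}}; growing H inside the recursion is exactly the interleaving of dualization with go_up described above.</p>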
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Using sparseness</title>
      <p>In this section, we speed up the dualization phase of our algorithm by using the sparseness of H. In real data, the sizes of the maximal frequent sets are usually small; they are often bounded by a constant. We use this sparse structure to accelerate the algorithm.</p>
      <p>Figure 4: the improved dualization algorithm using sparseness
global S1, ..., Sm;
compute mhs(i, mhs) /* mhs is a minimal hitting set of {S1, ..., Si} */
begin
1 if uncov(mhs) = ∅ then output mhs and return;
2 i := minimum index of uncov(mhs);
3 for every e′ ∈ mhs do
4   increase the counter of each item in ∪S∈crit(e′,mhs) S̄ by one
5 for every e ∈ Si s.t. its counter is increased by |mhs| do /* items included in ∪S∈crit(e′,mhs) S̄ for every e′ ∈ mhs */
6   compute mhs(i + 1, mhs ∪ {e});
  return;
end</p>
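      <p>The counting trick in lines 3 to 5 can be sketched as follows (hypothetical helper names; crit_map maps each e′ ∈ mhs to its critical sets, and complements are taken w.r.t. E):</p>

```python
from collections import Counter

def valid_additions(mhs, crit_map, E):
    """Items e outside mhs lying in the union of the complements of the
    critical sets of every e2 in mhs, found by counting: bump a counter
    for each item in that union, once per e2; items whose counter reaches
    |mhs| qualify as additions that keep mhs ∪ {e} minimal."""
    counter = Counter()
    for e2 in mhs:
        union_of_complements = set()
        for S in crit_map[e2]:
            union_of_complements.update(E.difference(S))
        counter.update(union_of_complements)
    return {e for e in E.difference(mhs) if counter[e] == len(mhs)}
```

      <p>In the IBE setting the complements S̄ are maximal frequent sets, so the unions above are expected to be small.</p>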
      <p>First, we consider a way to reduce the computation time of an iteration. Let us look at the algorithm described in Fig. 2. The bottleneck of an iteration of the algorithm is lines 3 to 5, which check the existence of a critical set Sj ∈ crit(e′, mhs), j &lt; i, such that e ∉ Sj.</p>
      <p>To check this condition for an item e ∉ mhs, we spend O(Σe′∈mhs |crit(e′, mhs)|) time, hence this check for all e ∉ mhs takes O((|E| − |mhs|) × Σe′∈mhs |crit(e′, mhs)|) time.</p>
      <p>Instead of this, we compute ∪S∈crit(e′,mhs) S̄ for each e′ ∈ mhs. An item e satisfies the condition of the “if” at line 4 if and only if e ∈ ∪S∈crit(e′,mhs) S̄ for every e′ ∈ mhs. To compute ∪S∈crit(e′,mhs) S̄ for all e′ ∈ mhs, we take O(Σe′∈mhs ΣS∈crit(e′,mhs) |S̄|) time. In the case of the IBE algorithm, each S̄ is a maximal frequent set, hence the average size of |S̄| is expected to be small.</p>
      <p>The sizes of minimal infrequent sets are not greater
than the maximum size of maximal frequent sets, and
they are usually smaller than the average size of the
maximal frequent sets. Hence, |mhs| is also expected
to be small.</p>
      <p>Second, we reduce the number of iterations. For mhs ⊆ E, we define uncov(mhs) as the set of S ∈ H satisfying S ∩ mhs = ∅. If mhs ∩ Si ≠ ∅, the iteration inputting mhs and i does nothing but generate a recursive call with i increased by one. This type of iteration should be skipped; only the iterations executing lines 3 to 5 are crucial. Hence, in each iteration, we set i to the minimum index in uncov(mhs). As a result, we need not execute line 2, and the number of iterations is reduced from Σi |MHSi| to |∪i MHSi|. We describe the improved algorithm in Fig. 4.</p>
      <p>In our implementation, when we generate a recursive call, we allocate memory for each variable used in that call. Hence, the memory required by the algorithm can be up to O(|E| × m). However, experimentally, the required memory is always linear in the input size. Note that the worst-case memory complexity could be reduced by a more sophisticated algorithm.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In this section, we show the results of computational experiments with our algorithm. We implemented our algorithm in the C programming language and examined the instances of FIMI2003. For the instances of KDD-cup 2000 [KDDcup00], we compared our results with the computational experiments on CHARM [Zaki02], closed [Pei00], FP-growth [Han00], and Apriori [Agrawal96] shown in [Zheng01]. The experiments in [Zheng01] were done on a PC with a Duron 550MHz CPU and 1GB of RAM. Our experiments were done on a PC with a Pentium III 500MHz CPU and 256MB of RAM, which is a little slower than a Duron 550MHz CPU. The results are shown in Figs. 4 – 14. Note that our algorithm uses at most 170MB of memory on all of the following instances. We also show the numbers of frequent sets, frequent closed/maximal item sets, and minimal infrequent sets.</p>
      <p>In our experiments, the IBE algorithm takes approximately O(|Bd−| × ave(Bd+)) time, while the computation time of the other algorithms depends deeply on the number of frequent sets, the number of frequent closed item sets, and the minimum support. Recall that ave(Bd+) is the average size of the maximal frequent sets. On some instances, our IBE algorithm performs rather well compared to the other algorithms. In these cases, the number of maximal frequent item sets is smaller than the number of frequent item sets. The IBE algorithm seems to give good performance on difficult problems in which the number of maximal frequent sets is much smaller than the numbers of frequent item sets and frequent closed item sets.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we described in detail the implementation of our algorithm proposed in [SatohUno03], and gave some experimental results on test data.</p>
      <sec id="sec-6-1">
        <title>Acknowledgments</title>
        <p>We are grateful to Heikki Mannila for useful discussions about this research.</p>
        <p>[Figures: running-time (sec) plots of IBE on the test instances; maximum memory use: 74MB]</p>
        <p>pumsb: #items 7117, #transactions 49046, ave. size of transaction 74
support 45000 44000 43000 42000</p>
        <p>IBE 301 582 1069 1840
#freq. sets 1163 2993 7044 15757
#closed sets 685 1655 3582 7013
#max. freq. sets 144 288 541 932
#min. infreq. sets 7482 7737 8402 9468</p>
        <p>maximum use of memory: 70MB
pumsb star: #items 7117, #transactions 49046, ave. size of transaction 50
support 30000 25000 20000 15000 10000 5000</p>
        <p>IBE 8 19 59 161 556 2947
#freq. sets 165 627 21334 356945 &gt;2G
#closed sets 66 221 2314 14274 111849
#max. freq. sets 4 17 81 315 1666 15683
#min. infreq. sets 7143 7355 8020 9635 19087 98938</p>
        <p>maximum use of memory: 44MB
kosarak: #items 41217, #transactions 990002, ave. size of transaction 8
support 3000 2500 2000 1500 1000</p>
        <p>IBE 226 294 528 759 2101
#freq. sets 4894 8561 34483 219725 711424
#closed sets 4865 8503 31604 157393 496675
#max. freq. sets 792 1146 2858 4204 16231
#min. infreq. sets 87974 120591 200195 406287 875391</p>
        <p>maximum use of memory: 170MB
mushroom: #items 120, #transactions 8124, ave. size of transaction 23
support 30 20 10 5 2</p>
        <p>IBE 132 231 365 475 433
#freq. sets 505205917 781458545 1662769667 &gt;2G &gt;2G
#closed sets 91122 109304 145482 181243 230585
#max. freq. sets 15232 21396 30809 34131 27299
#min. infreq. sets 66085 79481 81746 69945 31880
maximum use of memory 47MB
connect: #items 130, #transactions 67577, ave. size of transaction 43
support 63000 60000 57000 54000 51000 48000 45000</p>
        <p>IBE 229 391 640 893 1154 1381 1643
#freq. sets 6327 41143 171239 541911 1436863 -
#closed sets 1566 4372 9041 15210 23329 -
#max. freq. sets 152 269 464 671 913 1166 1466
#min. infreq. sets 297 486 703 980 1291 1622 1969
maximum use of memory: 60MB
chess: #items 76, #transactions 3196, ave. size of transaction 37
support 2200 1900 1600 1300 1000</p>
        <p>IBE 19 61 176 555 2191
#freq. sets 59181 278734 1261227 5764922 29442848
#closed sets 28358 106125 366529 1247700 4445373
#max. freq. sets 1047 3673 11209 35417 114382
#min. infreq. sets 1725 5202 14969 46727 152317
maximum use of memory: 50MB</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Agrawal96]
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toivonen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Verkamo</surname>
            ,
            <given-names>A. I.</given-names>
          </string-name>
          , “Fast Discovery of Association Rules”,
          <string-name>
            <given-names>U. M.</given-names>
            <surname>Fayyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Piatetsky-Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Uthurusamy</surname>
          </string-name>
          , (eds),
          <source>Advances in Knowledge Discovery and Data Mining</source>
          , chapter
          <volume>12</volume>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>328</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Fredman96]
          <string-name>
            <surname>Fredman</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Khachiyan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , “
          <article-title>On the Complexity of Dualization of Monotone Disjunctive Normal Forms”</article-title>
          ,
          <source>Journal of Algorithms</source>
          <volume>21</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>618</fpage>
          -
          <lpage>628</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Gunopulos97a]
          <string-name>
            <surname>Gunopulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khardon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Toivonen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>“Data mining, Hypergraph Transversals, and Machine Learning”</article-title>
          ,
          <source>Proc. of PODS'97</source>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>216</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Gunopulos97b]
          <string-name>
            <surname>Gunopulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Saluja</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , “
          <article-title>Discovering All Most Specific Sentences using Randomized Algorithms”</article-title>
          ,
          <source>Proc. of ICDT'97</source>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>229</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Han00]
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <source>“Mining Frequent Patterns without Candidate Generation,” SIGMOD Conference</source>
          <year>2000</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2000</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Kavvadias99]
          <string-name>
            <surname>Kavvadias</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Stavropoulos</surname>
            ,
            <given-names>E. C.</given-names>
          </string-name>
          , “
          <article-title>Evaluation of an Algorithm for the Transversal Hypergraph Problem”</article-title>
          , Algorithm Engineering, pp
          <fpage>72</fpage>
          -
          <lpage>84</lpage>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [KDDcup00]
          <string-name>
            <surname>Kohavi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brodley</surname>
            ,
            <given-names>C. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasca</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mason</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <source>“KDD-Cup 2000 Organizers' Report: Peeling the Onion,” SIGKDD Explorations</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>86</fpage>
          -
          <lpage>98</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>