<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FIMI'03: Workshop on Frequent Itemset Mining Implementations</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Bart Goethals HIIT Basic Research Unit Department of Computer Science University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mohammed J. Zaki Department of Computer Science Rensselaer Polytechnic Institute Troy</institution>
          ,
          <addr-line>New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Why Organize the FIMI Workshop?</title>
      <p>
        Since the introduction of association rule mining in
1993 by Agrawal, Imielinski and Swami [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the frequent
itemset mining (FIM) tasks have received a great deal
of attention. Within the last decade, a phenomenal
number of algorithms have been developed for
mining all [3-5, 10, 18, 19, 21, 23, 26, 28, 31, 33], closed [
        <xref ref-type="bibr" rid="ref12 ref22 ref24 ref25 ref27 ref29 ref30 ref32 ref6">6,
12, 22, 24, 25, 27, 29, 30, 32</xref>
        ] and maximal frequent
itemsets [1, 2, 7, 11, 15-17, 20, 35]. Every new paper claims
to run faster than previously existing algorithms, based
on experimental testing that is oftentimes quite
limited in scope, since many of the original algorithms
are not available due to intellectual property and
copyright issues. Zheng, Kohavi and Mason [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] observed
that the performance of several of these algorithms is
not always as claimed by their authors when tested on
different datasets. Also, from personal
experience, we noticed that even different implementations
of the same algorithm could behave quite differently
for various datasets and parameters.
      </p>
      <p>Given this proliferation of FIM algorithms, and
sometimes contradictory claims, there is a pressing
need to benchmark, characterize and understand the
algorithmic performance space. We would like to
understand why and under what conditions one algorithm
outperforms another. This means testing the
methods for a wide variety of parameters, and on different
datasets spanning dense and sparse, real and synthetic,
small and large, and so on.</p>
      <p>Given the experimental, algorithmic nature of FIM
(and most of data mining in general), it is crucial that
other researchers be able to independently verify the
claims made in a new paper. Unfortunately, the FIM
community (with few exceptions) has a very poor track
record in this regard. Many new algorithms are not
available even as an executable, let alone the source
code. How many times have we heard "this is
proprietary software, and not available"? This is not the way
other sciences work. Independent verifiability is the
hallmark of sciences like physics, chemistry, biology,
and so on. One may argue that the nature of the research
is different: they have detailed experimental procedures
that can be replicated, while we have algorithms, and
there is more than one way to code an algorithm.
However, a good example to emulate is the
bioinformatics community. They have espoused the open-source
paradigm with more alacrity than we have. It is quite
common for journals and conferences in
bioinformatics to require that software be available. For example,
here is a direct quote from the journal Bioinformatics
(http://bioinformatics.oupjournals.org/):
"Authors please note that software should be
available for a full 2 YEARS after publication
of the manuscript."</p>
      <p>We organized the FIMI workshop to address the
three main deficiencies in the FIM community:
• Lack of publicly available implementations of FIM
algorithms
• Lack of publicly available "real" datasets
• Lack of any serious performance benchmarking of
algorithms</p>
    </sec>
    <sec id="sec-2">
      <title>FIMI Repository</title>
      <p>The goals of this workshop are to find out the main
implementation aspects of the FIM problem for the all,
closed and maximal pattern mining tasks, and to
evaluate the behavior of the proposed algorithms with
respect to different types of datasets and parameters.
One of the most important aspects is that only open
source code submissions are allowed and that all
submissions will become freely available (for research
purposes only) on the online FIMI repository, along with
several new datasets for benchmarking purposes. See
the URL: http://fimi.cs.helsinki.fi/.</p>
      <p><bold>Some Recommendations</bold></p>
      <p>We strongly urge all new papers on FIM to provide
access to source code, or at least an executable,
immediately after publication. We request that researchers
contribute to the FIMI repository both in terms of
algorithms and datasets. We also urge the data mining
community to adopt the open-source strategy, which
will serve to accelerate advances in the field.
Finally, we would like to alert reviewers that the FIMI
repository now exists, and it contains state-of-the-art
FIM algorithms, so there is no excuse for a new paper
not to do an extensive comparison with methods in the
FIMI repository. Such papers should, in our opinion,
be rejected outright!</p>
      <p><bold>The Workshop</bold></p>
      <p>This is a truly unique workshop. It consisted of
a code submission as well as a paper submission
describing the algorithm and a detailed performance study by
the authors on publicly provided datasets, along with
a detailed explanation of when and why their
algorithm performs better than existing implementations.
The submissions were tested independently by the
co-chairs, and the papers were reviewed by members of the
program committee. The algorithms were judged for
three main tasks: all frequent itemset mining, closed
frequent itemset mining, and maximal frequent itemset
mining.</p>
      <p>The workshop proceedings contain 15 papers
describing 18 different algorithms that solve the frequent
itemset mining problems. The source code of the
implementations of these algorithms is publicly available
on the FIMI repository site.</p>
      <p>The conditions for "acceptance" of a submission
were as follows: i) a correct implementation for the
given task, ii) an efficient implementation compared
with other submissions in the same category, or a
submission that provides new insight into the FIM
problem. The idea is to highlight both successful and
unsuccessful but interesting ideas. One outcome of the
workshop will be to outline the focus for research on
new problems in the field.</p>
      <p>
        In order to allow a fair comparison of these
algorithms, we performed an extensive set of experiments
on several real-life datasets, and a few synthetic ones.
Among these are three new datasets: a supermarket
basket dataset donated by Tom Brijs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], a dataset
containing click-stream data of a Hungarian on-line news
portal donated by Ferenc Bodon [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and a dataset
containing Belgian traffic accident descriptions donated by
Karolien Geurts [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>We would like to thank the following program
committee members for their useful suggestions and
reviews:
• Roberto Bayardo, IBM Almaden Research Center, USA
• Johannes Gehrke, Cornell University, USA
• Jiawei Han, University of Illinois at Urbana-Champaign, USA
• Hannu Toivonen, University of Helsinki, Finland</p>
      <p>We also thank Taneli Mielikäinen and Toon Calders for
their help in reviewing the submissions.</p>
      <p>We extend our thanks to all the participants who
made submissions to the workshop, since their
willingness to participate and contribute source code in the
public domain was essential in the creation of the FIMI
Repository. For the same reason, thanks are due to
Ferenc Bodon, Tom Brijs, and Karolien Geurts, who
contributed new datasets, and to Zheng, Kohavi and
Mason for the KDD Cup 2001 datasets.</p>
      <p><bold>The FIMI Tasks Defined</bold></p>
      <p>Let us assume we are given a set of items ℐ. An
itemset I ⊆ ℐ is some subset of items. A transaction is a
pair T = (tid, I) where tid is the transaction
identifier and I is an itemset. A transaction T = (tid, I)
is said to support an itemset X if X ⊆ I. A
transaction database D is a set of transactions such that
each transaction has a unique identifier. The cover
of an itemset X in D consists of the set of
transaction identifiers of transactions in D that support X:
cover(X, D) := {tid | (tid, I) ∈ D, X ⊆ I}. The
support of an itemset X in D is the number of transactions
in the cover of X in D:
support(X, D) := |cover(X, D)|.
An itemset is called frequent in D if its support in D
exceeds a given minimal support threshold σ. D and
σ are omitted when they are clear from the context.
The goal is now to find all frequent itemsets, given a
database and a minimal support threshold.</p>
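The definitions above can be sketched in a few lines of Python. This is a naive illustration of the notation only, not one of the workshop implementations, and the toy database is made up:

```python
from itertools import combinations

# Toy transaction database: tid -> itemset (made up for illustration)
D = {
    1: {"beer", "chips"},
    2: {"beer", "chips", "salsa"},
    3: {"chips", "salsa"},
    4: {"beer"},
}

def cover(X, D):
    """cover(X, D) := {tid | (tid, I) in D, X subset of I}."""
    return {tid for tid, I in D.items() if set(X).issubset(I)}

def support(X, D):
    """support(X, D) := |cover(X, D)|."""
    return len(cover(X, D))

def all_frequent(D, sigma):
    """Naive enumeration of every itemset with support at least sigma."""
    items = sorted(set().union(*D.values()))
    freq = {}
    for k in range(1, len(items) + 1):
        for X in combinations(items, k):
            s = support(X, D)
            if s >= sigma:
                freq[frozenset(X)] = s
    return freq

print(support({"beer", "chips"}, D))  # 2
print(all_frequent(D, 3))  # frequent at sigma = 3: {beer} and {chips}, each with support 3
```

The enumeration over all subsets is exponential; the point of the workshop algorithms is precisely to avoid this brute force.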
      <p>The search space of this problem, all subsets of ℐ, is
clearly huge, and a frequent itemset of size k implies the
presence of 2^k − 2 other frequent itemsets as well, i.e., its
nonempty proper subsets. In other words, if frequent itemsets
are long, it simply becomes infeasible to mine the set of
all frequent itemsets. In order to tackle this problem,
several solutions have been proposed that only generate
a representative subset of all frequent itemsets. Among
these, the collections of all closed or maximal itemsets
are the most popular.</p>
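The 2^k − 2 figure is just the count of nonempty proper subsets of a k-itemset, which a quick check confirms:

```python
from itertools import combinations

k = 4
itemset = list(range(k))  # any itemset of size k
# nonempty proper subsets have sizes 1 .. k-1
subsets = [c for r in range(1, k) for c in combinations(itemset, r)]
print(len(subsets), 2**k - 2)  # 14 14
```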
      <p>A frequent itemset I is called closed if it has no
frequent superset with the same support, i.e., if
I = ⋂_{(tid, J) ∈ cover(I)} J.
A frequent itemset is called maximal if it has no
superset that is frequent.</p>
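These two definitions can be sketched directly in Python; by the intersection characterization above, an itemset is closed iff it equals the intersection of the transactions in its cover. The toy database is hypothetical:

```python
# Hypothetical toy database: tid -> itemset (for illustration only)
D = {
    1: {"beer", "chips"},
    2: {"beer", "chips", "salsa"},
    3: {"chips", "salsa"},
    4: {"beer"},
}

def cover(X, D):
    return {tid for tid, I in D.items() if set(X).issubset(I)}

def closure(X, D):
    """Intersection of all transactions supporting X;
    X is closed iff closure(X, D) == X."""
    result = None
    for tid in cover(X, D):
        result = set(D[tid]) if result is None else result.intersection(D[tid])
    return result

def is_closed(X, D):
    return closure(X, D) == set(X)

def is_maximal(X, frequent_sets):
    """X is maximal if no frequent set is a proper superset of X."""
    X = frozenset(X)
    return not any(X.issubset(Y) and X != Y for Y in frequent_sets)

# {chips, salsa} is supported only by transactions 2 and 3, whose
# intersection is {chips, salsa} itself, so it is closed:
print(is_closed({"chips", "salsa"}, D))  # True
# With threshold 2 the frequent sets include {beer, chips}, so
# {chips} is frequent and closed but not maximal:
freq = [frozenset(s) for s in ({"beer"}, {"chips"}, {"salsa"},
                               {"beer", "chips"}, {"chips", "salsa"})]
print(is_maximal({"chips"}, freq))          # False
print(is_maximal({"beer", "chips"}, freq))  # True
```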
      <p>
        Obviously, the collection of maximal frequent
itemsets is a subset of the collection of closed frequent
itemsets which is a subset of the collection of all frequent
itemsets. Although all maximal itemsets characterize
all frequent itemsets, the supports of all their subsets
are not available, while these might be necessary for some
applications such as association rules. On the other
hand, the closed frequent itemsets form a lossless
representation of all frequent itemsets since the support
of those itemsets that are not closed is uniquely
determined by the closed frequent itemsets. See [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for a
recent survey of FIM algorithms.
      </p>
      <p><bold>Experimental Evaluation</bold></p>
      <p>We conducted an extensive set of experiments for
different datasets, for all of the algorithms in the three
categories (all, closed and maximal). Figure 1 shows
the data characteristics.</p>
      <p>Our target platform was a Pentium 4, 3.2 GHz
processor, with 1GB of memory, using a Western Digital
IDE 7200 rpm, 200GB local disk. The operating
system was Red Hat Linux (kernel 2.4.22) and we used gcc 3.2.2 for
the compilation. Other platforms were also tried, such
as an older machine with dual 400 MHz Pentium III processors and
256MB memory, but a faster SCSI 10,000 rpm disk.
Independent tests were also run on quad 500 MHz
Pentium III processors with 1GB memory. There were
some minor differences, which have been reported on
the workshop website. Here we refer to the target
platform (3.2GHz/1GB/7200rpm).</p>
      <p>All times reported are real times, including system
and user times, as obtained from the unix time
command. All algorithms were run with the output flag
turned on, which means that mined results were
written to a file. We made this decision since, in the real
world, one wants to see the output, and the total wall
clock time is the end-to-end delay that one will see.
There was one unfortunate consequence of this, namely,
we were not able to run algorithms for mining all
frequent itemsets below a certain threshold, since the
output file exceeded the 2GB file size limit on a 32-bit
platform. For each algorithm we also recorded its memory
consumption using the memusage command. Results
on memory usage are available on the FIMI website.</p>
      <p>For the experiments, each algorithm was allocated
a maximum of 10 minutes to finish execution, after
which point it was killed. We had to do this to
finish the evaluation in a reasonable amount of time. We
had a total of 18 algorithms in the all category, 6 in
the closed category, and 8 in the maximal category, for
a grand total of 32 algorithms. Please note that the
algorithms eclat_zaki, eclat_goethals, charm and genmax
were not technically submitted to the workshop;
however, we included them in the comparison since their
source code is publicly available. We used 14 datasets,
with an average of 7 values of minimum support. With
a 10 minute time limit per algorithm, the total time to
finish one round of evaluation took 31360 minutes of
running time, which translates to an upper bound of
21 days! Since not all algorithms take the full 10 minutes,
the actual time for one round was roughly 7-10 days.</p>
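The time budget is straightforward arithmetic; as a sanity check of the figures quoted above:

```python
algorithms = 18 + 6 + 8        # all + closed + maximal categories
datasets = 14
support_values = 7             # average number of minimum-support values
limit_minutes = 10             # per-run cap before a run is killed

total_minutes = algorithms * datasets * support_values * limit_minutes
print(total_minutes)                         # 31360
print(round(total_minutes / (60 * 24), 1))   # 21.8 days, i.e. the ~21 day upper bound
```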
      <p>We should also mention that some algorithms had
problems on certain datasets. For instance, for
mining all frequent itemsets, armor is not able to handle
dense datasets very well (for low values of minimum
support it crashed for chess, mushroom, pumsb); pie
gives a segmentation fault for bms2, chess, retail and
the synthetic datasets; cofi gets killed for bms1 and
kosarak; and dftime/dfmem crash for accidents, bms1
and retail. For closed itemset mining, fpclose
segfaults for bms1, bms2, bmspos and retail; borgelt eclat
also has problems with retail. Finally, for maximal set
mining, apriori borgelt crashes for bms1 for low values
of support, and so does eclat borgelt for pumsb.</p>
      <p><bold>Mining All Frequent Itemsets</bold></p>
      <p>Figures 5 and 6 show the timings for the algorithms
for mining all frequent itemsets. Figure 2 shows the
best and second-best algorithms for high and low values
of support for each dataset.</p>
      <p>There are several interesting trends that we observe:
1. In some cases, we observe a high initial running
time for the highest value of support, and the time
drops for the next value of minimum support. This
is due to file caching. Each algorithm was run with
multiple minimum support values before switching
to another algorithm. Therefore, the first time the
database is accessed we observe higher times, but
on subsequent runs the data is cached and the I/O
time drops.
2. In some cases, we observe a cross-over
in the running times as one goes from high to low
values of support. An algorithm may be the best
for high values of support, but the same algorithm
may not be the best for low values.
3. There is no single best algorithm for either high
or low values of support, but some algorithms
are the best or runner-up more often than others.
Looking at Figure 2, we can conclude that for high
values the best algorithms are either kdci or patricia,
across all databases we tested. For low values, the
picture is not as clear; the algorithms likely to perform
well are patricia, fpgrowth* or lcm. For the runner-up
in the low support category, we once again see patricia
and kdci showing up.</p>
      <p><bold>Mining Closed Frequent Itemsets</bold></p>
      <p>Figures 7 and 8 show the timings for the algorithms
for mining closed frequent itemsets. Figure 3 shows the
best and second-best algorithms for high and low values
of support for each dataset. For high support values,
fpclose is best for 7 out of the 14 datasets, and lcm,
afopt, and charm also perform well on some datasets.
For low values of support the competition is between
fpclose and lcm for the top spot. For the runner-up
spot there is a mixed bag of algorithms: fpclose, afopt,
lcm and charm. If one were to pick an overall best
algorithm, it would arguably be fpclose, since it either
performs best or shows up in the runner-up spot
more times than any other algorithm. An interesting
observation is that for the cases where fpclose doesn't
appear in the table, it gives a segmentation fault (for
bms1, bms2, bmspos and retail).</p>
      <p><bold>Mining Maximal Frequent Itemsets</bold></p>
      <p>Figures 9 and 10 show the timings for the
algorithms for mining maximal frequent itemsets. Figure 4
shows the best and second-best algorithms for high and
low values of support for each dataset. For high values
of support fpmax* is the dominant winner or
runner-up. Genmax, mafia and afopt are also worth
mentioning. For the low support category fpmax* again makes
a strong showing as the best in 7 out of 14 databases, and
when it is not best, it appears as the runner-up 6 times.
Thus fpmax* is the method of choice for maximal
pattern mining.</p>
      <p>We presented only some of the results in this
report. We refer the reader to the FIMI repository for
a more detailed experimental study. The study done
by us was also somewhat limited, since we performed
only timing and memory usage experiments for given
datasets. Ideally, we would have liked to do a more
detailed study of the scale-up of the algorithms, and for a
variety of different parameters; our preliminary studies
show that none of the algorithms is able to gracefully
scale up to very large datasets with millions of
transactions. One reason may be that most methods are
optimized for in-memory datasets, which points to
out-of-core FIM algorithms as an avenue for future
research.</p>
      <p>In the experiments reported above, there were no
clear winners, but some methods did show up as the
best or second best algorithms for both high and low
values of support. Both patricia and kdci represent the
state-of-the-art in all frequent itemset mining, whereas
fpclose takes this spot for closed itemset mining, and
finally fpmax* appears to be one of the best for
maximal itemset mining. An interesting observation is that
for the synthetic datasets, apriori borgelt seems to
perform quite well for all, closed and maximal itemset
mining.</p>
      <p>We refer the reader to the actual papers in these
proceedings to find out the details on each of the
algorithms in this study. The results presented here should
be taken in the spirit of experiments-in-progress, since
we do plan to diversify our testing to include more
parameters. We are confident that the workshop will
generate a very healthy and critical discussion on the
state-of-affairs in frequent itemset mining
implementations.</p>
      <p>To conclude, we hope that the FIMI workshop will
serve as a model for the data mining community to
hold more such open-source benchmarking tests, and
we hope that the FIMI repository will continue to grow
with the addition of new algorithms and datasets, and
once again to serve as a model for the rest of the data
mining world.</p>
      <p>[Figure 1: characteristics of the 14 benchmark databases
(accidents, bms1, bms2, bmspos, chess, connect, kosarak,
mushroom, pumsb, pumsb*, retail, T10I5N1KP5KC0.25D200K,
T20I10N1KP5KC0.25D200K, T30I15N1KP5KC0.25D200K).
Figure 2: best (1st) and runner-up (2nd) algorithms per dataset,
including kdci and patricia.]</p>
      <p>[Figures 2-4 (flattened best/runner-up tables) and Figures 5-10
(timing plots of total time in seconds versus minimum support in %)
are omitted here as unrecoverable extraction residue; panels include
closed-bmspos, closed-connect, maximal-bmspos and maximal-connect,
with legends listing pie, patricia, apriori-dftime, eclat_borgelt,
apriori_bodon, lcm, armor, apriori_brave, eclat_zaki, aim, fpgrowth*,
eclat_goethals, mafia, apriori-dfmem, afopt, apriori_borgelt, kdci,
ibe, genmax and fpmax*.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          .
          <article-title>Towards long pattern generation in dense databases</article-title>
          .
          <source>SIGKDD Explorations</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <volume>20</volume>
          -
          <fpage>26</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Prasad</surname>
          </string-name>
          .
          <article-title>Depth First Generation of Long Patterns</article-title>
          .
          <source>In 7th Int'l Conference on Knowledge Discovery and Data Mining, Aug</source>
          .
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Imielinski</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data</source>
          , pages
          <volume>207</volume>
          -
          <fpage>216</fpage>
          . ACM Press,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mannila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. I.</given-names>
            <surname>Verkamo</surname>
          </string-name>
          .
          <article-title>Fast discovery of association rules</article-title>
          . In U. Fayyad and et al, editors,
          <source>Advances in Knowledge Discovery and Data Mining</source>
          , pages
          <volume>307</volume>
          -
          <fpage>328</fpage>
          . AAAI Press, Menlo Park, CA,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Fast algorithms for mining association rules</article-title>
          .
          <source>In 20th VLDB Conference</source>
          , Sept.
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bastide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taouil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          , G. Stumme, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lakhal</surname>
          </string-name>
          .
          <article-title>Mining frequent patterns with counting inference</article-title>
          .
          <source>SIGKDD Explorations</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ), Dec.
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Bayardo</surname>
          </string-name>
          .
          <article-title>Efficiently mining long patterns from databases</article-title>
          .
          <source>In ACM SIGMOD Conf. Management of Data</source>
          ,
          <year>June 1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bodon</surname>
          </string-name>
          .
          <article-title>A fast apriori implementation</article-title>
          .
          <source>In Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brijs</surname>
          </string-name>
          , G. Swinnen,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vanhoof</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Wets</surname>
          </string-name>
          .
          <article-title>Using association rules for product assortment decisions: A case study</article-title>
          .
          <source>In Knowledge Discovery and Data Mining</source>
          , pages
          <volume>254</volume>
          -
          <fpage>260</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Brin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tsur</surname>
          </string-name>
          .
          <article-title>Dynamic itemset counting and implication rules for market basket data</article-title>
          .
          <source>In ACM SIGMOD Conf. Management of Data</source>
          , May
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calimlim</surname>
          </string-name>
          , and
          <string-name>
            <surname>J. Gehrke.</surname>
          </string-name>
          <article-title>MAFIA: a maximal frequent itemset algorithm for transactional databases</article-title>
          .
          <source>In Intl. Conf. on Data Engineering</source>
          , Apr.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cristofor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cristofor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Simovici</surname>
          </string-name>
          .
          <article-title>Galois connection and data mining</article-title>
          .
          <source>Journal of Universal Computer Science</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <volume>60</volume>
          -
          <fpage>73</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Geurts</surname>
          </string-name>
          , G. Wets,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brijs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Vanhoof</surname>
          </string-name>
          .
          <article-title>Profiling high frequency accident locations using association rules</article-title>
          .
          <source>In Proceedings of the 82nd Annual Transportation Research Board, page 18</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          . Efficient Frequent Pattern Mining.
          <source>PhD thesis</source>
          , transnational University of Limburg, Belgium,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gouda</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>Efficiently mining maximal frequent itemsets</article-title>
          .
          <source>In 1st IEEE Int'l Conf. on Data Mining, Nov</source>
          .
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Grahne</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>High performance mining of maximal frequent itemsets</article-title>
          .
          <source>In 6th International Workshop on High Performance Data Mining</source>
          , May
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunopulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mannila</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Saluja</surname>
          </string-name>
          .
          <article-title>Discovering all the most specific sentences by randomized algorithms</article-title>
          .
          <source>In Intl. Conf. on Database Theory, Jan</source>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          .
          <article-title>Mining frequent patterns without candidate generation</article-title>
          .
          <source>In ACM SIGMOD Conf. Management of Data</source>
          , May
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Houtsma</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Set-oriented mining of association rules in relational databases</article-title>
          .
          <source>In 11th Intl. Conf. Data Engineering</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.-I.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Kedem</surname>
          </string-name>
          .
          <article-title>Pincer-search: A new algorithm for discovering the maximum frequent set</article-title>
          .
          <source>In 6th Intl. Conf. Extending Database Technology, Mar</source>
          .
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Dunham</surname>
          </string-name>
          .
          <article-title>Mining association rules: Anti-skew algorithms</article-title>
          .
          <source>In 14th Intl. Conf. on Data Engineering</source>
          , Feb.
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>CARPENTER: Finding closed patterns in long biological datasets</article-title>
          .
          <source>In ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, Aug</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>An effective hash based algorithm for mining association rules</article-title>
          .
          <source>In ACM SIGMOD Intl. Conf. Management of Data</source>
          , May
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bastide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taouil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lakhal</surname>
          </string-name>
          .
          <article-title>Discovering frequent closed itemsets for association rules</article-title>
          .
          <source>In 7th Intl. Conf. on Database Theory, Jan</source>
          .
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mao</surname>
          </string-name>
          .
          <article-title>CLOSET: An efficient algorithm for mining frequent closed itemsets</article-title>
          .
          <source>In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery</source>
          , May
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Savasere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Omiecinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Navathe</surname>
          </string-name>
          .
          <article-title>An efficient algorithm for mining association rules in large databases</article-title>
          .
          <source>In 21st VLDB Conf.</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shenoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Haritsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudarshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bhalotia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bawa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <article-title>Turbo-charging vertical mining of large databases</article-title>
          .
          <source>In ACM SIGMOD Intl. Conf. Management of Data</source>
          , May
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          .
          <article-title>Sampling large databases for association rules</article-title>
          .
          <source>In 22nd VLDB Conf.</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          .
          <article-title>CLOSET+: Searching for the best strategies for mining frequent closed itemsets</article-title>
          .
          <source>In ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, Aug</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>Scalable algorithms for association mining</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>12</volume>
          (
          <issue>3</issue>
          ):
          <fpage>372</fpage>
          -
          <lpage>390</lpage>
          , May-June
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gouda</surname>
          </string-name>
          .
          <article-title>Fast vertical mining using Diffsets</article-title>
          .
          <source>In 9th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, Aug</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsiao</surname>
          </string-name>
          .
          <article-title>CHARM: An efficient algorithm for closed itemset mining</article-title>
          .
          <source>In 2nd SIAM International Conference on Data Mining, Apr</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ogihara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>New algorithms for fast discovery of association rules</article-title>
          .
          <source>In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, Aug</source>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kohavi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Mason</surname>
          </string-name>
          .
          <article-title>Real world performance of association rule algorithms</article-title>
          .
          <source>In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>401</fpage>
          -
          <lpage>406</lpage>
          . ACM Press,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>SmartMiner: a depth-first algorithm guided by tail information for mining maximal frequent itemsets</article-title>
          .
          <source>In 2nd IEEE Int'l Conf. on Data Mining, Nov</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>