<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>APRIORI, A Depth First Implementation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Walter A. Kosters</string-name>
          <email>kosters@liacs.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wim Pijls</string-name>
          <email>pijls@few.eur.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Erasmus University</institution>
          ,
          <addr-line>P.O. Box 1738, 3000 DR Rotterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leiden Institute of Advanced Computer Science, Universiteit Leiden</institution>
          ,
          <addr-line>P.O. Box 9512, 2300 RA Leiden</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We will discuss DF, the depth first implementation of APRIORI as devised in 1999 (see [8]). Given a database, this algorithm builds a trie in memory that contains all frequent itemsets, i.e., all sets that are contained in at least minsup transactions from the original database. Here minsup is a threshold value given in advance. In the trie, which is constructed by adding one item at a time, every path corresponds to a unique frequent itemset. We describe the algorithm in detail, derive theoretical formulas, and provide experimental results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>1 Introduction</title>
      <p>
        In this paper we discuss the depth first (DF, see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ])
implementation of APRIORI (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), one of the fastest
known data mining algorithms to find all frequent
itemsets in a large database, i.e., all sets that are contained in at
least minsup transactions from the original database. Here
minsup is a threshold value given in advance. There exist
many implementations of APRIORI (see, e.g., [
        <xref ref-type="bibr" rid="ref11 ref6">6, 11</xref>
        ]). We
would like to focus on algorithms that assume that the whole
database fits in main memory, this often being the state of
affairs; among these, DF and FP (the FP-growth
implementation of APRIORI, see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) are the fastest. In most
papers so far little attention has been given to theoretical
complexity. In [
        <xref ref-type="bibr" rid="ref3 ref7">3, 7</xref>
        ] a theoretical basis for the analysis of these
two algorithms was presented.
      </p>
      <p>The depth first algorithm is a simple algorithm that
proceeds as follows. After some preprocessing, which
involves reading the database and a sorting of the single items
with respect to their support, DF builds a trie in memory,
where every path from the root downwards corresponds to
a unique frequent itemset; in consecutive steps items are
added to this trie one at a time. Both the database and the trie
are kept in main memory, which might cause memory
problems: both are usually very large, and in particular the trie
gets much larger as the support threshold decreases. Finally
the algorithm outputs all paths in the trie, i.e., all frequent
itemsets. Note that once completed, the trie allows for fast
itemset retrieval in the case of online processing.</p>
      <p>We formerly had two implementations of the algorithm,
one being time efficient, the other being memory efficient
(called dftime.cc and dfmemory.cc, respectively),
where the time efficient version could not handle low
support thresholds. The newest version (called dffast.cc)
combines them into one even faster implementation, and
runs for all support thresholds.</p>
      <p>In this paper we first describe DF, we then give some
formal definitions and theoretical formulas, we discuss the
program, provide experimental results, and conclude with
some remarks.
</p>
    </sec>
    <sec id="sec-2">
<title>2 The Algorithm</title>
      <p>An appropriate data structure to store the frequent
itemsets of a given database is a trie. As a running example in
this section we use the dataset of Figure 1. Each line
represents a transaction. The trie of frequent patterns is shown
in Figure 2. The entries (or cells) in a node of a trie are
usually called buckets, as is also the case for a hash-tree.
Each bucket can be identified with its path to the root and
hence with a unique frequent itemset. The example trie has
9 nodes and 18 buckets, representing 18 frequent itemsets.
As an example, the frequent itemset ABEF can be
seen as the leftmost path in the trie; a set such as
ABC is not present.</p>
      <p>
        One of the oldest algorithms for finding frequent patterns
is APRIORI, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). This algorithm successively finds all
frequent 1-itemsets, all frequent 2-itemsets, all frequent
3-itemsets, and so on. (A k-itemset has k items.) The frequent
k-itemsets are used to generate candidate (k+1)-itemsets,
where the candidates are only known to have two frequent
subsets with k elements. After a pruning step, where
candidates still having infrequent subsets are discarded, the
support of the candidates is determined. The way APRIORI
finds the frequent patterns implies that the trie is built layer
by layer. First the buckets in the root (depth 0) are
constructed, next the trie nodes at depth 1 are constructed, and
so on. So, APRIORI can be thought of as an algorithm that
builds the pattern trie in a breadth first way. We propose an
algorithm that builds the trie in a depth first way. We will
explain the depth first construction of the trie using the dataset
of Figure 1. Note that the trie grows from right to left.
      </p>
      <p>Figure 1. The example dataset, one transaction per line: 1: B C D; 2: A B E F; 3: A B E F; 4: A B C F; 5: A B C E F; 6: C D E F. The frequent itemsets when minsup = 3: with support 5: B, F; with support 4: A, AB, AF, ABF, BF, C, E, EF; with support 3: AE, ABE, ABEF, AEF, BC, BE, BEF, CF.</p>
      <p>Figure 2. The trie of all 18 frequent itemsets of the example dataset, with root buckets A B C E F.</p>
      <p>The algorithm proceeds as follows. In a preprocessing
step, the support of each single item is counted and the
infrequent items are eliminated. Let the frequent items be
denoted by 1, 2, ..., m. Next, the code from Figure 3 is
executed.</p>
      <p>Figure 3. The DF algorithm:</p>
      <preformat>(1)  T := the trie including only bucket m;
(2)  for i := m − 1 downto 1 do
(3)     T' := T;
(4)     T := T' with i added to the left and
(5)          a copy of T' appended to i (= the subtrie T_i rooted in i);
(6)     count(i);
(7)     delete the infrequent itemsets from T_i;

(9)  procedure count(i)::
(10)    for every transaction t including item i do
(11)      for every itemset s in T_i do
(12)        if t supports s then s.support := s.support + 1;</preformat>
      <p>The procedure count(i) determines the support of
each itemset (bucket) in the subtrie T_i. This is achieved by
a database pass, in which each transaction including item i
is considered. Any such transaction is one at a time
“pushed” through T_i, where it only traverses a subtrie if it
includes the root of this subtrie, meanwhile updating the
support fields in the buckets. In the last paragraph of
Section 4 a refinement of this part of the algorithm is presented.
On termination of the algorithm, T exactly contains the
frequent itemsets.</p>
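      <p>To make the trie manipulation concrete, the following compact C++ sketch mirrors Figure 3. It is our illustration, not the authors' dffast.cc; the names Bucket, push, prune and buildTrie, as well as the boolean-vector transactions, are assumptions of this sketch.</p>
      <preformat><![CDATA[
// Our sketch of Figure 3, not the authors' dffast.cc.
#include <algorithm>
#include <vector>

struct Bucket {
    int item;                      // the item labelling this bucket
    int support = 0;               // filled in by count()
    std::vector<Bucket> children;  // the subtrie appended below this bucket
};

using Transaction = std::vector<bool>;  // transaction[j] == true iff item j occurs

// "Push" a transaction through a list of sibling buckets: a subtrie is
// traversed only if the transaction includes the root of that subtrie.
void push(std::vector<Bucket>& node, const Transaction& t) {
    for (Bucket& b : node)
        if (t[b.item]) {           // abort this path as soon as an item is missing
            ++b.support;
            push(b.children, t);
        }
}

// Lines (9)-(12): one database pass over the transactions that include
// item i, updating the support fields in the subtrie T_i.
void count(Bucket& Ti, const std::vector<Transaction>& db) {
    for (const Transaction& t : db)
        if (t[Ti.item]) {          // only transactions including item i
            ++Ti.support;
            push(Ti.children, t);
        }
}

// Line (7): recursively delete the infrequent itemsets.
void prune(std::vector<Bucket>& node, int minsup) {
    node.erase(std::remove_if(node.begin(), node.end(),
                   [&](const Bucket& b) { return b.support < minsup; }),
               node.end());
    for (Bucket& b : node) prune(b.children, minsup);
}

// Reset the support fields of a copied subtrie before recounting.
void clear(std::vector<Bucket>& node) {
    for (Bucket& b : node) { b.support = 0; clear(b.children); }
}

// Lines (1)-(7): grow the root from item m leftwards to item 1.
std::vector<Bucket> buildTrie(int m, const std::vector<Transaction>& db, int minsup) {
    std::vector<Bucket> T = { Bucket{m} };   // (1) the trie including only bucket m
    for (int i = m - 1; i >= 1; --i) {       // (2)
        Bucket Ti{i};                        // (4) i added to the left and
        Ti.children = T;                     // (5) a copy of T' appended to i
        clear(Ti.children);
        count(Ti, db);                       // (6)
        prune(Ti.children, minsup);          // (7)
        T.insert(T.begin(), Ti);
    }
    return T;                                // T now contains all frequent itemsets
}
]]></preformat>
      <p>Note that the buckets copied below i must have their support fields reset before count runs, since they still carry values from earlier iterations; the refinement discussed in Section 4 avoids this copy-then-delete pattern altogether.</p>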
      <p>Figure 4 illustrates the consecutive steps of the algorithm
applied to our example. The single items surpassing the
minimum support threshold 3 are A, B, C, E
and F. In the figure, the shape of T after
each iteration of the for loop is shown. Also the infrequent
itemsets to be deleted at the end of an iteration are
mentioned. At the start of the iteration with index i, the root of
trie T consists of the 1-itemsets i+1, ..., m. (We denote a
1-itemset by the name of its only item, omitting curly braces
and commas as in Figure 1 and Figure 4.) By the statement
in line (3) from Figure 3, this trie may also be referred to as
T'. A new trie is composed by adding bucket i to the
root and by appending a copy of T' (the former value of T)
to i. The newly added buckets are the new candidates and
they make up a subtrie T_i. In Figure 4, the candidate set is
in the left part of each trie and is drawn in bold. Notice that
the final trie (after deleting infrequent itemsets) is identical
to Figure 2.</p>
        <p>Figure 4. The trie T after each iteration of the for loop, candidates drawn in bold: CE and CEF are infrequent and hence deleted; BCF is infrequent and hence deleted; ABC, AC and ACF are infrequent and hence deleted.</p>
        <p>The number of iterations in the for loop is one less
than the number m of frequent 1-itemsets. Consequently, the
number of database passes also equals m − 1.
This causes the algorithm to
be tractable only if the database under consideration is
memory-resident. Given the present-day memory sizes, this
is not a real constraint any more.</p>
        <p>As stated above, our algorithm has a preprocessing step
which counts the support for each single item. After this
preprocessing step, the items may be re-ordered. The most
favorable execution time is achieved if we order the items
by increasing frequency (see Section 3 for a more formal
motivation). It is better to have low support at the top of the
deeper side (to the lower left) of the trie and hence high
support at the top of the shallow part (to the upper right) of
the trie.</p>
        <p>We may distinguish between “dense” data sets and
“sparse” datasets. A dense dataset has many frequent
patterns of large size and high support, as is the case for
test sets such as chess and mushroom (see Section 5).
In those datasets, many transactions are similar to each
other. Datasets with mainly short patterns are called sparse.
Longer patterns may exist, but with relatively small
support. Real-world transaction databases of supermarkets
mostly belong to this category. Also the synthetic datasets
from Section 5 have similar properties: interesting support
thresholds are much lower than in the dense case.</p>
        <p>
          Algorithms for finding frequent patterns may be divided
into two types: algorithms respectively with and without
candidate generation. Any APRIORI-like instance belongs to
the first type. Eclat (see [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]) may also be considered as an
instance of this type. The FP-growth algorithm from
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is the best-known instance of the second type (though
one can also defend the point of view that it does generate
candidates). For dense datasets, FP-growth performs better than
candidate generating algorithms. FP-growth stores the dataset in
a way that is very efficient, especially when the dataset has
many similar transactions. In case of algorithms that do
apply candidate generation, dense sets produce a large number
of candidates. Since each new candidate has to be related
to each transaction, the database passes take a lot of time.
However, for sparse datasets, candidate generation is a very
suitable method for finding frequent patterns. In our
experience, the instances of the APRIORI family are very useful
when searching transaction databases. According to the
results in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] the depth first algorithm outperforms
FP-growth on the synthetic transaction sets (see Section 5
for a description of these sets).
        </p>
        <p>Finally, note that technically speaking DF is not a full
implementation of APRIORI, since every candidate itemset
is known to have only one frequent subset (resulting from
the part of the trie which has already been completed)
instead of two. In the example, ABC becomes a candidate as
soon as BC is frequent, even though AC has not been counted
at that point. Apart from this, its underlying candidate
generation mechanism strongly resembles the one from
APRIORI.</p>
    </sec>
    <sec id="sec-3">
<title>3 Theoretical Complexity</title>
      <p>Let n denote the number of transactions (also called
customers), and let m denote the number of products (also
called items). Usually n is much larger than m. For a
nonempty itemset A we define:</p>
      <p>supp(A) is the support of A: the number of customers
that buy all products from A (and possibly more), or
equivalently the number of transactions that contain A;
min(A) is the smallest number in A;
max(A) is the largest number in A.</p>
      <p>In line with this we let min(∅) = m + 1. We also put
supp(∅) = n and max(∅) = 0. A set A is called
frequent if supp(A) ≥ minsup, where the so-called support
threshold minsup is a fixed number given in advance.</p>
      <p>We assume every 1-itemset to be frequent; this can be
effected by the first step of the algorithms we are looking
at, which might be considered as preprocessing.</p>
      <p>
        A “database query” is defined as a question of the form
“Does customer c buy product p?” (or “Does transaction t
have item i?”), posed to the original database. Note that
we have n · m database queries in the “preprocessing” phase
in which the supports of the 1-itemsets are computed and
ordered: every field of the database is inspected once. (By
the way, the sorting, in which the items are assigned the
numbers 1, 2, ..., m, takes O(m log m) time.) The number
of database queries for DF equals:</p>
      <p>Σ_{A frequent, A ≠ ∅} Σ_{i = 1}^{min(A) − 1} supp(({i} ∪ A) ∖ {max(A)})   (1)</p>
      <p>
        For a proof, see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It relies on the fact that in order for a
node to occur in the trie the path to it (except for the root)
should be frequent, and on the observation that this
particular node is “questioned” every time a transaction follows
this same path. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] a simple version of FP-growth is described
in a similar style, leading to</p>
      <p>Σ_{A frequent, A ≠ ∅} Σ_{i = 1, {i} ∪ A frequent}^{min(A) − 1} supp(A ∖ {max(A)})   (2)</p>
      <p>
        database queries in “local databases” (FP-trees), except for
the preprocessing phase. Note the extra condition on the
inner summation (which is “good” for FP-growth: we have fewer
summands there), while on the other hand the summands are
larger (which is “good” for DF: we have a smaller
contribution there).
      </p>
      <p>It also makes sense to look at the total number of nodes
of the trie during its construction, which is connected to the
effort of maintaining and using the data structures. Counting
each trie node with the number of buckets it contains, the
total is computed to be:</p>
      <p>Σ_{A frequent, including A = ∅} (min(A) − 1)   (3)</p>
      <p>By the convention min(∅) = m + 1, the empty set
contributes the m buckets of the root.
When the trie is finally ready the number of remaining
buckets equals the number of frequent sets, each item in a node
being the end of the path that represents the corresponding
itemset.</p>
      <p>
        Notice that the complexity heavily depends on the
sorting order of the items at the top level. It turns out that an
increasing order of items is beneficial here. This is suggested
by the contribution of the 1-itemsets in Equation (1):</p>
      <p>Σ_{i = 1}^{m} (m − i) · supp({i})   (4)</p>
      <p>
        which happens to be minimal in that case. This 1-itemset
contribution turns out to be the same for both DF and FP-growth:
see [
        <xref ref-type="bibr" rid="ref3 ref7">3, 7</xref>
        ], where also results for FP-growth are presented in more
detail.
      </p>
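      <p>As a small check of Equation (4), the following sketch (our illustration) evaluates the 1-itemset contribution for the supports of the five frequent items of Figure 1, namely 4, 4, 4, 5 and 5, under the increasing order and under the reverse order; the increasing order indeed yields the smaller value.</p>
      <preformat><![CDATA[
// Our check of Equation (4) on the supports from Figure 1:
// items A, C, E have support 4; items B, F have support 5.
#include <iostream>
#include <vector>

// the 1-itemset contribution: sum over i = 1..m of (m - i) * supp({i})
long contribution(const std::vector<long>& supp) {
    long m = supp.size(), total = 0;
    for (long i = 1; i <= m; ++i)
        total += (m - i) * supp[i - 1];
    return total;
}

int main() {
    std::cout << contribution({4, 4, 4, 5, 5}) << "\n";  // increasing order: 41
    std::cout << contribution({5, 5, 4, 4, 4}) << "\n";  // decreasing order: 47
}
]]></preformat>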
    </sec>
    <sec id="sec-4">
      <title>4 Implementation Issues</title>
      <p>
        In this section we discuss some implementation details
of our program. As mentioned in Section 2, the database
is traversed many times. It is therefore necessary that the
database is memory-resident. Fortunately, only the
occurrences of frequent items need to be stored. The database
is represented by a two-dimensional boolean array. For
efficiency reasons, one array entry corresponds to one bit. Since
the function count in the algorithm considers the database
transaction by transaction, a horizontal layout is chosen,
cf. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>We have four preprocessing steps before the algorithm
of Section 2 actually starts.</p>
      <p>1 The range of the item values is determined. This is
necessary, because some test sets, e.g., the BMS-WebView
sets, use item values from a high range.
2 This is an essential initial step. First, for each item the
support is counted. Next, the frequent items are
selected and sorted by frequency. This process is
relevant, since the frequency order also prescribes the
order in the root of the trie, as stated before. The sorted
frequent items along with their supports are retained in
an array.
3 If a transaction has zero or one frequent item, it need
not be stored into the memory-resident
representation of the database. The root of the trie is constructed
according to the information gathered in step 2. For
constructing the other buckets, only transactions with
at least two frequent items are relevant. In this step, we
count the relevant transactions.
4 During this step the database is stored into a
two-dimensional array with horizontal layout. Each item is
given a new number, according to its rank in the
frequency order. The length of the array equals the result
of step 3; the width is determined by the number of
frequent items. A sketch of these steps is given below.</p>
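      <p>The following C++ fragment is our sketch of steps 2, 3 and 4, not the authors' code; the names Database and preprocess, and the packing of bits into 64-bit words, are assumptions of this sketch.</p>
      <preformat><![CDATA[
// Our sketch of preprocessing steps 2-4: count supports, keep the
// frequent items sorted by increasing frequency, renumber them, and
// store the database as a bit matrix in horizontal layout.
#include <algorithm>
#include <cstdint>
#include <vector>

using Row = std::vector<std::uint64_t>;  // one transaction, one bit per frequent item

struct Database {
    int m = 0;               // number of frequent items
    std::vector<Row> rows;   // only transactions with at least two frequent items
    static bool get(const Row& r, int j) {  // is (renumbered) item j in this row?
        return (r[j >> 6] >> (j & 63)) & 1;
    }
};

Database preprocess(const std::vector<std::vector<int>>& raw,
                    int minsup, int maxItem) {
    // step 2: count the support of each item
    std::vector<int> supp(maxItem + 1, 0);
    for (const auto& t : raw)
        for (int item : t) ++supp[item];

    // step 2: select the frequent items and sort them by increasing support
    std::vector<int> frequent;
    for (int item = 0; item <= maxItem; ++item)
        if (supp[item] >= minsup) frequent.push_back(item);
    std::sort(frequent.begin(), frequent.end(),
              [&](int a, int b) { return supp[a] < supp[b]; });

    // step 4: each frequent item gets a new number, its rank in this order
    std::vector<int> rank(maxItem + 1, -1);
    for (int j = 0; j < (int)frequent.size(); ++j) rank[frequent[j]] = j;

    // steps 3-4: store only the transactions with at least two frequent items
    Database db;
    db.m = (int)frequent.size();
    const int words = (db.m + 63) / 64;
    for (const auto& t : raw) {
        Row row(words, 0);
        int kept = 0;
        for (int item : t)
            if (rank[item] >= 0) {
                row[rank[item] >> 6] |= std::uint64_t(1) << (rank[item] & 63);
                ++kept;
            }
        if (kept >= 2) db.rows.push_back(row);
    }
    return db;
}
]]></preformat>
      <p>One bit per array entry keeps the memory-resident database small, and the horizontal layout lets count scan it transaction by transaction.</p>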
      <p>After this preparatory work, which in practice usually takes
a few seconds, the code as described in Section 2 is
executed. The cells of the root are constructed using the result
of initial step 2.</p>
      <p>In line (12) from Figure 3 in Section 2, backtracking is
applied to inspect each path of T_i. Inspecting a path is
aborted as soon as an item outside the current
transaction is found. Obviously, processing one transaction
during the count procedure is a relatively expensive task, which
is unfortunately inevitable, whichever version of APRIORI
is used.</p>
      <p>As mentioned in the introduction, we used to have two
implementations, one being time efficient, the other being
memory efficient. These two have been used in the overall
FIMI’03 comparisons. The newest implementation (called
dffast.cc) combines these versions by using the
following refinement. Instead of appending a copy of T' to i
(see Figure 3 in Section 2), first the counting is done in
auxiliary fields in the original T', after which only the frequent
buckets are copied underneath i. This makes the
deletion of infrequent itemsets (line (7) from Figure 3)
unnecessary and leads to better memory management. Another
improvement might be achieved by using more auxiliary
fields while adding two root items simultaneously in each
iteration, thereby halving the number of database passes at
the cost of more bookkeeping.</p>
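      <p>A minimal sketch of this refinement as we read it (the field aux and the function copyFrequent are our own names): during the database pass for item i the candidate counts go into auxiliary fields of T', and afterwards only the frequent part of T' is copied underneath i.</p>
      <preformat><![CDATA[
// Our sketch of the dffast.cc refinement, not the actual implementation.
#include <vector>

struct Bucket {
    int item;
    int support = 0;              // support of the itemset ending in this bucket
    int aux = 0;                  // candidate count for {i} plus this itemset
    std::vector<Bucket> children;
};

// After a database pass over the transactions containing item i has
// filled the aux fields of T', copy only the frequent buckets under i;
// the deletion step (line (7) of Figure 3) is no longer needed.
Bucket copyFrequent(const Bucket& b, int minsup) {
    Bucket c{b.item};
    c.support = b.aux;            // the candidate support becomes the real one
    for (const Bucket& child : b.children)
        if (child.aux >= minsup)
            c.children.push_back(copyFrequent(child, minsup));
    return c;
}
]]></preformat>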
    </sec>
    <sec id="sec-5">
<title>5 Experiments</title>
      <p>
        Using the relatively small database chess (342 kB, with
3,196 transactions; available from the FIMI’03 website at
http://fimi.cs.helsinki.fi/testdata.html), the
database mushroom (570 kB, with 8,124 transactions; also
available from the FIMI’03 website) and the well-known
IBM-Almaden synthetic databases (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) we shall
examine the complexity of the algorithm. These databases have
either few, but coherent records (chess and mushroom),
or many records (the synthetic databases). The parameters
for generating a synthetic database are the number of
transactions (in thousands), the average transaction size and
the average length of so-called maximal potentially large
itemsets. The number of items was set to 1,000,
following the design in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We use T10I4D100K (4.0 MB)
and T40I10D100K (15.5 MB), both also available from
the FIMI’03 website mentioned above; they both contain
100,000 transactions.
      </p>
      <p>[Plots: for the databases chess, mushroom, T10I4D100K and T40I10D100K, the runtime in seconds and the number of frequent itemsets, for varying support thresholds.]</p>
      <p>The experiments were conducted on a Pentium-IV
machine with 512 MB memory at 2.8 GHz, running Red Hat
Linux 7.3. The program was developed under the GNU C
compiler, version 2.96.</p>
      <p>The following statistics are plotted in the graphs: the
execution time in seconds of the algorithm (see Section 4),
and the total number of frequent itemsets: in all figures the
corresponding axis is on the right hand side and scales 0–
5,500,000 (0–8,000,000 for T10I4D100K). The execution
time excludes preprocessing: in this phase the database is
read three times in order to detect the frequent items (see
before); also excluded is the time needed to print the
resulting itemsets. These actions together usually only take a
few seconds. The number of frequent 1-itemsets (m from
the previous sections, where we assumed all 1-itemsets
to be frequent) has range 31–39 for the experiments on
the database chess, 54–76 for mushroom, 844–869 for
T10I4D100K and 610–862 for T40I10D100K. Note the
very high support thresholds for mushroom (at least 5%)
and chess (at least 44%); for T10I4D100K a support
threshold as low as 0.003% was even feasible.</p>
      <p>The largest output files produced are of size 110.6 MB
(chess, minsup = 1,400, having 3,771,728 frequent sets
with 13 frequent 17-itemsets), 121.5 MB (mushroom,
minsup = 400, having 3,457,747 frequent sets with 24 frequent
17-itemsets), 131.1 MB (T10I4D100K, minsup = 3,
having 6,169,854 frequent sets with 30 frequent 13-itemsets
and 1 frequent 14-itemset) and 195.9 MB (T40I10D100K,
minsup = 300, having 5,058,313 frequent sets, with 21
frequent 19-itemsets and 1 frequent 20-itemset). The final trie
in the T40I10D100K case occupies approximately 65 MB
of memory — the output file in this case being 3 times as
large.</p>
      <p>Note that the 3,771,728 sets for the chess database
with minsup = 1,400 require 829 seconds to find, whereas
the 3,457,747 frequent itemsets for mushroom with
minsup = 400 take 158 seconds — differing approximately
a factor 5 in time. This difference in runtime is probably
caused by the difference in the absolute minsup value. Each
cell corresponding to a frequent itemset is visited at least
1400 times in the former case against 400 times in the
latter case. A similar phenomenon is observed when
comparing T40I10D100K with absolute minsup value 300 and
T10I4D100K with minsup = 3: this takes 378 versus 88
seconds. Although the outputs have the same orders of
magnitude, the runtimes differ substantially. We see that, besides
the number of frequent itemsets and the sizes of these sets,
the absolute minsup value is a major factor determining the
runtime.
</p>
    </sec>
    <sec id="sec-6">
<title>6 Conclusions</title>
      <p>In this paper, we addressed DF, a depth first
implementation of APRIORI. In our experience, DF competes with
any other well-known algorithm, especially when applied to
large transaction databases.</p>
      <p>
        Storing the database in the primary memory is no longer
a problem. On the other hand, storing the candidates causes
trouble in situations where a dense database is considered
with a small support threshold. This is the case for any
algorithm using candidates. Therefore, it would be desirable
to look for a method which stores candidates in secondary
memory. This is an obvious topic for future research. To
our knowledge, FP-growth is the only algorithm that can cope
with memory limitations. However, for real world retail
databases this algorithm is surpassed by DF, as we showed
in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Other optimizations might also be possible. Besides
improving the C code, ideas from, e.g., [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] on diffsets
with vertical layouts might be used.
      </p>
      <p>Our conclusion is that DF is a simple, practical,
straightforward and fast algorithm for finding all frequent itemsets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mannila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Toivonen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.I.</given-names>
            <surname>Verkamo</surname>
          </string-name>
          .
          <article-title>Fast discovery of association rules</article-title>
          . In U.M. Fayyad,
          <string-name>
            <given-names>G.</given-names>
            <surname>Piatetsky-Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          , and R. Uthurusamy, editors,
          <source>Advances in Knowledge Discovery and Data Mining</source>
          , pages
          <fpage>307</fpage>
          -
          <lpage>328</lpage>
          . AAAI/MIT Press,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Fast algorithms for mining association rules</article-title>
          . In
          <string-name>
            <given-names>J.B.</given-names>
            <surname>Bocca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          , editors,
          <source>Proceedings 20th International Conference on Very Large Data Bases, VLDB</source>
          , pages
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          . Morgan Kaufmann,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.M.</given-names>
            <surname>de Graaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.A.</given-names>
            <surname>Kosters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Pijls</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Popova</surname>
          </string-name>
          .
          <article-title>A theoretical and practical comparison of depth first and FP-growth implementations of Apriori</article-title>
          . In H. Blockeel and M. Denecker, editors,
          <source>Proceedings of the Fourteenth Belgium-Netherlands Artificial Intelligence Conference (BNAIC</source>
          <year>2002</year>
          ), pages
          <fpage>115</fpage>
          -
          <lpage>122</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          .
          <article-title>Survey on frequent pattern mining</article-title>
          . Helsinki,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          .
          <article-title>Mining frequent patterns without candidate generation</article-title>
          .
          <source>In Proceedings 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD'00)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Günther</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Nakhaeizadeh</surname>
          </string-name>
          .
          <article-title>Mining association rules: Deriving a superior algorithm by analyzing today's approaches</article-title>
          . In
          <string-name>
            <given-names>D.A.</given-names>
            <surname>Zighed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Komorowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Żytkow</surname>
          </string-name>
          , editors,
          <source>Principles of Data Mining and Knowledge Discovery, Proceedings of the 4th European Conference (PKDD</source>
          <year>2000</year>
          ),
          <source>Springer Lecture Notes in Computer Science</source>
          <year>1910</year>
          , pages
          <fpage>159</fpage>
          -
          <lpage>168</lpage>
          . Springer Verlag,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.A.</given-names>
            <surname>Kosters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Pijls</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Popova</surname>
          </string-name>
          .
          <article-title>Complexity analysis of depth first and FP-growth implementations of Apriori</article-title>
          . In
          <string-name>
            <given-names>P.</given-names>
            <surname>Perner</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenfeld</surname>
          </string-name>
          , editors,
          <source>Machine Learning and Data Mining in Pattern Recognition, Proceedings MLDM 2003, Springer Lecture Notes in Artificial Intelligence</source>
          <volume>2734</volume>
          , pages
          <fpage>284</fpage>
          -
          <lpage>292</lpage>
          . Springer Verlag,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Pijls</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.C.</given-names>
            <surname>Bioch</surname>
          </string-name>
          .
          <article-title>Mining frequent itemsets in memory-resident databases</article-title>
          . In E. Postma and M. Gyssens, editors,
          <source>Proceedings of the Eleventh Belgium-Netherlands Conference on Artificial Intelligence (BNAIC1999)</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>82</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>Scalable algorithms for association mining</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>12</volume>
          :
          <fpage>372</fpage>
          -
          <lpage>390</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gouda</surname>
          </string-name>
          .
          <article-title>Fast vertical mining using diffsets</article-title>
          .
          <source>In Proceedings 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kohavi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Mason</surname>
          </string-name>
          .
          <article-title>Real world performance of association rule algorithms</article-title>
          . In F. Provost and R. Srikant, editors,
          <source>Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001)</source>
          , pages
          <fpage>401</fpage>
          -
          <lpage>406</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>