<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AIM2: Improved implementation of AIM</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Sagi Shporer School of Computer Science Tel-Aviv University Tel Aviv</institution>
          ,
          <country country="IL">Israel</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present AIM2-F , an improved implementation of AIM-F [4] algorithm for mining frequent itemsets. Past studies have proposed various algorithms and techniques for improving the e±ciency of the mining task. We have presented AIM-F at FIMI'03, a combination of some techniques into an algorithm which utilize those techniques dynamically according to the input dataset. The algorithm main features include depth ¯rst search with vertical compressed database, di®set, parent equivalence pruning, dynamic reordering and projection. Experimental testing suggests that AIM2F outperforms existing algorithm implementations on various datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction</p>
      <p>
        Finding association rules is one of the driving
applications in data mining, and much research has been
done in this ¯eld [
        <xref ref-type="bibr" rid="ref3 ref5 ref7">7, 3, 5</xref>
        ]. Using the support-con¯dence
framework, proposed in the seminal paper of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the
problem is split into two parts | (a) ¯nding frequent
itemsets, and (b) generating association rules.
      </p>
      <p>Let I be a set of items. A subset X µ I is called
an itemset. Let D be a transactional database, where
each transaction T 2 D is a subset of I : T µ I. For an
itemset X, support(X) is de¯ned to be the number of
transactions T for which X µ T . For a given parameter
minsupport, an itemset X is call a frequent itemset
if support(X) ¸ minsupport. The set of all frequent
itemsets is denoted by F .</p>
      <p>
        We have presented AIM-F [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for mining frequent
itemsets. The AIM-F algorithm build upon several
ideas appearing in previous work, a partial list of which
is the following: Apriori [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Lexicographic Trees and
Depth First Search Traversal [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Dynamic Reordering
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Vertical Bit Vectors [
        <xref ref-type="bibr" rid="ref3 ref7">7, 3</xref>
        ], Projection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Di®erence
sets [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Dynamic Reordering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Parent Equivalence
Pruning [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ] and Bit-vector projection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>High level pseudo code for the AIM-F algorithm
appears in Figure 1.</p>
      <p>AIM-F (n : node, minsupport : integer)
(1) t = n:tail
(2) for each ® in t
(3) Compute s® = support(n:head S ®)
(4) if (s® = support(n:head))
(5) add ® to the list of items removed by PEP
(6) remove ® from t
(7) else if (s® &lt; minsupport)
(8) remove ® from t
(9) Sort items in t by s® in ascending order.
(10) While t 6= ;
(11) Let ® be the ¯rst item in t
(12) remove ® from t
(13) n0:head = n:head S ®
(14) n0:tail = t
(15) Report n0:head SfAll subsets of items
removed by PEPg as frequent itemsets
(16) AIM-F (n0)
2. Implementation Improvements</p>
      <p>We now describe the di®erence between AIM-F and
AIM2-F implementations:
² Integer to String conversions - Experiments run
time analysis have shown that the conversion of
integers to strings is a major CPU consumer. To
reduce conversion time two steps are taken:
{ Item name conversion - When printing an
itemset all the item names in the itemset are
printed. In this mining task the items are
numbers, and need to be converted to strings.
Instead of creating the string every time
before printing, the conversion is done once for
every item, when the item is loaded during
the dataset reading process.
{ Support conversion - To print the support it
must be converted to a string. To enable fast
conversion of the support value to string, a
static lookup table from integer to string was
added. The lookup table contains the 64K
integer values above the minSupport. Every
entry in the lookup table has the string
representation of the entry attached. Every time a
support value needs to be converted to string,
it is ¯rst checked if the value appears in the
lookup table, if so, the string is taken from
the table, with a very low cost.</p>
      <p>In ¯gures 2 and 3 we compare the AIM2-F
algorithm runtime with and without the string
conversion improvement. It is clear that this
improvement alone contribute up to an order of magnitude
improvement. As the size of the input increases
(lower support) the contribution of the string
conversion improvements increases.
² Late F 2 matrix construction - The size of the
F 2 matrix is I2 where I is the number of items.
In datasets where the number of items is very
large the F 2 matrix can not be constructed. The
improvement in AIM2-F is that the F 2 matrix
is built only for items for which support(i) ¸
minSupport. This enables the construction of the
F 2 for larger datasets.
² Input bu®er reuse - In AIM-F the dataset load
method allocated an input bu®er for every
transaction read. Switching to a single input bu®er
that is re-used for all the transactions reduced the
loading time in AIM2-F by nearly 50%. However
the loading time is usually insigni¯cant comparing
to the overall runtime (unless the support is very
high).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Imielinski</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>207</volume>
          {
          <fpage>216</fpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Fast algorithms for mining association rules</article-title>
          .
          <source>In VLDB</source>
          , pages
          <volume>487</volume>
          {
          <fpage>499</fpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Burdick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calimlim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          . Ma¯
          <article-title>a: a maximal frequent itemset algorithm for transactional databases</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiat</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Shporer</surname>
          </string-name>
          . Aim:
          <article-title>Another itemset miner</article-title>
          .
          <source>In FIMI</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. J. B.</given-names>
            <surname>Jr</surname>
          </string-name>
          . E±
          <article-title>ciently mining long patterns from databases</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <volume>85</volume>
          {
          <fpage>93</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rymon</surname>
          </string-name>
          .
          <article-title>Search through systematic set enumeration</article-title>
          .
          <source>In KR-92</source>
          , pages
          <fpage>539</fpage>
          {
          <fpage>550</fpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shenoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Haritsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundarshan</surname>
          </string-name>
          , G. Bhalotia,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bawa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <article-title>Turbo-charging vertical mining of large databases</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          .
          <article-title>Scalable algorithms for association mining</article-title>
          .
          <source>Knowledge and Data Engineering</source>
          ,
          <volume>12</volume>
          (
          <issue>2</issue>
          ):
          <volume>372</volume>
          {
          <fpage>390</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Gouda</surname>
          </string-name>
          .
          <article-title>Fast vertical mining using di®sets</article-title>
          .
          <source>In KDD</source>
          , pages
          <volume>326</volume>
          {
          <fpage>335</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>