<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">CT-PRO: A Bottom-Up Non Recursive Frequent Itemset Mining Algorithm Using Compressed FP-Tree Data Structure</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Yudho</forename><forename type="middle">Giri</forename><surname>Sucahyo</surname></persName>
							<email>sucahyoy@cs.curtin.edu.au</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computing</orgName>
								<orgName type="institution">Curtin University of Technology</orgName>
								<address>
									<addrLine>Kent St</addrLine>
									<postCode>6102</postCode>
									<settlement>Bentley</settlement>
									<region>Western Australia</region>
									<country key="AU">Australia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Raj</forename><forename type="middle">P</forename><surname>Gopalan</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computing</orgName>
								<orgName type="institution">Curtin University of Technology</orgName>
								<address>
									<addrLine>Kent St</addrLine>
									<postCode>6102</postCode>
									<settlement>Bentley</settlement>
									<region>Western Australia</region>
									<country key="AU">Australia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">CT-PRO: A Bottom-Up Non Recursive Frequent Itemset Mining Algorithm Using Compressed FP-Tree Data Structure</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FEA60315D11E92AC49BED42731E02968</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:45+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Frequent itemset mining (FIM) is an essential part of association rule mining. Its application to other data mining tasks has also been recognized. It has been an active research area, and a large number of algorithms have been developed. In this paper, we propose another pattern growth algorithm that uses a more compact data structure named Compressed FP-Tree (CFP-Tree). The number of nodes in a CFP-Tree can be as little as half of that in the corresponding FP-Tree. We also describe the implementation of CT-PRO, which utilizes the CFP-Tree for FIM. CT-PRO traverses the CFP-Tree bottom-up and generates the frequent itemsets non-recursively, following the pattern growth approach. Experiments show that CT-PRO performs better than OpportuneProject, FP-Growth, and Apriori. A further experiment conducted to determine the feasible performance range of CT-PRO shows that it has a larger performance range than the others. CT-PRO also outperforms LCM and kDCI, known as the two best algorithms in the FIMI Repository 2003.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Since its introduction in <ref type="bibr" target="#b0">[1]</ref>, the problem of efficiently generating frequent itemsets has been an active research area, and a large number of algorithms have been developed for it; for surveys, see <ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref><ref type="bibr" target="#b3">[4]</ref>. Frequent itemset mining (FIM) is an essential part of association rule mining (ARM). Since FIM is computationally expensive, it determines the overall performance of ARM. The frequent itemset concept has also been extended to many other data mining tasks such as classification <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, clustering <ref type="bibr" target="#b6">[7]</ref>, and sequential pattern discovery <ref type="bibr" target="#b7">[8]</ref>.</p><p>The data structures used play an important role in the performance of FIM algorithms; they can be categorized as either array-based or tree-based. An example of a successful array-based algorithm for FIM is H-Mine <ref type="bibr" target="#b8">[9]</ref>. It uses a data structure named H-struct, a combination of arrays and hyper-links. It was shown in <ref type="bibr" target="#b8">[9]</ref> that H-struct works well for sparse datasets, as H-Mine outperforms FP-Growth <ref type="bibr" target="#b9">[10]</ref> on them (note that H-Mine and FP-Growth follow the same pattern growth method). However, the hyper-structure is not efficient on dense datasets, so H-Mine switches to FP-Growth for such datasets.</p><p>FP-Growth <ref type="bibr" target="#b9">[10]</ref> shows good performance on dense datasets as it uses a compact data structure named FP-Tree, a prefix tree with links between nodes containing the same item. 
A tree data structure is suitable for dense datasets, since many transactions share common prefixes and the database can be represented compactly. For sparse datasets, however, the tree becomes bigger and bushier, so its construction and traversal costs are higher than those of array-based data structures.</p><p>The strengths of H-Mine and FP-Growth were combined in the recent pattern growth FIM algorithm named OpportuneProject (OP) <ref type="bibr" target="#b10">[11]</ref>. OP is an adaptive algorithm that opportunistically chooses an array-based or a tree-based data structure depending on the characteristics of the sub-database.</p><p>In this paper, we describe our new data structure named Compressed FP-Tree (CFP-Tree) and the implementation of our new FIM algorithm named CT-PRO, first introduced in <ref type="bibr" target="#b11">[12]</ref>. We compare the compactness of the CFP-Tree with that of the FP-Tree at several support levels on various datasets produced with the synthetic data generator <ref type="bibr" target="#b12">[13]</ref>. The performance of CT-PRO is compared with Apriori <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, FP-Growth <ref type="bibr" target="#b9">[10]</ref>, and OP <ref type="bibr" target="#b10">[11]</ref>.</p><p>Sample datasets such as the real-world BMS datasets <ref type="bibr" target="#b2">[3]</ref> or the UCI Machine Learning Repository datasets <ref type="bibr" target="#b15">[16]</ref> do not cover the full range of densities from sparse to dense. Some algorithms may work well on a certain dataset but become infeasible when the dimensions of the database change (i.e. the number of transactions, the number of items, the average number of items per transaction, etc.). Therefore, a further study is presented in this paper to show the feasible performance range of the algorithms. 
The more extensive testing of the algorithms is carried out using a set of databases with varying numbers of transactions and of average items per transaction. For each dataset, all the algorithms are tested at supports of 10% to 90% in increments of 10%. The experimental results are reported in detail.</p><p>To show how well CT-PRO compares with the algorithms in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref>, the two best algorithms from the last workshop, LCM <ref type="bibr" target="#b17">[18]</ref> and kDCI <ref type="bibr" target="#b18">[19]</ref>, are selected for comparison. The results show that CT-PRO outperforms both, and therefore the others as well.</p><p>The structure of the rest of this paper is as follows: In Section 2, we introduce the CFP-Tree data structure and report the results of experiments evaluating its compactness. In Section 3, we describe the CT-PRO algorithm with a running example. We discuss the complexity of the CT-PRO algorithm in Section 4. The performance of the algorithm on various datasets is compared against other algorithms in Section 5. Section 6 contains the conclusions of our study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Compressed FP-Tree Data Structure</head><p>In this section, a new tree-based data structure, named Compressed FP-Tree (CFP-Tree), is introduced. It is a variant of the CT-Tree data structure that we introduced in <ref type="bibr" target="#b19">[20]</ref>, with the following major differences: items are sorted in descending order of frequency (instead of ascending order, as in the CT-Tree), and each node has a link to the next node containing the same item (such links are not present in the CT-Tree). The CFP-Tree is defined as follows:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1 (Compressed FP-Tree or CFP-Tree)</head><p>A Compressed FP-Tree is a prefix tree with the following properties: 1. It consists of an ItemTable and a tree whose root represents the index of the item with the highest frequency, with a set of subtrees as the children of the root. 2. The ItemTable contains all frequent items sorted in descending order of their frequency. Each entry in the ItemTable consists of four fields: (1) index, (2) item-id, (3) frequency of the item, and (4) a pointer to the root of the subtree of that frequent item. 3. If I = {i1, i2, …, ik} is the set of frequent items in a transaction, after being mapped to their index ids, then the transaction is inserted into the Compressed FP-Tree starting from the root of the subtree to which i1 points in the ItemTable. 4. The root of the Compressed FP-Tree is at level 0 of the tree. 5. Each node in the Compressed FP-Tree consists of four fields: node-id, a pointer to the next sibling, a pointer to the next node with the same id, and a count array in which each entry corresponds to the number of occurrences of an itemset. If C = {C0, C1, …, Ck} is the set of counts in the count array attached to a node, with array indexes starting from zero, then Ci is the count of transactions whose itemset lies along the path from the node at level i to the node where Ci is located.</p><p>The following lemma provides the worst-case space complexity of a CFP-Tree.</p></div>
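To make the definition concrete, the following is a minimal Python sketch (ours, not the paper's implementation) of a CFP-Tree node and the insertion rule. The folding of transactions that do not start with the most frequent item into the root's children, and the use of the path's start level as the count-array index, follow our reading of properties 3-5; sibling and same-item links are only stubbed.

```python
from collections import defaultdict

class CFPNode:
    """A CFP-Tree node (sketch): ItemTable index, child links, a link to
    the next node with the same index, and a count array keyed by the
    level at which each recorded transaction's path starts."""
    def __init__(self, index, level):
        self.index = index
        self.level = level                 # the root is at level 0
        self.children = {}                 # index -> CFPNode (stands in for sibling links)
        self.next_same = None              # stub: next node carrying the same index
        self.counts = defaultdict(int)     # start level -> occurrence count

def child(node, index):
    # Fetch or create the child node for a given ItemTable index.
    if index not in node.children:
        node.children[index] = CFPNode(index, node.level + 1)
    return node.children[index]

def insert(root, t):
    """Insert a transaction already mapped to ascending ItemTable
    indices (1 = most frequent item). A transaction not starting with
    index 1 starts at the root's child for its first item; the start
    level recorded in the count array tells the two cases apart."""
    node = root if t[0] == 1 else child(root, t[0])
    start = node.level
    for idx in t[1:]:
        node = child(node, idx)
    node.counts[start] += 1
    return node

# Two transactions that end at the same node but are kept separate
# by the count-array index (property 5 of the definition).
root = CFPNode(1, 0)
a = insert(root, (1, 2, 4))
b = insert(root, (2, 4))
assert a is b and dict(a.counts) == {0: 1, 1: 1}
```

This is how a single node can represent several itemsets: the node reached by 2 then 4 holds both the count of {1,2,4} (start level 0) and of {2,4} (start level 1).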
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lemma 1.</head><p>Let n be the number of frequent items in the database for a given support threshold. The number of nodes of the CFP-Tree is bounded by 2^(n-1), which is half of the maximum for a full prefix tree. Rationale. Let IF = {iF1, …, iFn} be the set of distinct items in a CFP-Tree, where iF1, iF2, …, iFn are lexicographically ordered. The maximum numbers of nodes under the subtrees of iF1, iF2, …, iFn are 2^(n-1), 2^(n-2), …, 2^0 respectively. Since the CFP-Tree is in effect the subtree of iF1, the maximum number of nodes of the CFP-Tree is 2^(n-1).</p><p>Compared to the FP-Tree, the CFP-Tree has some important differences, as follows: 1. The FP-Tree stores item ids in the tree, while in the CFP-Tree item ids are mapped to an ascending sequence of integers, which is in fact the array index in the ItemTable. Further, the FP-Tree has a HeaderTable consisting of two fields: item-id and a pointer to the first node in the FP-Tree carrying that item-id. The CFP-Tree has an ItemTable consisting of four fields: index, item-id, count of the item, and a pointer to the root of the subtree of each item. The root of each subtree has a link to the next node with the same item. Both the HeaderTable and the ItemTable store only frequent items. Figure <ref type="figure" target="#fig_4">1</ref> shows the FP-Tree and the CFP-Tree for a sample database. In this case, the FP-Tree is a complete tree for items 1-4, and its number of nodes is twice that of the corresponding CFP-Tree. However, most datasets do not have such an extreme characteristic as in this example.</p></div>
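The 2^(n-1) bound of Lemma 1 can be checked numerically: nodes of the tree correspond to paths, i.e. to the non-empty, lexicographically sorted subsets of the n frequent items that begin with the most frequent item iF1. A small sanity check of ours (not from the paper):

```python
from itertools import combinations

def cfp_tree_node_bound(n):
    """Count the nodes of a full CFP-Tree over n items by enumerating
    the non-empty subsets whose smallest member is item 1 (the whole
    tree is the subtree rooted at the most frequent item)."""
    items = range(1, n + 1)
    count = 0
    for r in range(1, n + 1):
        for subset in combinations(items, r):
            if subset[0] == 1:   # paths starting at the root item
                count += 1
    return count

# Matches the closed form 2^(n-1) of Lemma 1.
for n in range(1, 10):
    assert cfp_tree_node_bound(n) == 2 ** (n - 1)
```

For n = 4 (the sample database of Figure 1) this gives 8 nodes, half the 16 nodes of the corresponding complete FP-Tree counted with its root.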
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure <ref type="figure" target="#fig_1">2</ref> shows the compactness of the CFP-Tree compared to the FP-Tree on several synthetic datasets at various support levels (the characteristics of the datasets are explained later in Section 5.2). The CFP-Tree has fewer nodes than the FP-Tree in all cases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">CT-PRO Algorithm</head><p>In this section, a new method that traverses the tree bottom-up, implemented non-recursively, is presented. The CFP-Tree data structure is used to represent the transactions compactly in memory. The algorithm, called CT-PRO, has three steps: finding the frequent items, constructing the CFP-Tree, and mining. Algorithm 1 shows the first two steps of CT-PRO. Suppose the user wants to mine all frequent itemsets from the transaction database shown in Figure <ref type="figure">3a</ref> with a support threshold of two transactions (or 40%). First, the frequent items are identified by reading the database once (lines 3-11). The frequent items are stored in descending order of frequency in the GlobalItemTable (line 12). In a second pass over the database, only the frequent items are selected from each transaction (see Figure <ref type="figure">3b</ref>), mapped on-the-fly to their index ids in the GlobalItemTable, sorted in ascending order of index id (see Figure <ref type="figure">3c</ref>), and inserted into the CFP-Tree (see Figure <ref type="figure">3d</ref>). The pointer in the GlobalItemTable also acts as the start of the links to other nodes with the same item id (indicated by the dashed lines in Figure <ref type="figure">3d</ref>). For illustration, at each node we also show the index of the count array, the transaction represented by each entry, and its count. In the implementation of the CFP-Tree, however, only the second column, representing the count, is stored.</p></div>
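The first two steps of CT-PRO amount to two passes over the database. Here is a hedged Python sketch of ours (Algorithm 1's pseudocode is not reproduced in this text, and the sample transactions below only stand in for Figure 3a, whose contents are not available):

```python
from collections import Counter

# Hypothetical transaction database and a support threshold of two
# transactions, mirroring the running example's setup.
db = [[1, 3, 4], [2, 4, 5], [2, 4, 5, 7], [1, 4, 6], [4, 5, 7]]
minsup = 2

# Pass 1 (find frequent items): count every item, then keep the
# frequent ones sorted by descending frequency. The resulting order
# defines the GlobalItemTable, with 1-based index ids.
freq = Counter(i for t in db for i in t)
frequent = [i for i, c in freq.most_common() if c >= minsup]
index_of = {item: k + 1 for k, item in enumerate(frequent)}

# Pass 2 (prepare for tree construction): strip infrequent items from
# each transaction, map the survivors to their index ids on-the-fly,
# and sort ascending so the most frequent item comes first.
mapped = []
for t in db:
    m = sorted(index_of[i] for i in t if i in index_of)
    if m:
        mapped.append(m)

print(frequent)   # GlobalItemTable item order
print(mapped)     # transactions ready for CFP-Tree insertion
```

Each list in `mapped` would then be inserted into the CFP-Tree starting from the subtree of its first index, as in Definition 1.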
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 3: CFP-Tree for the Sample Dataset</head><p>The mining process in CT-PRO is shown in Algorithm 2 and illustrated by the following example.</p><p>Example 1. Let the CFP-Tree shown in Figure <ref type="figure">3d</ref> be the input for the mining step of CT-PRO, and suppose the user wants all the frequent itemsets with a minimum support of two transactions (or 40%).</p><p>Figure <ref type="figure">4</ref> shows the LocalCFP-Tree and LocalFrequentPatternTree at each step of the mining process. CT-PRO starts from the least frequent item (index: 5, item: 7) in the GlobalItemTable (line 2). Item 7 is frequent, so it becomes the root of the LocalFrequentPatternTree (line 3). CT-PRO then creates a projection of all transactions ending with index 5. This projection is represented by a LocalCFP-Tree and contains only locally frequent items. Traversing the node-link of index 5 in the GlobalCFP-Tree identifies the local frequent items that occur together with it. There are three nodes with index 5, and the path from each to the root is traversed, counting the other indexes that occur together with index 5 (lines 13-23). In all, we have 1 (2), 2 (1), 3 (1) and 4 (2) for index 5. As indexes 1 and 4 (item ids 4 and 5) are locally frequent, they are registered in the LocalItemTable and assigned new index ids (see Figure <ref type="figure">4a</ref>). They also become children of the LocalFrequentPatternTree's root (lines 5-7). 
Together, the root and its children form frequent itemsets with length two.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 4: Local CFP-Tree during Mining Process</head><p>After the local frequent items for the projection have been identified, the node-link in the GlobalCFP-Tree is re-traversed, and the path from each node to the root is revisited to get the local frequent items occurring together with index 5 in the transactions. These local frequent items are mapped on-the-fly to their indexes in the LocalItemTable, sorted in ascending order of index id, and inserted into the LocalCFP-Tree (lines 24-33). The first path of index 5 returns nothing. From the second path of index 5, a transaction 14 (1) is inserted into the LocalCFP-Tree, and another transaction 14 (1) from the third path of index 5 is also inserted. In total, there are two occurrences of transaction 14. Indexes 1 and 4 in the GlobalItemTable represent items 4 and 5 respectively. The indexes of items 4 and 5 in the LocalItemTable are 1 and 2 respectively, so transaction 14 is inserted as transaction 12 into the LocalCFP-Tree. As the item indexes in the GlobalItemTable and the LocalItemTable differ, the item id is always maintained for output purposes.</p><p>Longer frequent itemsets, with length greater than two, are extracted by calling the procedure RecMine (line 9). For simplicity, we have described this procedure (lines 34-47) using recursion but, in the program, it is implemented non-recursively. Starting from the least frequent item in the LocalItemTable (line 35), the node-link is traversed (lines 37-41). For each node, the path to the root of the LocalCFP-Tree is traversed, counting the other items that occur together with the current item. For example, in Figure <ref type="figure">4a</ref>, traversing the node-link of node 2 returns index 1 (2) and, since it is frequent, an entry is created and attached as the child of index 2 in the LocalFrequentPatternTree (lines 42-44). 
All frequent itemsets containing item 7 can be extracted by traversing the LocalFrequentPatternTree (line 10): 7 (2), 75 (2), 754 (2), 74 (2).</p><p>The process continues with the next items in the GlobalItemTable, with indexes 4, 3 and 2; finally, when the mining process reaches the root of the tree of Figure <ref type="figure">3d</ref>, it outputs 4 (5).</p><p>One major advantage of CT-PRO over FP-Growth is that CT-PRO avoids the cost of creating conditional FP-Trees. FP-Growth needs to create a conditional FP-Tree at each step of its recursive mining. This overhead adversely affects its performance, as the number of conditional FP-Trees corresponds to the number of frequent itemsets. In CT-PRO, for each frequent item (not each frequent itemset), only one LocalCFP-Tree is created, and it is traversed non-recursively to extract all frequent itemsets beginning with that item.</p></div>
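The projection-based mining loop can be sketched at the level of plain transaction lists rather than the CFP-Tree. This simplification is ours, not the paper's implementation: it keeps the bottom-up order (least frequent index first, extended by more frequent ones) and the explicit stack that replaces recursion, but not the compact tree representation.

```python
from collections import Counter

def ct_pro_mine(transactions, minsup):
    """Bottom-up pattern growth over projections, iteratively with an
    explicit stack. Transactions are lists of ascending index ids
    (1 = most frequent item); itemsets are returned as index tuples."""
    results = {}
    # Each stack entry: (itemset found so far, transactions projected on it)
    stack = [((), transactions)]
    while stack:
        prefix, ts = stack.pop()
        counts = Counter(i for t in ts for i in t)
        for item, c in counts.items():
            if c >= minsup:
                itemset = (item,) + prefix
                results[itemset] = c
                # Project: keep transactions containing `item`, restricted
                # to smaller (more frequent) indexes that can extend it.
                proj = [[j for j in t if j < item] for t in ts if item in t]
                stack.append((itemset, [p for p in proj if p]))
    return results

# Hypothetical mapped transactions (the paper's Figure 3 database is
# not available in the text).
ts = [[1, 3], [1, 2, 4], [1, 2, 4, 5], [1, 3], [1, 2, 5]]
r = ct_pro_mine(ts, minsup=2)
assert r[(1, 2, 5)] == 2 and r[(1,)] == 5
```

Each itemset is enumerated exactly once, via its largest index first, which mirrors CT-PRO starting each projection from the least frequent item in the ItemTable.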
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Time Complexity</head><p>In this section, the best-case and worst-case time complexity of the CT-PRO algorithm is presented. Let I = {i1, i2, …, in} be the set of all n items, let the transaction database D be {t1, t2, …, tm}, and let v be the total number of items in all transactions. Lemma 2. In the best case, the cost of generating frequent itemsets is O(v + n).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proof.</head><p>The best case for the CT-PRO algorithm occurs when there is no frequent item. The algorithm has to read the v items in all transactions and add up the counts of the n items. The counts of all n items are stored in the ItemTable and checked to determine whether there is any frequent item. If there is no frequent item, the process stops. Lemma 3. In the worst case, the cost of generating frequent itemsets is</p><formula xml:id="formula_0">(v + n) + (v + 2^(n-1)) + ∑_{k=2}^{n} ((2^(n-k) - 1) 2^(n-k) + 2^(k-2)) = O(2^(2n)).</formula><p>Proof. The worst case happens when all n items are frequent and all combinations of them are present in the m transactions. CT-PRO has three steps: finding frequent items, constructing the CFP-Tree, and mining. The cost of finding frequent items is given by Lemma 2. The worst case for the GlobalCFP-Tree corresponds to a situation where all possible paths exist. In constructing the GlobalCFP-Tree, all the transactions in the database are read (cost v) and inserted into the tree (the total number of nodes is 2^(n-1)). In the mining process, for each frequent item fk where 2 ≤ k ≤ n, 2^(k-2) nodes in the GlobalCFP-Tree are visited to construct a LocalCFP-Tree. The LocalCFP-Tree has (2^(n-k) - 1) paths that correspond to, at most, 2^(n-k) candidate itemsets. So the worst-case mining cost is:</p><formula xml:id="formula_1">∑_{k=2}^{n} ((2^(n-k) - 1) 2^(n-k) + 2^(k-2)).</formula><p>Therefore, the total worst-case cost of CT-PRO is</p><formula xml:id="formula_2">(v + n) + (v + 2^(n-1)) + ∑_{k=2}^{n} ((2^(n-k) - 1) 2^(n-k) + 2^(k-2)) = O(2^(2n)).</formula></div>
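The closed form of Lemma 3 can be evaluated numerically to confirm the O(2^(2n)) bound; the following check is ours, not from the paper:

```python
def worst_case_cost(n, v):
    """Evaluate (v+n) + (v + 2^(n-1))
       + sum_{k=2}^{n} ((2^(n-k) - 1) * 2^(n-k) + 2^(k-2))."""
    mining = sum((2 ** (n - k) - 1) * 2 ** (n - k) + 2 ** (k - 2)
                 for k in range(2, n + 1))
    return (v + n) + (v + 2 ** (n - 1)) + mining

# The dominant term is the k = 2 summand, (2^(n-2) - 1) * 2^(n-2),
# so the total grows as O(2^(2n)) = O(4^n).
for n in range(2, 16):
    assert worst_case_cost(n, v=0) < 4 ** n
```

For instance, with n = 5 items and v = 10 total item occurrences, the formula gives 126 elementary operations in the worst case, already dominated by the 4^n-like mining term.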
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Evaluation</head><p>This section contains three sub-sections. In Section 5.1, we compare CT-PRO against other well-known algorithms, including Apriori <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, FP-Growth <ref type="bibr" target="#b9">[10]</ref> and the recently proposed OpportuneProject (OP) <ref type="bibr" target="#b10">[11]</ref>, on various datasets available in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref>. In Section 5.2, we report the results of more comprehensive testing to determine the feasible performance range of the algorithms. Finally, in Section 5.3, we compare CT-PRO with the two best algorithms in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref>, LCM <ref type="bibr" target="#b17">[18]</ref> and kDCI <ref type="bibr" target="#b18">[19]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Comparison with Apriori, FP-Growth and OpportuneProject</head><p>Six real datasets are used in this experiment: two dense datasets, Chess and Connect4; two less dense datasets, Mushroom and Pumsb*; and two sparse datasets, BMS-WebView-1 and BMS-WebView-2. The first four datasets were originally taken from the UCI ML Repository <ref type="bibr" target="#b15">[16]</ref> and the last two were donated by Blue Martini Software <ref type="bibr" target="#b2">[3]</ref>. All the datasets are also available in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref>.</p><p>We used the implementation of Apriori created by Christian Borgelt <ref type="bibr" target="#b20">[21]</ref>, enabling the use of its prefix tree data structure. For FP-Growth, we used the executable code available from its authors <ref type="bibr" target="#b9">[10]</ref>; however, for comparing the number of nodes of the FP-Tree with our proposed data structure, we modified the source code of FP-Growth provided by Bart Goethals in <ref type="bibr" target="#b21">[22]</ref>. For OpportuneProject (OP), we used the executable code available from its author, Junqiang Liu <ref type="bibr" target="#b10">[11]</ref>.</p><p>All the algorithms were implemented and compiled using MS Visual C++ 6.0. All the experiments (except the comparisons with algorithms from the FIMI Repository 2003 website <ref type="bibr" target="#b16">[17]</ref>) were performed on a Pentium III 866 MHz PC with 512 MB RAM and a 110 GB hard disk running MS Windows 2000. The runtime reported in our charts is the total execution time, the period between input and output; it includes the time for constructing all the data structures used in each program.</p><p>Figure <ref type="figure" target="#fig_5">5</ref> shows the results of the experiment on the various datasets. All the charts use a logarithmic scale for run time along the y-axis on the left. 
We did not plot results when the runtime exceeded 10,000 seconds. For a comprehensive evaluation of the algorithms' performance, rather than showing where our algorithm performs best at a few support levels, all the algorithms were tested extensively: at support levels of 10% to 90% on the dense datasets (Connect4, Chess, Pumsb*, Mushroom), 0.1% to 1% on the sparse dataset BMS-WebView-1, and 0.01% to 0.1% on the sparse dataset BMS-WebView-2. As the average number of items increases and/or the support level decreases, at some point every algorithm 'hits the wall' (i.e. takes too long to complete).</p><p>CT-PRO outperforms the others at all support thresholds on the Connect4, Chess, Mushroom and Pumsb* datasets. On the sparse dataset BMS-WebView-1, CT-PRO is the runner-up after OP, with only small performance differences (0.4 seconds versus 0.49 seconds at a support level of 0.1%, and 5.18 versus 7.69 seconds at 0.06%). Below a support level of 0.06%, none of the algorithms could mine the BMS-WebView-1 dataset. On the sparse dataset BMS-WebView-2, a remarkable result is obtained: Apriori, a traditional FIM algorithm, outperforms FP-Growth at all support levels. CT-PRO is the fastest from a support threshold of 1% down to 0.4% and becomes the runner-up, after OP, from 0.3% down to 0.1%, with small performance differences.</p><p>From these results, we can claim that CT-PRO generally outperforms the others on dense datasets. On sparse datasets, the high cost of tree construction reduces CT-PRO to runner-up; however, as the gap is very small, CT-PRO also works well for sparse datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Determining the Feasible Performance Range</head><p>As mentioned earlier, sample datasets such as the real-world BMS datasets <ref type="bibr" target="#b2">[3]</ref> and the UCI Machine Learning Repository datasets <ref type="bibr" target="#b15">[16]</ref>, which are also available in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref>, have their own static characteristics and thus do not cover the full range of densities. An algorithm that works well on one dataset may not perform to the same degree on datasets with different dimensions, where the dimensions could be the number of transactions, the number of items, the average number of items per transaction, denseness or sparseness, etc. In this section, a more comprehensive evaluation of the performance of the various algorithms is presented.</p><p>We generated ten datasets using the synthetic data generator <ref type="bibr" target="#b12">[13]</ref>. The first five datasets contain 100 items and 50,000 transactions, with averages of 10, 25, 50, 75, and 100 items per transaction. The second five datasets contain 100 items and 100,000 transactions, with the same averages. CT-PRO, Apriori, FP-Growth and OP were tested extensively on these datasets at support levels of 10% to 90%, in increments of 10%.</p><p>Figure <ref type="figure">6</ref> shows the performance comparisons of the algorithms on these datasets. Each dataset name encodes its characteristics; for example, I100T100KA10 means 100 items and 100,000 transactions, with an average of 10 items per transaction. The experimental results show that the performance characteristics on databases of 50,000 and 100,000 transactions are quite similar. 
However, the runtime increases with the number of transactions.</p><p>The Apriori algorithm is quite feasible for sparse datasets (an average of 10 or 25 items per transaction). Its performance there is good: it consistently beats FP-Growth at all support levels. Although Apriori is slower than CT-PRO and OP on the two sparse datasets, its runtime is still acceptable to the user (it needs only 60 seconds to mine the I100T50KA25 dataset at a support level of 10%). However, on the datasets with an average of 50, 75, and 100 items per transaction, Apriori performs worst and can only mine down to support levels of 30%, 50%, and 70% respectively. These results confirm that Apriori is infeasible for dense datasets at low support levels.</p><p>FP-Growth performs worst at all support levels on the datasets with a low average number of items per transaction (10 and 25). The fact that FP-Growth does not outperform Apriori on these two datasets shows that Apriori is more feasible than FP-Growth for sparse datasets. However, FP-Growth performs significantly better than Apriori as the average number of items per transaction grows.</p><p>Both CT-PRO and OP have larger feasible performance ranges than the other algorithms. OP does not perform well on the sparse datasets I100T50KA10 and I100T100KA10; its performance there is even worse than Apriori's. However, it performs better than Apriori and FP-Growth on the other datasets. On the datasets with an average of 50, 75, and 100 items per transaction, FP-Growth, CT-PRO and OP can mine down to support levels of 20%, 40%, and 50% respectively.</p><p>CT-PRO can be considered the best among the algorithms, as it generally performs best at most support levels. However, as the support level gets lower, its performance approaches OP's. Only at a support level of 10% does OP occasionally run slightly faster than CT-PRO (e.g. 
on the I100T50KA25 dataset).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Comparison with the Best Algorithms in the FIMI Repository 2003</head><p>For comparison with the best algorithms in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref>, we ported CT-PRO to Linux and compared it with the repository's two best algorithms, LCM <ref type="bibr" target="#b17">[18]</ref> and kDCI <ref type="bibr" target="#b18">[19]</ref>. We performed the experiments on a PC with an AMD Athlon™ XP 2000+ 1.6 GHz processor, 1 GB RAM, 2 GB swap and a 40 GB hard disk, running Fedora Core 1. All programs were compiled using the g++ compiler.</p><p>Figure <ref type="figure">7</ref> shows the performance comparisons of the algorithms submitted to FIMI 2003 on the Chess and Connect4 datasets; the figures are taken from <ref type="bibr" target="#b16">[17]</ref>. On the Chess dataset, kDCI is the best at support levels from 90% down to 70%; below that, LCM outperforms the others. On the Connect4 dataset, at a support</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions</head><p>Performance comparisons of CT-PRO against other well-known algorithms, including Apriori <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, FP-Growth <ref type="bibr" target="#b9">[10]</ref> and OpportuneProject (OP) <ref type="bibr" target="#b10">[11]</ref>, were reported. The results show that CT-PRO outperforms the other algorithms at all support levels on dense datasets and also works well on sparse datasets. Extensive experiments to measure the feasible performance range of the algorithms are also presented in this paper. A synthetic data generator is used to generate several datasets with varying numbers of transactions and of average items per transaction. Then the best available algorithms, including CT-PRO, Apriori, FP-Growth and OP, are tested on those datasets. 
The results show that CT-PRO generally outperforms the others.</p><p>In addition, to relate our research to the last workshop on frequent itemset mining implementations <ref type="bibr" target="#b16">[17]</ref>, we selected the two best algorithms (LCM and kDCI) from the FIMI Repository 2003 and compared their performance with CT-PRO. It was shown that CT-PRO performed better than both.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>FP-Tree is compressed by removing identical subtrees of a complete FP-Tree and succinctly storing the information from them in the remaining nodes. All subtrees of the root of the FP-Tree (except the leftmost branch) are collected together at the leftmost branch to form the CFP-Tree. 3. Each node in the FP-Tree (except the root) consists of three fields: item-id, count and node-link. Count registers the number of transactions represented by the portion of the path reaching this node. Node-link links to the next node with the same item-id. Each node in the CFP-Tree consists of three fields: item-id, count array and node-link. The count array contains counts for item subsets in the path from the root to that node. The index of the cells in the array corresponds to the level numbers of the nodes above. 4. FP-Tree has a HeaderTable consisting of two fields:</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Compactness of CFP-Tree Compared to FP-Tree on Various Synthetic Datasets at Various Support Levels</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>Frequent itemsets in projection 2</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Performance Evaluation of CT-PRO Against Others on Various Datasets</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Algorithm 1 CT-PRO Algorithm: Step 1 and Step 2</head><label></label><figDesc></figDesc><table><row><cell>input</cell><cell>Database D, Support Threshold σ</cell></row><row><cell cols="2">output CFP-Tree</cell></row><row><cell cols="2">1 begin</cell></row><row><cell>2</cell><cell>// Step 1: Identify frequent items</cell></row><row><cell>3</cell><cell>for each transaction t ∈ D</cell></row><row><cell>4</cell><cell>for each item i ∈ t</cell></row><row><cell>5</cell><cell>if i ∈ GlobalItemTable</cell></row><row><cell>6</cell><cell>Increment count of i</cell></row><row><cell>7</cell><cell>else</cell></row><row><cell>8</cell><cell>Insert i into GlobalItemTable with count = 1</cell></row><row><cell>9</cell><cell>end if</cell></row><row><cell>10</cell><cell>end for</cell></row><row><cell>11</cell><cell>end for</cell></row><row><cell>12</cell><cell>Sort GlobalItemTable in frequency descending order</cell></row><row><cell>13</cell><cell>Assign an index to each frequent item in the GlobalItemTable</cell></row><row><cell>14</cell><cell>// Step 2: Construct CFP-Tree</cell></row><row><cell>15</cell><cell>Construct the leftmost branch of the tree</cell></row><row><cell>16</cell><cell>for each transaction t ∈ D</cell></row><row><cell>17</cell><cell>Initialize mappedTrans</cell></row><row><cell>18</cell><cell>for each frequent item i ∈ t</cell></row><row><cell>19</cell><cell>mappedTrans = mappedTrans ∪ GetIndex(i)</cell></row><row><cell>20</cell><cell>end for</cell></row><row><cell>21</cell><cell>Sort mappedTrans in ascending order of item ids</cell></row><row><cell>22</cell><cell>InsertToCFPTree(mappedTrans)</cell></row><row><cell>23</cell><cell>end for</cell></row><row><cell cols="2">24 end</cell></row><row><cell cols="2">25 Procedure InsertToCFPTree(mappedTrans)</cell></row><row><cell>26</cell><cell>firstItem := mappedTrans[1]</cell></row><row><cell>27</cell><cell>currNode := root of subtree pointed to by GlobalItemTable[firstItem]</cell></row><row><cell>28</cell><cell>for each subsequent item i ∈ mappedTrans</cell></row><row><cell>29</cell><cell>if currNode has a child representing i</cell></row><row><cell>30</cell><cell>Increment count[firstItem-1] of the child node</cell></row><row><cell>31</cell><cell>else</cell></row><row><cell>32</cell><cell>Create the child node and set its count[firstItem-1] to 1</cell></row><row><cell>33</cell><cell>Link the node to its respective node-link</cell></row><row><cell>34</cell><cell>end if</cell></row><row><cell>35</cell><cell>end for</cell></row><row><cell cols="2">36 end</cell></row></table></figure>
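As a concrete illustration of Steps 1 and 2, the following Python sketch builds a GlobalItemTable and inserts index-mapped transactions into a simple prefix tree. Two simplifications are ours, not the paper's: the CFP-Tree count array is approximated by a dictionary keyed by each transaction's first item, and frequency ties are broken alphabetically for determinism.

```python
class Node:
    """Prefix-tree node standing in for a CFP-Tree node."""
    def __init__(self, item):
        self.item = item      # item index (1 = most frequent item)
        self.counts = {}      # firstItem -> count, approximating the count array
        self.children = {}

def build_cfp_tree(transactions, min_support):
    # Step 1: identify frequent items, index them by descending frequency
    freq = {}
    for t in transactions:
        for i in t:
            freq[i] = freq.get(i, 0) + 1
    ranked = [i for i in sorted(freq, key=lambda x: (-freq[x], x))
              if freq[i] >= min_support]
    index = {item: pos + 1 for pos, item in enumerate(ranked)}  # 1-based
    # Step 2: map each transaction to indices, sort ascending, insert
    root = Node(0)
    for t in transactions:
        mapped = sorted(index[i] for i in t if i in index)
        if not mapped:
            continue
        first = mapped[0]     # selects the subtree; counts are keyed by it
        node = root
        for i in mapped:
            node = node.children.setdefault(i, Node(i))
            node.counts[first] = node.counts.get(first, 0) + 1
    return root, index
```

Because every transaction's path starts with its most frequent item, transactions sharing a prefix of frequent items share tree nodes, which is the source of the compression measured in Figure 2.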
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Algorithm 2 CT-PRO Algorithm: Mining Part</head><label></label><figDesc></figDesc><table><row><cell>input</cell><cell>CFP-Tree</cell></row><row><cell cols="2">output Frequent Itemsets FP</cell></row><row><cell cols="2">1 Procedure Mining</cell></row><row><cell>2</cell><cell>for each frequent item i ∈ GlobalItemTable from the least to the most frequent</cell></row><row><cell>3</cell><cell>Initialize LocalFrequentPatternTree with i as the root</cell></row><row><cell>4</cell><cell>ConstructLocalItemTable(i)</cell></row><row><cell>5</cell><cell>for each frequent item j ∈ LocalItemTable</cell></row><row><cell>6</cell><cell>Attach j as a child of i</cell></row><row><cell>7</cell><cell>end for</cell></row><row><cell>8</cell><cell>ConstructLocalCFPTree(i)</cell></row><row><cell>9</cell><cell>RecMine(i)</cell></row><row><cell>10</cell><cell>Traverse the LocalFrequentPatternTree to print the frequent itemsets</cell></row><row><cell>11</cell><cell>end for</cell></row><row><cell cols="2">12 end</cell></row><row><cell cols="2">13 Procedure ConstructLocalItemTable(i)</cell></row><row><cell>14</cell><cell>for each occurrence of node i in the CFP-Tree</cell></row><row><cell>15</cell><cell>for each item j in the path to the root</cell></row><row><cell>16</cell><cell>if j ∈ LocalItemTable</cell></row><row><cell>17</cell><cell>Increment count of j</cell></row><row><cell>18</cell><cell>else</cell></row><row><cell>19</cell><cell>Insert j into LocalItemTable with count = 1</cell></row><row><cell>20</cell><cell>end if</cell></row><row><cell>21</cell><cell>end for</cell></row><row><cell>22</cell><cell>end for</cell></row><row><cell cols="2">23 end</cell></row><row><cell cols="2">24 Procedure ConstructLocalCFPTree(i)</cell></row><row><cell>25</cell><cell>for each occurrence of node i in the CFP-Tree</cell></row><row><cell>26</cell><cell>Initialize mappedTrans</cell></row><row><cell>27</cell><cell>for each frequent item j ∈ LocalItemTable in the path to the root</cell></row><row><cell>28</cell><cell>mappedTrans = mappedTrans ∪ GetIndex(j)</cell></row><row><cell>29</cell><cell>end for</cell></row><row><cell>30</cell><cell>Sort mappedTrans in ascending order of item ids</cell></row><row><cell>31</cell><cell>InsertToCFPTree(mappedTrans)</cell></row><row><cell>32</cell><cell>end for</cell></row><row><cell cols="2">33 end</cell></row><row><cell cols="2">34 Procedure RecMine(x)</cell></row><row><cell>35</cell><cell>for each child i of x</cell></row><row><cell>36</cell><cell>Set all counts in LocalItemTable to 0</cell></row><row><cell>37</cell><cell>for each occurrence of node i in the LocalCFPTree</cell></row><row><cell>38</cell><cell>for each item j in the path to the root</cell></row><row><cell>39</cell><cell>Increment count of j in LocalItemTable</cell></row><row><cell>40</cell><cell>end for</cell></row><row><cell>41</cell><cell>end for</cell></row><row><cell>42</cell><cell>for each frequent item k ∈ LocalItemTable</cell></row><row><cell>43</cell><cell>Attach k as a child of i</cell></row><row><cell>44</cell><cell>end for</cell></row><row><cell>45</cell><cell>RecMine(i)</cell></row><row><cell>46</cell><cell>end for</cell></row><row><cell cols="2">47 end</cell></row></table></figure>
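Algorithm 2 mines each item's projection from the least to the most frequent item. The sketch below reproduces that bottom-up projection order on plain transaction lists rather than on the CFP-Tree; it is our simplification for illustration, not the authors' implementation, and ties in frequency are broken alphabetically.

```python
def pattern_growth(transactions, min_support):
    """Bottom-up projection mining (the traversal order of Algorithm 2)."""
    out = {}  # frozenset itemset -> support count

    def mine(db, suffix):
        freq = {}
        for t in db:
            for i in t:
                freq[i] = freq.get(i, 0) + 1
        # order items by descending frequency (like the item table)
        items = [i for i in sorted(freq, key=lambda x: (-freq[x], x))
                 if freq[i] >= min_support]
        # process from the least to the most frequent item
        for pos in range(len(items) - 1, -1, -1):
            i = items[pos]
            out[frozenset(suffix | {i})] = freq[i]
            allowed = set(items[:pos])  # only more-frequent items recurse,
            proj = [t & allowed for t in db if i in t]  # avoiding duplicates
            proj = [t for t in proj if t]
            if proj:
                mine(proj, suffix | {i})

    mine([frozenset(t) for t in transactions], frozenset())
    return out
```

Restricting each projection to items ranked above the current one guarantees every frequent itemset is enumerated exactly once, which is the role the item indices play in the CFP-Tree version.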
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Acknowledgement</head><p>We are very grateful to Jian Pei for providing the executable code of FP-Growth, Bart Goethals for providing his FP-Growth program, Christian Borgelt for the Apriori program, and Junqiang Liu for the OpportuneProject program.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Below that, LCM outperforms the others. We can conclude that for higher support levels kDCI is the best, while for lower support levels LCM is the best. These two best algorithms are compared with CT-PRO.</p><p>The kDCI algorithm <ref type="bibr" target="#b18">[19]</ref> is a multiple-heuristics hybrid algorithm that is able to adapt its behaviour during execution. It extends the DCI (Direct Count and Intersect) algorithm <ref type="bibr" target="#b22">[23]</ref> by adding adaptability to dataset-specific features. kDCI is also a resource-aware algorithm that decides its mining strategy based on the hardware characteristics of the computing platform used. Moreover, kDCI uses the counting inference strategy originally proposed in <ref type="bibr" target="#b23">[24]</ref>.</p><p>The LCM (Linear time Closed itemset Miner) algorithm <ref type="bibr" target="#b17">[18]</ref> uses the parent-child relationship defined on frequent itemsets. Its search tree technique is adapted from the algorithms for generating maximal bipartite cliques <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26]</ref> based on reverse search <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b27">28]</ref>. In enumerating all frequent itemsets, LCM uses hybrid techniques involving occurrence deliver or diffsets <ref type="bibr" target="#b28">[29]</ref> according to the density of the database. Figure <ref type="figure">8</ref> shows the performance comparisons on the Chess and Connect4 datasets. From these results, CT-PRO always outperforms the others at high support levels. At lower support levels, the performances of the three algorithms are similar. 
Since LCM and kDCI are the best algorithms in the FIMI Repository 2003 on the Chess and Connect4 datasets, we can conclude that CT-PRO outperforms all the other algorithms available in the FIMI Repository 2003 <ref type="bibr" target="#b16">[17]</ref> on these datasets. CT-PRO was explained in detail using a running example, and the best-case and worst-case time complexity of the algorithm was also presented.</p></div>			</div>
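The diffset idea <ref type="bibr" target="#b28">[29]</ref> that LCM mixes with occurrence deliver can be stated compactly: instead of intersecting tidsets (as in Eclat-style vertical mining), one stores the difference t(P) − t(P ∪ {x}), so that support(P ∪ {x}) = support(P) − |diffset|. The following is our minimal sketch of that first step, not LCM's code:

```python
def tidsets(transactions):
    """Vertical layout: item -> set of ids of transactions containing it."""
    tids = {}
    for tid, t in enumerate(transactions):
        for i in t:
            tids.setdefault(i, set()).add(tid)
    return tids

def support_by_intersection(tids, items):
    """Eclat-style: support of an itemset = size of its tidset intersection."""
    items = list(items)
    s = set(tids[items[0]])
    for i in items[1:]:
        s &= tids[i]
    return len(s)

def extend_with_diffset(tid_P, sup_P, tid_x):
    """Diffset step: support(P ∪ {x}) = support(P) - |t(P) - t(x)|."""
    diff = tid_P - tid_x
    return sup_P - len(diff), diff
```

On dense datasets the diffsets stay small while the tidsets are large, which is why a density-driven choice between the two representations pays off.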
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Mining Association Rules between Sets of Items in Large Databases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Imielinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Swami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACM SIGMOD International Conference on Management of Data</title>
				<meeting>ACM SIGMOD International Conference on Management of Data<address><addrLine>Washington DC</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="207" to="216" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Algorithms for Association Rule Mining -A General Survey and Comparison</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hipp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Guntzer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nakhaeizadeh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="58" to="64" />
			<date type="published" when="2000-07">July 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Real World Performance of Association Rule Algorithms</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mason</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining (KDD)</title>
				<meeting>the 7th International Conference on Knowledge Discovery and Data Mining (KDD)<address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Efficient Frequent Pattern Mining</title>
		<author>
			<persName><forename type="first">B</forename><surname>Goethals</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
		<respStmt>
			<orgName>University of Limburg, Belgium</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD Thesis</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Integrating Classification and Association Rule Mining</title>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACM SIGKDD</title>
				<meeting>ACM SIGKDD<address><addrLine>New York, NY</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Building a More Accurate Classifier Based on Strong Frequent Patterns</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">G</forename><surname>Sucahyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Gopalan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th Australian Joint Conference on Artificial Intelligence</title>
				<meeting>the 17th Australian Joint Conference on Artificial Intelligence<address><addrLine>Cairns, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Clustering Transactions Using Large Items</title>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACM CIKM</title>
				<meeting>ACM CIKM<address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Mining Sequential Patterns</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Conference on Data Engineering (ICDE)</title>
				<meeting>the 11th International Conference on Data Engineering (ICDE)<address><addrLine>Taipei, Taiwan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nishio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Data Mining (ICDM)</title>
				<meeting>the IEEE International Conference on Data Mining (ICDM)<address><addrLine>San Jose, California</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Mining Frequent Patterns without Candidate Generation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM SIGMOD International Conference on Management of Data</title>
				<meeting>the ACM SIGMOD International Conference on Management of Data<address><addrLine>Dallas, TX</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Mining Frequent Item Sets by Opportunistic Projection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACM SIGKDD</title>
				<meeting>ACM SIGKDD<address><addrLine>Edmonton, Alberta, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">High Performance Frequent Pattern Extraction using Compressed FP-Trees</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Gopalan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">G</forename><surname>Sucahyo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIAM International Workshop on High Performance and Distributed Mining (HPDM)</title>
				<meeting>SIAM International Workshop on High Performance and Distributed Mining (HPDM)<address><addrLine>Orlando, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Synthetic Data Generation Code for Associations and Sequential Patterns</title>
		<author>
			<persName><surname>Ibm</surname></persName>
		</author>
		<ptr target="http://www.almaden.ibm.com/software/quest/Resources/index.shtml" />
	</analytic>
	<monogr>
		<title level="m">Intelligent Information Systems</title>
				<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
		<respStmt>
			<orgName>IBM Almaden Research Center</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Fast Algorithms for Mining Association Rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International Conference on Very Large Data Bases</title>
				<meeting>the 20th International Conference on Very Large Data Bases<address><addrLine>Santiago, Chile</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Efficient Implementations of Apriori and Eclat</title>
		<author>
			<persName><forename type="first">C</forename><surname>Borgelt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI)</title>
				<meeting>the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI)<address><addrLine>Melbourne, Florida</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">UCI repository of machine learning databases</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Blake</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Merz</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<pubPlace>Irvine, CA</pubPlace>
		</imprint>
		<respStmt>
			<orgName>University of California, Department of Information and Computer Science</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><surname>Fimi</surname></persName>
		</author>
		<ptr target="http://fimi.cs.helsinki.fi" />
		<title level="m">FIMI Repository</title>
				<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets</title>
		<author>
			<persName><forename type="first">T</forename><surname>Uno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Uchida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Arimura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE ICDM Workshop of Frequent Itemset Mining Implementations (FIMI)</title>
				<meeting>the IEEE ICDM Workshop of Frequent Itemset Mining Implementations (FIMI)<address><addrLine>Melbourne, Florida</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lucchese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Orlando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Palmerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Perego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Silvestri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE ICDM Workshop of Frequent Itemset Mining Implementations (FIMI)</title>
				<meeting>the IEEE ICDM Workshop of Frequent Itemset Mining Implementations (FIMI)<address><addrLine>Melbourne, Florida</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">CT-ITL: Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">G</forename><surname>Sucahyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Gopalan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th Australasian Database Conference</title>
				<meeting>the 14th Australasian Database Conference<address><addrLine>Adelaide, Australia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Induction of Association Rules: Apriori Implementation</title>
		<author>
			<persName><forename type="first">C</forename><surname>Borgelt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kruse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference on Computational Statistics</title>
				<meeting>the 15th Conference on Computational Statistics<address><addrLine>Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Home page of Bart Goethals</title>
		<author>
			<persName><forename type="first">B</forename><surname>Goethals</surname></persName>
		</author>
		<ptr target="http://www.cs.helsinki.fi/u/goethals" />
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Adaptive and Resource-Aware Mining of Frequent Sets</title>
		<author>
			<persName><forename type="first">S</forename><surname>Orlando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Palmerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Perego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Silvestri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Data Mining (ICDM)</title>
				<meeting>the IEEE International Conference on Data Mining (ICDM)<address><addrLine>Maebashi City, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Mining Frequent Patterns with Counting Inference</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bastide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taouil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Pasquier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Stumme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lakhal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="66" to="75" />
			<date type="published" when="2000-12">December 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A Practical Fast Algorithm for Enumerating Cliques in Huge Bipartite Graphs and Its Implementation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Uno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 89th Special Interest Group of Algorithms</title>
				<meeting>the 89th Special Interest Group of Algorithms<address><addrLine>Japan</addrLine></address></meeting>
		<imprint>
			<publisher>Information Processing Society</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Fast Algorithms for Enumerating Cliques in Huge Graphs</title>
		<author>
			<persName><forename type="first">T</forename><surname>Uno</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
		<respStmt>
			<orgName>Research Group of Computation, IEICE, Kyoto University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Reverse Search for Enumeration</title>
		<author>
			<persName><forename type="first">D</forename><surname>Avis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fukuda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Discrete Applied Mathematics</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="page" from="21" to="46" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">A New Approach for Speeding Up Enumeration Algorithms</title>
		<author>
			<persName><forename type="first">T</forename><surname>Uno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ISAAC&apos;98</title>
				<meeting>ISAAC&apos;98</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">CHARM: An Efficient Algorithm for Closed Itemset Mining</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hsiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIAM International Conference on Data Mining (SDM)</title>
				<meeting>SIAM International Conference on Data Mining (SDM)<address><addrLine>Arlington, VA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
