<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Takeaki</forename><surname>Uno</surname></persName>
							<email>uno@nii.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Informatics</orgName>
								<address>
									<addrLine>2-1-2 Hitotsubashi, Chiyoda-ku</addrLine>
									<postCode>101-8430</postCode>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Masashi</forename><surname>Kiyomi</surname></persName>
							<email>masashi@grad.nii.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Informatics</orgName>
								<address>
									<addrLine>2-1-2 Hitotsubashi, Chiyoda-ku</addrLine>
									<postCode>101-8430</postCode>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hiroki</forename><surname>Arimura</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Information Science and Technology</orgName>
								<orgName type="institution">Hokkaido University Kita</orgName>
								<address>
									<addrLine>14-jo Nishi 9-chome</addrLine>
									<postCode>060-0814</postCode>
									<settlement>Sapporo</settlement>
									<country key="JP">JAPAN</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">047A070585289CE3ECD5ADC81BAE3021</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:45+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>For a transaction database, a frequent itemset is an itemset included in at least a specified number of transactions. A frequent itemset P is maximal if P is included in no other frequent itemset, and closed if P is included in no other itemset included in exactly the same transactions as P.</p><p>The problems of finding these frequent itemsets are fundamental in data mining, and the applications call for fast implementations. In this paper, we propose efficient algorithms LCM (Linear time Closed itemset Miner), LCMfreq and LCMmax for these problems. We show the efficiency of our algorithms by computational experiments comparing them with existing algorithms.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Frequent itemset mining is one of the fundamental problems in data mining and has many applications such as association rule mining, inductive databases, and query expansion. These applications call for fast implementations of frequent itemset mining. In this paper, we propose the second versions of LCM, LCMfreq and LCMmax, for enumerating closed, all, and maximal frequent itemsets. LCM is an abbreviation of Linear time Closed itemset Miner.</p><p>In FIMI03 <ref type="bibr" target="#b6">[7]</ref>, we proposed the first version of LCM, which enumerates frequent closed itemsets. LCM uses prefix preserving closure extension (ppc extension for short), which is an extension from a closed itemset to another closed itemset. The extension induces a search tree on the set of frequent closed itemsets, hence we can completely enumerate closed itemsets without duplication. Generating a ppc extension needs no previously obtained closed itemset. Hence, the memory use of LCM does not depend on the number of frequent closed itemsets, even if there are many of them.</p><p>The time complexity of LCM is theoretically bounded by a linear function in the number of frequent closed itemsets, while those of the existing algorithms are not. We further developed algorithms for frequency counting: occurrence deliver and a hybrid with diffsets. They reduce the practical computation time considerably. Moreover, the framework of LCM is simple. Generating ppc extensions needs no sophisticated data structure such as binary trees; LCM is implemented with only arrays. Therefore, LCM is fast, and outperforms other algorithms on some sparse datasets.</p><p>However, LCM does not have any routine for reducing the database, while many existing algorithms do. 
Thus, the performance of LCM is not good for dense datasets with large minimum supports, which involve many unnecessary items and transactions. At FIMI03, we also proposed modifications of LCM, LCMfreq and LCMmax, for enumerating all frequent itemsets and maximal frequent itemsets. Although they are fast for some instances, if LCM is not fast for an instance, they are not fast for it either. Existing maximal frequent itemset mining algorithms have efficient pruning methods to reduce the number of iterations, while LCMmax does not. This is another reason for the slowness of LCMmax.</p><p>This paper proposes the second version of the LCM algorithms. We added database reduction to LCM, so that problems on dense datasets can be solved in a short time. The second version of LCMmax includes a pruning method, so the computation time is reduced when the number of maximal frequent itemsets is small. We further developed new algorithms for checking the maximality of a frequent itemset and for taking the closure of an itemset. We compare the performance of the LCM algorithms and the other algorithms submitted to FIMI03 by computational experiments. On many instances, the LCM algorithms outperform the other algorithms.</p><p>The organization of the paper is as follows. Section 2 introduces preliminaries. The main algorithms and practical techniques of the LCM algorithms are described in Section 3. Section 4 shows the results of computational experiments, and Section 5 concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>Let I = {1, ..., n} be the set of items. A transaction database on I is a set T = {t_1, ..., t_m} such that each t_i is included in I. Each t_i is called a transaction. We denote by ||T|| the sum of the sizes of all transactions in T, that is, the size of the database T. A set P ⊆ I is called an itemset.</p><p>For an itemset P, a transaction including P is called an occurrence of P. The denotation of P, denoted by T(P), is the set of the occurrences of P. |T(P)| is called the frequency of P, and denoted by frq(P). For a given constant θ, called a minimum support, an itemset P is frequent if frq(P) ≥ θ. If a frequent itemset P is included in no other frequent itemset, P is called maximal. For any itemsets P and Q, T(P ∪ Q) = T(P) ∩ T(Q) holds, and if</p><formula xml:id="formula_0">P ⊆ Q then T(Q) ⊆ T(P). An itemset P is called closed if no other itemset Q satisfies T(P) = T(Q), P ⊆ Q.</formula><p>Given a set S ⊆ T of transactions, let I(S) be the set of items common to all transactions in S, i.e., I(S) = ∩_{T ∈ S} T. Then, we define the closure of an itemset P in T, denoted by clo(P), by I(T(P)) (= ∩_{t ∈ T(P)} t). For every pair of itemsets P and Q, the following properties hold <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>.</p><formula xml:id="formula_1">(1) If P ⊆ Q, then clo(P) ⊆ clo(Q). (2) If T(P) = T(Q), then clo(P) = clo(Q).</formula><p>(3) clo(clo(P)) = clo(P). (<ref type="formula">4</ref>) clo(P) is the unique smallest closed itemset including P. (5) An itemset P is a closed itemset if and only if clo(P) = P.</p><p>For an itemset P and an item i ∈ P, let P(i) = P ∩ {1, ..., i} be the subset of P consisting only of elements no greater than i, called the i-prefix of P. An itemset Q is a closure extension of an itemset P if Q = clo(P ∪ {i}) holds for some i ∉ P. If Q is a closure extension of P, then Q ⊃ P and frq(Q) &lt; frq(P). 
We call the item with the maximum index in P the tail of P, and denote it by tail(P).</p></div>
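The notions of denotation, frequency, and closure above translate directly into Python. The following is our illustrative sketch, not the authors' array-based C implementation; the names `denotation`, `frq`, and `closure` are ours:

```python
# Sketch of the preliminary definitions (illustrative only).

def denotation(transactions, itemset):
    """T(P): the transactions that include itemset P."""
    return [t for t in transactions if itemset <= t]

def frq(transactions, itemset):
    """frq(P) = |T(P)|."""
    return len(denotation(transactions, itemset))

def closure(transactions, itemset):
    """clo(P): the intersection of all occurrences of P."""
    occ = denotation(transactions, itemset)
    result = set(occ[0])
    for t in occ[1:]:
        result &= t
    return result

T = [{1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 2, 3}]
print(frq(T, {1, 2}))    # 3
print(closure(T, {3}))   # {2, 3}: every occurrence of 3 also contains 2
```

Note that `closure(T, closure(T, {3}))` returns `{2, 3}` again, illustrating property (3), clo(clo(P)) = clo(P).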
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Algorithms for Efficient Enumeration</head><p>In this section, we explain the techniques used in the second versions of the LCM algorithms. We explain them one by one in the following subsections, comparing them with the techniques used by other algorithms. For techniques already used in existing algorithms or in the first version, we cite the previous papers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Enumerating Frequent Itemsets</head><p>Any itemset included in a frequent itemset is itself frequent. Thereby, the property "frequent" is monotone. From this, we can construct any frequent itemset from the empty set by adding items one by one without passing through any infrequent itemset. Roughly speaking, the existing algorithms are classified into two groups, and algorithms in both groups use this property.</p><p>The first group is the so-called apriori or level-by-level algorithms <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>. Let D_k be the set of frequent itemsets of size k. Apriori algorithms start with D_0, that is {∅}, and compute D_k from D_{k-1} in increasing order of k from k = 1. Any itemset in D_k is obtained from an itemset of D_{k-1} by adding an item. Apriori algorithms add every item to each itemset of D_{k-1}, and choose the frequent itemsets among them. If D_k = ∅ holds for some k, then D_{k'} = ∅ holds for any k' &gt; k. Thus, apriori algorithms stop at such a k. This is the scheme of apriori algorithms.</p><p>The other group is the so-called backtracking algorithms <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19]</ref>. A backtracking algorithm is based on recursive calls. An iteration of a backtracking algorithm inputs a frequent itemset P, and generates itemsets by adding every item to P. Then, for each itemset among them that is frequent, the iteration generates a recursive call with respect to it. To avoid duplication, an iteration of a backtracking algorithm adds only items with indices larger than the tail of P. We describe the framework of backtracking algorithms as follows.</p><p>ALGORITHM BackTracking (P: current solution) 1. Output P 2. 
For each e ∈ I, e &gt; tail(P) do 3.</p><p>If P ∪ {e} is frequent then call BackTracking (P ∪ {e})</p><p>An execution of a backtracking algorithm gives a tree structure such that the vertices of the tree are iterations, and edges connect two iterations if one of the iterations calls the other. If an iteration I recursively calls another iteration I', then we say that I is the parent of I', and I' is a child of I. For an iteration, the itemset received from the parent is called the current solution.</p><p>Apriori algorithms use much memory for storing D_k, while backtracking algorithms use less memory since they keep only the current solution. Backtracking algorithms need no computation for maintaining previously obtained itemsets, so their computation time is generally short. However, apriori algorithms have advantages for the frequency counting.</p><p>LCM algorithms are based on backtracking algorithms, and use efficient techniques for the frequency counting, namely occurrence deliver and anytime database reduction, described below. Hence, LCM algorithms compute frequencies efficiently without keeping previously obtained itemsets in memory.</p></div>
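The BackTracking framework above can be made runnable with a few lines of Python. This is a naive sketch with hypothetical names; the real implementations use arrays and the frequency-counting techniques of Section 3.4:

```python
def backtracking(transactions, theta):
    """Enumerate all frequent itemsets by the backtracking framework:
    output P, then recurse on P | {e} for each frequent extension e > tail(P)."""
    items = sorted({e for t in transactions for e in t})
    out = []

    def frq(P):
        return sum(1 for t in transactions if P <= t)

    def rec(P, tail):
        out.append(P)                        # 1. Output P
        for e in items:                      # 2. For each e > tail(P)
            if e > tail and frq(P | {e}) >= theta:
                rec(P | {e}, e)              # 3. recurse on the frequent child
    rec(frozenset(), 0)                      # the empty set is always frequent
    return out

T = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}]
print(len(backtracking(T, 2)))               # 8 frequent itemsets, each output once
```

Because each child is generated only through items larger than the tail, every frequent itemset appears exactly once, which is exactly the duplication-avoidance argument in the text.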
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Maintaining Databases</head><p>In the existing studies, database reduction is said to be important to reduce the computation time. It reduces the input database by the following rules:</p><p>1. remove each item included in fewer than θ transactions 2. remove each item included in all transactions 3. merge identical transactions into one.</p><p>Database reduction performs well when the minimum support is large, and many existing algorithms use it. LCM algorithms also use database reduction.</p><p>In the existing studies, the input databases are often stored and maintained by using an FP-tree (frequent pattern tree), which is a version of a prefix tree (trie) <ref type="bibr" target="#b8">[9]</ref>. By using an FP-tree, we can search specified transactions in the dataset efficiently. An FP-tree compresses common prefixes, so we can decrease the memory use. In addition, an FP-tree can detect identical transactions, so we can merge them into one. This merge accelerates the frequency counting. For these reasons, FP-trees are used in many algorithms and implementations.</p><p>Although the FP-tree has many advantages, we do not use it in the implementation of LCM; we use simple arrays instead. The main reason is that LCM does not have to search transactions in the database. The main operation of LCM is tracing the transactions in the denotation of the current solution. Thus, we do not need a sophisticated data structure for searching.</p><p>The other reason is the computation time for initialization. If we use a standard binary tree for implementing the FP-tree, the initialization takes O(||T|| + |T| log |T|) time for constructing the FP-tree in memory. In contrast, LCM detects identical transactions and stores the database in memory in time linear in the database size. 
This is because LCM uses radix sort for this task, which sorts the transactions in lexicographic order in linear time. In general, the datasets of data mining problems have many transactions, and each transaction has few items. Thus, ||T|| is usually smaller than |T| log |T|, and LCM has an advantage. The constant factors of the computation time of binary tree operations are also larger than those of array operations; LCM has an advantage on this point as well. Again, we recall that LCM never searches the transactions, so each operation required by LCM can be done in constant time.</p><p>The FP-tree has an advantage in reducing the memory use. This memory reduction can also reduce the computation time of the frequency counting. To check the efficiency of the reduction, we measured the reduction ratio of the FP-tree for some datasets examined in FIMI03. The result is shown in Table <ref type="table" target="#tab_1">1</ref>. Each cell shows the ratio of the numbers of items needed to be stored by arrays and by the FP-tree. Usually, the input database is reduced in each iteration, hence we sum up the numbers over all iterations to compute the ratio. In our experiments, the ratio was not greater than 3 for many instances. If |I| is small, |T| is large, and the dataset has randomness, such as accidents, the ratio was up to 6. Generally, a binary tree uses three times as much memory as an array. Thus, for many datasets, the FP-tree seems to have no clear advantage over an array in either memory use or computation time.</p></div>
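The three reduction rules can be sketched as follows. This is illustrative Python with our hypothetical name `reduce_database`; the actual LCM implementation merges identical transactions via linear-time radix sort, whereas this sketch hashes them. Keeping a multiplicity per merged transaction preserves all frequencies:

```python
from collections import Counter

def reduce_database(transactions, theta):
    """Apply the reduction rules: (1) drop items in fewer than theta
    transactions, (2) drop items in all transactions, (3) merge identical
    transactions into one, keeping a multiplicity count."""
    counts = Counter(e for t in transactions for e in t)
    n = len(transactions)
    keep = {e for e, c in counts.items() if theta <= c < n}    # rules 1 and 2
    return Counter(frozenset(t & keep) for t in transactions)  # rule 3

T = [{1, 2, 3}, {1, 2, 3}, {1, 4}, {1, 2, 5}]
R = reduce_database(T, 2)
print(R[frozenset({2, 3})])   # 2: two identical reduced transactions merged
```

In this example item 1 is removed by rule 2 (it occurs in all transactions) and items 4 and 5 by rule 1 (they occur only once), after which the first two transactions become identical and merge.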
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Hypercube Decomposition</head><p>LCM finds a number of frequent itemsets at once to reduce the computation time <ref type="bibr" target="#b17">[18]</ref>. Since the itemsets obtained at once compose a hypercube in the itemset lattice, we call this technique hypercube decomposition. For a frequent itemset P, let H(P) be the set of items e satisfying e &gt; tail(P) and T(P) = T(P ∪ {e}). Then, for any Q ⊆ H(P), T(P ∪ Q) = T(P) holds, and P ∪ Q is frequent. LCMfreq uses this property. For two itemsets P and P ∪ Q, we say that P' is between P and P ∪ Q if P ⊆ P' ⊆ P ∪ Q. In the iteration with respect to P, we output all P' between P and P ∪ H(P). This saves the frequency counting about 2^{|H(P)|} times.</p><p>To avoid duplication, we do not generate recursive calls with respect to items included in H(P). Instead, we output the frequent itemsets including items of H(P) in the recursive calls with respect to items not included in H(P). When the algorithm generates a recursive call with respect to an item e ∉ H(P), we pass H(P) to it. In the recursive call, we output all itemsets between P ∪ {e} and P ∪ {e} ∪ H(P) ∪ H(P ∪ {e}). Since any itemset Q satisfies T(P ∪ Q ∪ H(P)) = T(P ∪ Q), the itemsets output in the recursive calls are frequent. We describe hypercube decomposition as follows.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ALGORITHM HypercubeDecomposition</head><p>(P: current solution, S: itemset)
S' := S ∪ H(P)
Output all itemsets including P and included in P ∪ S'
For each item e ∈ I \ (P ∪ S'), e &gt; tail(P) do
If P ∪ {e} is frequent then call HypercubeDecomposition (P ∪ {e}, S')
End for</p></div>
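The set H(P) and the hypercube output can be sketched in Python (our illustrative names; here H(P) is computed naively by comparing denotations, whereas LCMfreq obtains it as a by-product of frequency counting):

```python
from itertools import combinations

def denotation(transactions, P):
    return [i for i, t in enumerate(transactions) if P <= t]

def H(transactions, P, tail):
    """Items e > tail(P) whose addition does not change the denotation."""
    occ = denotation(transactions, P)
    items = {e for t in transactions for e in t}
    return {e for e in items - P
            if e > tail and denotation(transactions, P | {e}) == occ}

def hypercube_output(P, h):
    """All 2^|H(P)| itemsets between P and P ∪ H(P), obtained at once."""
    h = sorted(h)
    return [frozenset(P) | set(q)
            for k in range(len(h) + 1) for q in combinations(h, k)]

T = [{1, 2, 3, 4}, {1, 2, 3, 4}, {2, 3}]
P = frozenset({1})
print(sorted(H(T, P, 1)))                      # [2, 3, 4]
print(len(hypercube_output(P, H(T, P, 1))))    # 8 itemsets, one frequency count
```

All eight itemsets share the denotation T({1}), so a single frequency count suffices for the whole hypercube.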
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Frequency Counting</head><p>Generally, the heaviest part of frequent itemset mining is the frequency counting, which is to count the number of transactions including a newly generated itemset. To reduce the computation time, existing algorithms use down project. For an itemset P, down project computes its denotation T(P) by using two subsets P_1 and P_2 of P. If P = P_1 ∪ P_2, then T(P) = T(P_1) ∩ T(P_2). Under the condition that the items of P_1 and P_2 are sorted by their indices, the intersection can be computed in O(|T(P_1)| + |T(P_2)|) time. Down project uses this property to compute denotations quickly. Moreover, if |T(P_1)| &lt; θ or |T(P_2)| &lt; θ holds, we can see that P can never be frequent. This also helps to reduce the computation time.</p><p>Apriori-type algorithms accelerate the frequency counting by finding a good pair P_1 and P_2 of subsets of P such that |T(P_1)| + |T(P_2)| is small, or either P_1 or P_2 is infrequent. A backtracking algorithm adds an item e to the current solution P in each iteration, and computes its denotation. By using T({e}), the computation time for the frequency counting is reduced to</p><formula xml:id="formula_2">O(Σ_{e &gt; tail(P)} (|T(P)| + |T({e})|)).</formula><p>The bitmap method <ref type="bibr" target="#b4">[5]</ref> is a technique for speeding up the computation of the intersections in down project. It uses a bitmap image (the characteristic vector) of the denotations. With a bitmap, taking the intersection requires O(|T|) time. However, a 32-bit CPU can take the intersection of 32 bits at once, so roughly speaking the computation time is reduced to 1/32. This method has a disadvantage for sparse datasets, and is not orthogonal to anytime database reduction, described below. 
From the results of the experiments in FIMI 03, the bitmap method seems to be unsuitable for sparse large datasets.</p><p>LCM algorithms use another method for the frequency counting, called occurrence deliver <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>. Occurrence deliver computes the denotations of P ∪ {e} for e = tail(P) + 1, ..., |I| at once by tracing the transactions in T(P). It uses a bucket for each item e to be added, and initializes the buckets to the empty set. Then, for each transaction t ∈ T(P), occurrence deliver inserts t into the bucket of e for each e ∈ t, e &gt; tail(P). After these insertions, the bucket of e is equal to T(P ∪ {e}). ALGORITHM OccurrenceDeliver (P: current solution) 1. Set Bucket[e] := ∅ for each e &gt; tail(P) 2. For each transaction t ∈ T(P) do 3. For each item e ∈ t, e &gt; tail(P) do 4. Insert t to Bucket[e] 5. End for 6. End for 7. Output Bucket[e] for all e &gt; tail(P)</p><p>Let F(P) be the set of items e such that e &gt; tail(P) and P ∪ {e} is frequent. Apriori algorithms may find out in a short time that P ∪ {e} is infrequent; thus, in the best case, their computation time can be reduced to O(Σ_{e ∈ F(P)} |T(P ∪ {e})|). If Σ_{e &gt; tail(P), e ∉ F(P)} |T(P ∪ {e})| is large, occurrence deliver will be slow.</p><p>To decrease Σ_{e &gt; tail(P), e ∉ F(P)} |T(P ∪ {e})|, LCM algorithms sort the indices of items e in increasing order of |T({e})|. As we can see in Table <ref type="table" target="#tab_1">1</ref>, this sort reduces Σ_{e &gt; tail(P), e ∉ F(P)} |T(P ∪ {e})| to 1/4 of Σ_{e &gt; tail(P)} |T(P ∪ {e})| in many cases. Since apriori algorithms take much time to maintain previously obtained itemsets, the possible speedup of apriori algorithms is not so large.</p><p>LCM algorithms further speed up the frequency counting by iteratively reducing the database. Suppose that an iteration I of a backtracking algorithm receives a frequent itemset P from its parent. Then, in any descendant iteration of I, no item with index smaller than tail(P) is added. 
Hence, any such item can be removed from the database during the execution of the descendant iterations. Similarly, the transactions not including P never include the current solution of any descendant iteration, so such transactions can be removed during the execution of the descendant iterations. Moreover, infrequent items can be removed, and identical transactions can be merged.</p><p>According to this, LCM algorithms recursively reduce the database during the execution of the recursive calls. Before each recursive call, LCM algorithms generate a reduced database according to the above discussion, and pass it to the recursive call. We call this technique anytime database reduction.</p><p>Anytime database reduction reduces the computation time of the iterations located at the lower levels of the recursion tree. In the recursion tree, many iterations are on the lower levels and few iterations are on the upper levels. Thus, anytime database reduction is expected to be efficient. In our experiments, anytime database reduction works quite well. The following table shows the efficiency of anytime database reduction. We sum up, over all iterations, the sizes of the databases received from the parent, both with and without anytime database reduction. Each cell shows the sum. The reduction ratio is especially large if the dataset is dense and the minimum support is large.</p></div>
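Occurrence deliver translates almost line for line into Python. In this sketch (our illustrative code, not the C implementation) T(P) is represented by transaction indices, so the total work is exactly the sum of |T(P ∪ {e})| over e &gt; tail(P):

```python
from collections import defaultdict

def occurrence_deliver(transactions, occ_P, tail):
    """Compute the buckets T(P ∪ {e}) for every e > tail(P) in a single
    pass over occ_P, the occurrences of the current itemset P."""
    bucket = defaultdict(list)           # Bucket[e], initially empty
    for i in occ_P:                      # for each transaction t in T(P)
        for e in transactions[i]:        # for each item e in t with e > tail(P)
            if e > tail:
                bucket[e].append(i)      # insert t into Bucket[e]
    return bucket

T = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}]
occ = [0, 1, 3]                          # T({1}): transactions containing item 1
b = occurrence_deliver(T, occ, 1)
print(b[2], b[3])                        # [0, 1, 3] [0, 3]
```

Each bucket comes out already sorted in the order of `occ_P`, which is what makes the subsequent recursive calls cheap.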
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Prefix Preserving Closure Extension</head><p>Many existing algorithms for mining closed itemsets are based on frequent itemset mining. That is, the algorithms enumerate frequent itemsets, and output those that are closed. This approach is efficient when the number of frequent itemsets and the number of frequent closed itemsets do not differ much. However, if the difference between them is large, the algorithms generate many non-closed frequent itemsets. LCM uses prefix preserving closure extension (ppc extension for short) for generating closed itemsets <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>. For a closed itemset P, we define the closure tail clo_tail(P) as the item i of minimum index satisfying clo(P(i)) = P. clo_tail(P) is always included in P. We say that P' is a ppc extension of P if P' = clo(P ∪ {e}) and P'(e − 1) = P(e − 1) hold for an item e &gt; clo_tail(P). Let P_0 be the itemset satisfying T(P_0) = T. Any closed itemset P ≠ P_0 is a ppc extension of another closed itemset P', and such P' is unique for P. Moreover, the frequency of P' is strictly larger than that of P, hence ppc extension induces a rooted tree on the frequent closed itemsets. LCM starts from P_0, and finds all frequent closed itemsets in a depth-first manner by recursively generating ppc extensions. The proofs of the ppc extension algorithms are described in <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19]</ref>.</p><p>By ppc extension, the time complexity is bounded by a linear function in the number of frequent closed itemsets. Hence, the computation time of LCM is never superlinear in the number of frequent closed itemsets.</p></div>
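The ppc extension yields a compact closed-itemset enumerator. The following Python sketch follows the description above; duplicate-freeness rests on the prefix-preserving check. This is our naive illustration (recomputing denotations from scratch), not the optimized LCM:

```python
def closed_itemsets(transactions, theta):
    """Enumerate frequent closed itemsets via ppc extension: from a closed
    itemset P, try Q = clo(P ∪ {e}) and recurse only when the (e-1)-prefix
    of Q equals that of P, so each closed itemset is reached exactly once."""
    items = sorted({e for t in transactions for e in t})

    def occ(P):
        return [t for t in transactions if P <= t]

    def clo(P):
        c = None
        for t in occ(P):
            c = set(t) if c is None else c & t
        return frozenset(c) if c is not None else frozenset(items)

    out = []

    def rec(P, last):
        out.append(P)
        for e in items:
            if e <= last or e in P or len(occ(P | {e})) < theta:
                continue
            Q = clo(P | {e})
            if {i for i in Q if i < e} == {i for i in P if i < e}:
                rec(Q, e)        # prefix preserved: Q is a ppc extension of P

    root = clo(frozenset())      # P0, the closed itemset with T(P0) = T
    if len(occ(root)) >= theta:
        rec(root, 0)
    return out

T = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}]
print(sorted(map(sorted, closed_itemsets(T, 2))))
# [[1, 2], [1, 2, 3], [2], [2, 3]]
```

On this database the four frequent closed itemsets are found once each, without storing any previously obtained itemset.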
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Closure Operation</head><p>To enumerate closed itemsets, we have to check whether the current solution P is a closed itemset or not. In the existing studies, there are two methods for this task. The first method is to store in memory the previously obtained itemsets that are currently maximal among the itemsets having the identical denotation. In this method, we find frequent itemsets one by one, and store them in memory, removing itemsets included in another itemset with the identical denotation. After finding all frequent itemsets, only closed itemsets remain in memory. We call this storage method. The second method is to generate the closure of P. By adding to P all items e such that frq(P) = frq(P ∪ {e}), we can construct the closure of P. We call the second method closure operation.</p><p>LCM uses closure operations for generating ppc extensions. As in the frequency counting, we use database reduction for the closure operation. Suppose that the current solution is P, the reduced database is composed of transactions S_1, ..., S_h, and each S_l is obtained by merging transactions T_{l,1}, ..., T_{l,k} of the original database. For each S_l, we define the interior intersection In(S_l) = ∩_{T ∈ {T_{l,1}, ..., T_{l,k}}} T. Here the closure of P is equal to ∩_{S ∈ {S_1, ..., S_h}} In(S). Thus, by using interior intersections, we can efficiently construct the closure of P.</p><p>When we merge transactions to reduce the database, the interior intersections can be updated efficiently by taking the intersection of their interior intersections. In the same way as in the frequency counting, we can remove infrequent items from the interior intersections for further reduction. The computation time for the closure operation in LCM depends on the size of the database, but not on the number of previously obtained itemsets. Thus, storage method has an advantage if the number of frequent closed itemsets is small. 
However, for the instances with many frequent closed itemsets, which take a long time to solve, LCM has the advantage.</p></div>
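The closure operation over a reduced database can be sketched as follows (Python, our hypothetical helper names; the sketch assumes the itemset P consists of kept items, so P's membership in a merged transaction S_l matches its membership in the merged originals):

```python
def build_reduced(transactions, keep):
    """Merge transactions identical on the kept items; for each merged
    transaction S_l store its interior intersection In(S_l), the
    intersection of the original transactions merged into it."""
    groups = {}
    for t in transactions:
        key = frozenset(t & keep)
        if key in groups:
            groups[key] &= t             # In(S_l) updated incrementally
        else:
            groups[key] = set(t)
    return groups                         # maps S_l -> In(S_l)

def closure_via_interiors(groups, P):
    """clo(P): intersection of In(S_l) over merged transactions containing P."""
    c = None
    for s, interior in groups.items():
        if P <= s:
            c = set(interior) if c is None else c & interior
    return c

T = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 5}]
G = build_reduced(T, keep={1, 2, 3})
print(closure_via_interiors(G, {2, 3}))   # {2, 3}
print(closure_via_interiors(G, {1}))      # {1, 2, 3}
```

The first two transactions merge into one S_l with In(S_l) = {1, 2, 3}, and intersecting interior intersections reproduces the closures computed on the original database.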
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7">Enumerating Maximal Frequent Itemsets</head><p>Many existing algorithms for maximal frequent itemset enumeration are based on the enumeration of frequent itemsets. In a breadth-first or depth-first manner, they enumerate frequent itemsets and output the maximal itemsets among them. To reduce the computation time, the algorithms prune the unnecessary itemsets and recursive calls. Similar to these algorithms, LCMmax enumerates closed itemsets by backtracking, and outputs the maximal itemsets among them. It uses pruning to cut off unnecessary branches of the recursion. The pruning is based on a re-ordering of the indices of the items in each iteration. We explain the re-ordering in the following.</p><p>Let us consider a backtracking algorithm for enumerating frequent itemsets. Let P be the current solution of an iteration of the algorithm, and let H be the set of items that can be added in the iteration. Suppose that P' is a maximal frequent itemset including P. LCMmax puts new indices on the items with indices larger than tail(P) so that any item in P' has an index larger than any item not in P'. Note that this re-ordering of indices has no effect on the correctness of the algorithm.</p><p>Let e &gt; tail(P) be an item in P', and consider the recursive call with respect to P ∪ {e}. Any frequent itemset P'' found in the recursive call is included in P', since every item having an index larger than e is included in P', and the recursive call adds to P only items with indices larger than e. From this, we can see that by the re-ordering of indices, the recursive calls with respect to items in P' ∩ H generate no maximal frequent itemset other than P'.</p><p>According to this, an iteration of LCMmax chooses an item e* ∈ H, and generates a recursive call with respect to P ∪ {e*} to obtain a maximal frequent itemset P'. Then, it re-orders the indices of the items other than e* as above, and generates recursive calls with respect to each e &gt; tail(P) not included in P' ∪ {e*}. 
In this way, we save the computation time for finding P', and by finding a large itemset, we increase the efficiency of this approach. In the following, we describe LCMmax. H' := H \ {e} 11.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ALGORITHM</head><p>LCMmax (P ∪ {e}, H') 12. End for</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.8">Checking Maximality</head><p>When LCMmax finds a frequent itemset P, it checks whether the current solution is maximal or not. We call this operation the maximality check. The maximality check is a heavy task, so many existing algorithms avoid it. They store in memory the maximal itemsets among the previously obtained frequent itemsets, and update them when they find a new itemset. When the algorithms terminate and have obtained all frequent itemsets, only the maximal frequent itemsets remain in memory. We call this storage method. If the number of maximal frequent itemsets is small, storage method is efficient. However, if the number is large, storage method needs much memory. When a frequent itemset is newly found, storage method checks whether the itemset is included in some itemset in memory or not. If the number of frequent itemsets is large, this operation takes a long time.</p><p>To avoid the disadvantage of storage method, LCMmax performs the maximality check. LCMmax checks the maximality by finding an item e such that P ∪ {e} is frequent. Such an item e exists if and only if P is not maximal. To do this efficiently, we reduce the database. Let us consider an iteration of LCMmax with respect to a frequent itemset P. LCM algorithms reduce the database by anytime database reduction for the frequency counting. Suppose that the reduced database is composed of transactions S_1, ..., S_h, and each S_l is obtained by merging transactions T_{l,1}, ..., T_{l,k} of the original database. Let H be the set of items to be added in the iteration. Suppose that we remove from H all items e such that P ∪ {e} is infrequent. Then, for any l, T_{l,1} ∩ H = T_{l,2} ∩ H = ... = T_{l,k} ∩ H holds. For an item e and a transaction S_l, we define the weight w(e, S_l) as the number of transactions in T_{l,1}, ..., T_{l,k} including e. Here the frequency of P ∪ {e} is Σ_{S ∈ {S_1, ..., S_h}} w(e, S). 
Thus, by using the weights, we can efficiently check the maximality, in time linear in the size of the reduced database.</p><p>When we merge transactions to reduce the database, the weights can be updated easily: for each item e, we take the sum of w(e, S) over all transactions S to be merged. In the same way as in the frequency counting, we can remove infrequent items from the database for the maximality check, for further reduction.</p><p>The computation time for the maximality check in LCMmax depends on the size of the database, but not on the number of previously obtained itemsets. Thus, storage method has an advantage if the number of maximal frequent itemsets is small, but for the instances with many maximal frequent itemsets, which take a long time to solve, LCMmax has the advantage. </p></div>
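The weight-based maximality check can be sketched as follows (Python, our illustrative names; the sketch assumes P and the candidate items consist of kept items of the reduced database):

```python
from collections import Counter

def reduce_with_weights(transactions, keep):
    """Merge transactions identical on the kept items; w(e, S_l) counts how
    many merged-in original transactions contain item e."""
    groups = {}
    for t in transactions:
        key = frozenset(t & keep)
        groups.setdefault(key, Counter()).update(t)
    return list(groups.items())           # [(S_l, weights w(e, S_l)), ...]

def is_maximal(reduced, P, theta, items):
    """P is maximal iff no item e outside P has frq(P ∪ {e}) >= theta;
    frq(P ∪ {e}) is the sum of w(e, S_l) over merged S_l containing P."""
    for e in items:
        if e not in P:
            f = sum(w[e] for s, w in reduced if P <= s)
            if f >= theta:
                return False
    return True

T = [{1, 2}, {1, 2, 3}, {2, 3}]
R = reduce_with_weights(T, keep={2, 3})
print(is_maximal(R, frozenset({2, 3}), 2, [1, 2, 3]))   # True
print(is_maximal(R, frozenset({2}), 2, [1, 2, 3]))      # False
```

Here {2, 3} is maximal at θ = 2 since {1, 2, 3} occurs only once, while {2} is not, because both {1, 2} and {2, 3} are frequent; both answers are obtained from the merged transactions alone.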
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Computational Experiments</head><p>In this section, we show the results of our computational experiments. We implemented our three algorithms LCM, LCMfreq, and LCMmax. They are coded in ANSI C and compiled with gcc. The experiments were executed on a notebook PC with an AMD Athlon XP 1600+ and 224MB of memory. The performance of the LCM algorithms is compared with the algorithms that scored well at FIMI'03: fp-growth <ref type="bibr" target="#b7">[8]</ref>, afopt <ref type="bibr" target="#b10">[11]</ref>, MAFIA <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, kDCI <ref type="bibr" target="#b11">[12]</ref>, and PATRICIAMINE <ref type="bibr" target="#b15">[16]</ref>. We note that kDCI and PATRICIAMINE are only for all frequent itemset mining. To reduce the time for the experiments, we stopped an execution when an algorithm took more than 10 minutes, and we do not plot a result when the computation time exceeded 10 minutes or the execution terminated abnormally. The results are displayed in Figures <ref type="figure" target="#fig_1">1 and 2</ref>. In each graph, the horizontal axis is the minimum support, and the vertical axis is the CPU time on a log scale.</p><p>From the performance of the implementations, the instances fall into three groups with similar results within each group. Due to space limitations, we show one instance as a representative of each group.</p><p>The first group is composed of BMS-WebView1, BMS-WebView2, BMS-POS, T10I4D100K, kosarak, and retail. These datasets have many items and transactions but are sparse; we call them sparse datasets. We chose BMS-WebView2 as the representative.</p><p>The second group is composed of datasets taken from the UCI Machine Learning Repository 1 : connect, chess, mushrooms, pumsb, and pumsb-star. These datasets have many transactions but few items; we call them middle density datasets. As a representative, we show the result for chess.</p><p>The third group consists of accidents, which differs from every other dataset: it has a huge number of transactions but few items, and each transaction includes many items, so the dataset is very dense. We call it a very dense dataset.</p><p>For almost all instances and minimum supports, the LCM algorithms perform well. When the minimum support is large, the LCM algorithms are the fastest on all instances, because of their fast initialization. For all instances and any minimum support, LCM outperforms the other closed itemset mining algorithms, which shows the efficiency of ppc extension.</p><p>On the sparse datasets, the LCM algorithms are the fastest for any minimum support. The gain from the FP-tree is small on these datasets, and occurrence deliver works efficiently. The performances of afopt and fp-growth are quite similar on these problems; they are the second best, 2 to 10 times slower than the LCM algorithms. When enumerating frequent closed itemsets, they take much time if the number of closed itemsets is large. Although PATRICIAMINE is about as fast as fp-growth and afopt, it terminated abnormally on some instances. kDCI is slow when the number of frequent itemsets is large. MAFIA was the slowest on these instances for any minimum support.</p><p>On the middle density datasets, LCM is the fastest on all instances for closed itemset mining. For all and maximal frequent itemset mining, LCMfreq and LCMmax are the fastest for large minimum supports on every dataset. For small minimum supports, LCMfreq and LCMmax are the fastest on half of the instances; on the others, the results vary from case to case, with each algorithm winning in some cases.</p><p>On accidents, the LCM algorithms are the fastest when the minimum support is large. For small supports, LCM(closed) is the fastest, but LCMfreq and LCMmax are slower than fp-growth. For this dataset, the FP-tree is very effective, with a compression ratio of up to 6, and the bitmap is also efficient because of the density. Hence, the time for frequency counting is short in the executions of the existing implementations. Nevertheless, thanks to ppc extension, LCM keeps an advantage for closed itemset mining; hence LCM(closed) is the fastest.</p><p>1 http://www.ics.uci.edu/~mlearn/MLRepository.html</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>In this paper, we proposed a fast implementation of LCM for enumerating frequent closed itemsets, based on prefix preserving closure extension. We further gave the implementations LCMfreq and LCMmax, obtained by modifying LCM, for enumerating all frequent itemsets and maximal frequent itemsets. We showed by computational experiments that our implementations of LCM, LCMfreq, and LCMmax outperform the other algorithms on many datasets, especially on sparse datasets. There is a possibility of speeding up the LCM algorithms further by developing more efficient maximality checking algorithms, or by developing a hybrid of arrays and FP-tree-like data structures.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Results 2</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>For each transaction t, occurrence deliver takes O(|t ∩ {tail(P)+1, ..., |I|}|) time. Thus, the computation time is O(Σ_{T ∈ T(P)} |T ∩ {tail(P)+1, ..., |I|}|) = O(|T(P)| + Σ_{e&gt;tail(P)} |T(P ∪ {e})|). This time complexity is smaller than that of down project. We describe the pseudo code of occurrence deliver in the following. Efficiency test of FP-tree, hypercube decomposition, and apriori: the reduction factor of FP-tree is (sum of # of elements in the database reduced by LCM) / (sum of # of elements in the database reduced by FP-tree) over all iterations; the reduction ratio of hypercube decomposition is the average number of frequent itemsets output in an iteration; and the reduction ratio of apriori is (sum of Σ_{e&gt;tail(P)} |T(P ∪ {e})|) / (sum of Σ_{e&gt;F(P)} |T(P ∪ {e})|) over all iterations.</figDesc><table><row><cell>ALGORITHM OccurrenceDeliver</cell></row><row><cell>(T: database, P: itemset)</cell></row><row><cell>1. Set Bucket[e] := ∅ for each item e &gt; tail(P)</cell></row></table></figure>
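The occurrence deliver routine sketched in Table 1 can be fleshed out as the following C toy version. Fixed-size arrays and a 0/1 transaction matrix are simplifying assumptions, and `Bucket` and `occurrence_deliver` are illustrative names, not the paper's code. Each transaction in the occurrence set T(P) is scanned once, and its id is appended to Bucket[e] for every item e &gt; tail(P) it contains, so Bucket[e] ends up holding T(P ∪ {e}):

```c
#include <assert.h>

#define NITEMS 5   /* toy item universe */
#define NTRANS 4   /* toy database size */

typedef struct {
    int ids[NTRANS];   /* transaction ids in T(P ∪ {e}) */
    int size;
} Bucket;

/* occ[0..nocc-1] lists the transactions of T(P); db[t][e] is 1 iff
 * transaction t contains item e. Total work is the sum over T ∈ T(P)
 * of |T ∩ {tail(P)+1, ..., |I|}|, matching the bound in Table 1. */
static void occurrence_deliver(int db[][NITEMS], const int *occ, int nocc,
                               int tail, Bucket bucket[NITEMS])
{
    for (int e = tail + 1; e < NITEMS; e++)   /* 1. Bucket[e] := ∅ */
        bucket[e].size = 0;
    for (int i = 0; i < nocc; i++) {          /* 2. for each T ∈ T(P) */
        int t = occ[i];
        for (int e = tail + 1; e < NITEMS; e++)
            if (db[t][e])                     /* 3. deliver t to Bucket[e] */
                bucket[e].ids[bucket[e].size++] = t;
    }
}
```

One scan fills all buckets at once, which is what makes this cheaper than computing each T(P ∪ {e}) by a separate intersection (down project).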
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Accumulated number of transactions in the database over all iterations</figDesc><table><row><cell>dataset</cell><cell>connect</cell><cell>pumsb</cell><cell>BMS-WebView2</cell><cell>T40I10D100K</cell></row><row><cell>minimum support</cell><cell>50%</cell><cell>60%</cell><cell>0.1%</cell><cell>0.03%</cell></row><row><cell>Database reduction</cell><cell>188319235</cell><cell>2125460007</cell><cell>2280260</cell><cell>1704927639</cell></row><row><cell>Anytime database reduction</cell><cell>538931</cell><cell>7777187</cell><cell>521576</cell><cell>77371534</cell></row><row><cell>Reduction factor</cell><cell>349.4</cell><cell>273.2</cell><cell>4.3</cell><cell>22.0</cell></row></table><note>thus they will not be efficient. Many pruning methods have been developed for speeding up; however, they are not complete. Thus, the computation time is not bounded by a linear function in the number of frequent closed itemsets, and the computation time may increase superlinearly in the number of outputs.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc></figDesc><table><row><cell>ALGORITHM LCMmax (P: itemset, H: items to be added)</cell></row><row><cell>1. H := the set of items e in H s.t. P ∪ {e} is frequent</cell></row><row><cell>2. If H = ∅ then</cell></row><row><cell>3. If P ∪ {e} is infrequent for any e then output P; return</cell></row><row><cell>4. End if</cell></row><row><cell>5. End if</cell></row><row><cell>6. Choose an item e* ∈ H; H := H \ {e*}</cell></row><row><cell>7. LCMmax (P ∪ {e*}, H)</cell></row><row><cell>8. P := frequent itemset of the maximum size found in the recursive call in 7</cell></row><row><cell>9. For each item e ∈ H \ P do</cell></row><row><cell>10.</cell></row></table></figure>
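The recursive scheme above can be sketched in C as follows. This is a deliberately naive toy of my own, not the paper's LCMmax: it recomputes each frequency by a full database scan, and it omits the database reduction and the reuse of the largest itemset found in step 8. It simply enumerates each frequent itemset once by tail extension and outputs P whenever no single-item extension of P is frequent, which is exactly the maximality condition:

```c
#include <assert.h>

#define NITEMS 4   /* toy item universe */

static int nmaximal;   /* number of maximal frequent itemsets found */

/* Frequency of the itemset marked by P[] in a 0/1 transaction matrix. */
static int freq(int db[][NITEMS], int n, const int *P)
{
    int f = 0;
    for (int t = 0; t < n; t++) {
        int ok = 1;
        for (int e = 0; e < NITEMS; e++)
            if (P[e] && !db[t][e]) { ok = 0; break; }
        f += ok;
    }
    return f;
}

/* Enumerate each frequent itemset once by tail extension (only items
 * e > tail are added); P is counted as maximal iff no item at all
 * yields a frequent extension. Call with P empty and tail = -1. */
static void lcmmax_naive(int db[][NITEMS], int n, int minsup, int *P, int tail)
{
    int addible = 0;
    for (int e = 0; e < NITEMS; e++) {
        if (P[e]) continue;
        P[e] = 1;
        if (freq(db, n, P) >= minsup) {
            addible = 1;               /* P ∪ {e} frequent: P not maximal */
            if (e > tail)
                lcmmax_naive(db, n, minsup, P, e);
        }
        P[e] = 0;
    }
    if (!addible)
        nmaximal++;
}
```

The real algorithm avoids the full scans by running the maximality check on the weighted reduced database of Section 3.8, so its cost per iteration is linear in the reduced database size rather than in the original one.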
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>This research is supported by joint-research funds of National Institute of Informatics.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Fast Algorithms for Mining Association Rules in Large Databases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of VLDB &apos;94</title>
				<meeting>VLDB &apos;94</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="487" to="499" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Fast Discovery of Association Rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Toivonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Verkamo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Knowledge Discovery and Data Mining</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="307" to="328" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Efficiently Mining Long Patterns from Databases</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Bayardo</surname><genName>Jr</genName></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGMOD&apos;98</title>
				<meeting>SIGMOD&apos;98</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="85" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets</title>
		<author>
			<persName><forename type="first">E</forename><surname>Boros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Gurvich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Khachiyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Makino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">STACS</title>
		<imprint>
			<biblScope unit="page" from="133" to="141" />
			<date type="published" when="2002">2002. 2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Burdick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Calimlim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gehrke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ICDE 2001</title>
				<meeting>ICDE 2001</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="443" to="452" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">MAFIA: A Performance Study of Mining Maximal Frequent Itemsets</title>
		<author>
			<persName><forename type="first">D</forename><surname>Burdick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Calimlim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Flannick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gehrke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yiu</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/vol-90" />
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE ICDM&apos;03 Workshop FIMI&apos;03</title>
		<title level="s">Available as CEUR Workshop Proc. series</title>
		<meeting>IEEE ICDM&apos;03 Workshop FIMI&apos;03</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">90</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Goethals</surname></persName>
		</author>
		<ptr target="http://fimi.cs.helsinki.fi/" />
		<title level="m">the FIMI&apos;03 Homepage</title>
				<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Efficiently Using Prefix-trees in Mining Frequent Itemsets</title>
		<author>
			<persName><forename type="first">G</forename><surname>Grahne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhu</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/vol-90" />
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE ICDM&apos;03 Workshop FIMI&apos;03</title>
		<title level="s">Available as CEUR Workshop Proc. series</title>
		<meeting>IEEE ICDM&apos;03 Workshop FIMI&apos;03</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">90</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Mining Frequent Patterns without Candidate Generation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD Conference</title>
				<imprint>
			<date type="published" when="2000">2000. 2000</date>
			<biblScope unit="page" from="1" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">KDD-Cup 2000 Organizers&apos; Report: Peeling the Onion</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Brodley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Frasca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mason</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="86" to="98" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">AFOPT: An Efficient Implementation of Pattern Growth Approach</title>
		<author>
			<persName><forename type="first">Guimei</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongjun</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><forename type="middle">Xu</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wang</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiangye</forename><surname>Xiao</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/vol-90" />
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE ICDM&apos;03 Workshop FIMI&apos;03</title>
		<title level="s">Available as CEUR Workshop Proc. series</title>
		<meeting>IEEE ICDM&apos;03 Workshop FIMI&apos;03</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">90</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets</title>
		<author>
			<persName><forename type="first">S</forename><surname>Orlando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lucchese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Palmerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Perego</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Silvestri</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/vol-90" />
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE ICDM&apos;03 Workshop FIMI&apos;03</title>
		<title level="s">Available as CEUR Workshop Proc. series</title>
		<meeting>IEEE ICDM&apos;03 Workshop FIMI&apos;03</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">90</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Efficient Mining of Association Rules Using Closed Itemset Lattices</title>
		<author>
			<persName><forename type="first">N</forename><surname>Pasquier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bastide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taouil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inform. Syst</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="25" to="46" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Discovering Frequent Closed Itemsets for Association Rules</title>
		<author>
			<persName><forename type="first">N</forename><surname>Pasquier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bastide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taouil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ICDT&apos;99</title>
				<meeting>ICDT&apos;99</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="398" to="416" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery</title>
				<imprint>
			<date type="published" when="2000">2000. 2000</date>
			<biblScope unit="page" from="21" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Mining Frequent Itemsets using Patricia Tries</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pietracaprina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zandolin</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/vol-90" />
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE ICDM&apos;03 Workshop FIMI&apos;03</title>
		<title level="s">Available as CEUR Workshop Proc. series</title>
		<meeting>IEEE ICDM&apos;03 Workshop FIMI&apos;03</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">90</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A New Algorithm for Generating All the Maximum Independent Sets</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tsukiyama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ariyoshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Shirakawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Computing</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="505" to="517" />
			<date type="published" when="1977">1977</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets</title>
		<author>
			<persName><forename type="first">T</forename><surname>Uno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Uchida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Arimura</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/vol-90" />
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE ICDM&apos;03 Workshop FIMI&apos;03</title>
		<title level="s">Available as CEUR Workshop Proc. series</title>
		<meeting>IEEE ICDM&apos;03 Workshop FIMI&apos;03</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">90</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases</title>
		<author>
			<persName><forename type="first">T</forename><surname>Uno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Uchida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Arimura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Discovery Science</title>
				<meeting>of Discovery Science</meeting>
		<imprint>
			<date type="published" when="2004">2004. 2004</date>
		</imprint>
	</monogr>
	<note>to appear</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">CHARM: An Efficient Algorithm for Closed Itemset Mining</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hsiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2nd SIAM International Conference on Data Mining (SDM&apos;02)</title>
				<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="457" to="473" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Real World Performance of Association Rule Algorithms</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mason</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">KDD</title>
		<imprint>
			<biblScope unit="page" from="401" to="406" />
			<date type="published" when="2000">2001. 2000</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
