<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Takeaki</forename><surname>Uno</surname></persName>
							<email>uno@nii.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">National Institute of Informatics</orgName>
								<address>
									<addrLine>2-1-2 Hitotsubashi, Chiyoda-ku</addrLine>
									<postCode>101-8430</postCode>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tatsuya</forename><surname>Asai</surname></persName>
							<email>t-asai@i.kyushu-u.ac.jp</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Informatics</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<addrLine>6-10-1 Hakozaki</addrLine>
									<postCode>812-0053</postCode>
									<settlement>Fukuoka</settlement>
									<country key="JP">JAPAN</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yuzo</forename><surname>Uchida</surname></persName>
							<email>y-uchida@i.kyushu-u.ac.jp</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Informatics</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<addrLine>6-10-1 Hakozaki</addrLine>
									<postCode>812-0053</postCode>
									<settlement>Fukuoka</settlement>
									<country key="JP">JAPAN</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hiroki</forename><surname>Arimura</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Department of Informatics</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<addrLine>6-10-1 Hakozaki</addrLine>
									<postCode>812-0053</postCode>
									<settlement>Fukuoka</settlement>
									<country key="JP">JAPAN</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F28D04E6B9DB67DBF609390125EFD508</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we propose three algorithms LCMfreq, LCM, and LCMmax for mining all frequent item sets, frequent closed item sets, and maximal frequent item sets, respectively, from transaction databases. The main theoretical contribution is that we construct tree-shaped traversal routes composed only of frequent closed item sets, induced by a parent-child relationship defined on frequent closed item sets. By traversing the routes in a depth-first manner, LCM finds all frequent closed item sets in polynomial time per item set, without storing previously obtained closed item sets in memory. Moreover, we introduce several algorithmic techniques that exploit the sparse and dense structures of the input data. Algorithms for enumerating all frequent item sets and maximal frequent item sets are obtained from LCM as variants. In computational experiments on real world and synthetic databases comparing their performance to previous algorithms, we found that our algorithms are fast on large real world datasets with natural distributions, such as the KDD-cup 2000 datasets, and on many synthetic databases.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Frequent item set mining is one of the fundamental problems in data mining and has many applications, such as association rule mining <ref type="bibr" target="#b0">[1]</ref>, inductive databases <ref type="bibr" target="#b8">[9]</ref>, and query expansion <ref type="bibr" target="#b11">[12]</ref>.</p><p>Let E be the universe of items, consisting of items 1, ..., n. A subset X of E is called an item set. T is a set of transactions over E, i.e., each T ∈ T is composed of items of E. For an item set X, let T (X) = { t ∈ T | X ⊆ t } be the set of transactions including X. Each transaction of T (X) is called an occurrence of X. For a given constant α ≥ 0, an item set X is called frequent if |T (X)| ≥ α. If a frequent item set is included in no other frequent item set, it is said to be maximal. For a transaction set S ⊆ T , let I(S) = ∩ T ∈S T . If an item set X satisfies I(T (X)) = X, then X is called a closed item set. We denote by F and C the sets of all frequent item sets and all frequent closed item sets, respectively.</p><p>In this paper, we propose an efficient algorithm LCM for enumerating all frequent closed item sets. LCM is an abbreviation of Linear time Closed item set Miner. Existing algorithms for this task basically enumerate frequent item sets while cutting off unnecessary frequent item sets by pruning. However, the pruning is not complete, hence these algorithms process unnecessary frequent item sets and perform extra work on them. In LCM, we define a parent-child relationship between frequent closed item sets. The relationship induces tree-shaped traversal routes composed only of the frequent closed item sets. Our algorithm traverses these routes, hence takes time linear in the number of frequent closed item sets. 
This algorithm is obtained from the algorithms for enumerating maximal bipartite cliques <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, which are designed based on the reverse search technique <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b15">16]</ref>.</p><p>In addition to the search tree technique for closed item sets, we use several techniques to speed up the update of the occurrences of item sets. One technique is occurrence deliver, which simultaneously computes the occurrence sets of all the successors of the current item set during a single scan over the current occurrence set. The other is diffsets, proposed in <ref type="bibr" target="#b17">[18]</ref>. Since there is a trade-off between these two methods, the former being fast for sparse data and the latter fast for dense data, we developed a hybrid algorithm combining them. In some iterations, we make a decision based on an estimate of their computation times, hence our algorithm can use the appropriate one for the dense parts and sparse parts of the input.</p><p>We also consider the problems of enumerating all frequent item sets and maximal frequent item sets, and derive two algorithms LCMfreq and LCMmax from LCM. LCMmax is obtained from LCM by adding an explicit check of maximality. LCMfreq is not merely LCM without the closedness check: it also achieves a substantial speed-up using closed item set discovery techniques, because it enumerates only the representatives of groups of frequent item sets and generates the other frequent item sets from the representatives.</p><p>In computational experiments on real and artificial datasets against the previous algorithms, we observed that our algorithms LCMfreq, LCM, and LCMmax significantly outperform the previous algorithms on real world datasets with natural distributions, such as the BMS-Web-View-1 and BMS-POS datasets in the KDD-CUP 2000 datasets, as well as large synthetic datasets such as IBM T10I4D100K. 
The performance of our algorithms is similar to that of the other algorithms on hard datasets, such as the Connect and Chess datasets from the UCI Machine Learning Repository, and the advantage over MAFIA is less significant there; however, LCM works with less memory than the other algorithms.</p><p>The organization of the paper is as follows. In Section 2, we explain our tree enumeration method for frequent closed item sets and our algorithm LCM. In Section 3, we describe several algorithmic techniques for speeding up and saving memory. Then, Sections 4 and 5 give LCMmax and LCMfreq for maximal and all frequent item sets, respectively. Techniques for implementation are described in Section 6, and the results of computational experiments are reported in Section 7. Finally, we conclude in Section 8.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Enumerating Frequent Closed Item Sets</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this section, we introduce a parent-child relationship between frequent closed item sets in C, and describe our algorithm LCM for enumerating them.</p><p>Recent efficient algorithms for frequent item sets, e.g., <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>, use a tree-shaped search structure for F , called the set enumeration tree <ref type="bibr" target="#b3">[4]</ref>, defined as follows. Let X = {x 1 , . . . , x n } be an item set written as an ordered sequence such that</p><formula xml:id="formula_0">x 1 &lt; • • • &lt; x n , where the tail of X is tail(X) = x n ∈ E. Let X, Y be item sets. For an index i, X(i) = X ∩ {1, . . . , i}.</formula><p>Figure <ref type="figure">1</ref>: An example of the parent of X: The parent of X is obtained by deleting the items larger than i(X) (in the gray area) and taking the closure.</p><formula xml:id="formula_1">X is a prefix of Y if X = Y (i) holds for i = tail(X).</formula><p>Then, the parent-child relation P for the set enumeration tree for F is defined as</p><formula xml:id="formula_2">X = P(Y ) iff Y = X ∪ {i} for some i &gt; tail(X), or equivalently, X = Y \{tail(Y )}.</formula><p>Then, the whole search space for F forms a prefix tree (or trie) with this edge relation P.</p><p>Now, we define the parent-child relation P for closed item sets in C as follows. For X ∈ C, we define the parent of X by P(X) = I(T (X(i(X) − 1))), where i(X) is the minimum item i such that</p><formula xml:id="formula_3">T (X) = T (X(i)) but T (X) ≠ T (X(i − 1)). If Y is the parent of X, we say X is a child of Y . Let ⊥ = I(T (∅))</formula><p>be the smallest item set in C, called the root. For any X ∈ C \ {⊥}, its parent P(X) is always defined and belongs to C. An illustration is given in Fig. 
<ref type="figure">1</ref>.</p><p>For any X ∈ C and its parent Y , the proper inclusion Y ⊂ X holds, since T (X(i(X) − 1)) ⊃ T (X). Thus, the relation P is acyclic, and its graph representation forms a tree. By traversing the tree in a depth-first manner, we can enumerate all the frequent closed item sets in time linear in the size of the tree, which is equal to the number of frequent closed item sets in C. In addition, we need not store the tree in memory. Starting from the root ⊥, we find a child X of the root, and go to X. In the same way, we go to a child of X. When we arrive at a leaf of the tree, we backtrack and find another child. Repeating this process, we eventually find all frequent closed item sets in C.</p><p>To find the children of the current frequent closed item set, we use the following lemma. For an item set X and an index i, let X[i] = X ∪ H, where H is the set of the items j ∈ I(T (X ∪ {i})) satisfying j ≥ i.</p><formula xml:id="formula_4">Lemma 1 X ′ is a child of X ∈ C (X ′ ∈ C and the parent of X ′ is X) if and only if (cond1) X ′ = X[i] for some i &gt; i(X), (cond2) X ′ = I(T (X ′)) (X ′ is a closed item set), (cond3) X ′ is frequent.</formula><p>Proof : Suppose that X ′ = X[i] satisfies the conditions (cond1), (cond2) and (cond3). Then, X ′ ∈ C. Since T (X(i − 1)) = T (X) and T (X(i − 1) ∪ {i}) = T (X ′) hold, i(X ′) = i holds. Hence, X ′ is a child of X. Conversely, suppose that X ′ is a child of X. Then, (cond2) and (cond3) hold. From the definition of i(X ′), T (X(i(X ′)) ∪ {i(X ′)}) = T (X ′) holds. Hence, X ′ = X[i(X ′)] holds. We also have i(X ′) &gt; i(X), since T (X ′(i(X ′) − 1)) = T (X). Hence (cond1) holds.</p><formula xml:id="formula_5">Clearly, T (X[i]) = T (X ∪ {i}) holds if (cond2) I(T (X[i])) = X[i] holds. Note that no child X ′ of X satisfies X[i] = X[i ′] for i ≠ i ′, since the minimum items of X[i]\X and X[i ′]\X are i and i ′, respectively.</formula><p>
Using this lemma, we construct the following algorithm scheme that, given a closed item set X, enumerates all its descendants in the search tree for closed item sets. The existing enumeration algorithms for frequent closed item sets are based on backtracking, traversing a tree composed of all frequent item sets in F and skipping some item sets by pruning the tree. Since the pruning is not complete, however, these algorithms generate unnecessary frequent item sets. On the other hand, the algorithm in <ref type="bibr" target="#b9">[10]</ref> directly generates only closed item sets with the closure operation I(T (•)), as ours does, but their method may generate duplicated closed item sets and needs an expensive duplicate check.</p><formula xml:id="formula_6">Algorithm LCM (X : frequent closed item set) 1. output X 2. For each i &gt; i(X) do 3. If X[i] is frequent and X[i] = I(T (X[i])) then Call LCM( X[i] ) 4. End for</formula><p>Our algorithm traverses a tree composed only of frequent closed item sets, and each iteration is not as heavy as those of the previous algorithms. Hence, our algorithm runs fast in practice. If we consider our algorithm as a modification of the usual backtracking algorithm, each iteration of our algorithm re-orders the items larger than i(X) so that the items not included in X follow the items included in X. Note that the parent X is not a prefix of X[i] in a recursive call. The check of (cond2) can be considered as pruning of non-closed item sets. </p></div>
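<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the recursion concrete, the following Python sketch (our own illustration, not the authors' C implementation; all helper names are ours, items are assumed to be positive integers, the database is assumed non-empty, and occurrence sets are recomputed naively rather than maintained incrementally) enumerates the frequent closed item sets by checking (cond1), (cond2), and (cond3) of Lemma 1 directly:</p><p>
```python
from functools import reduce

def occurrences(transactions, X):
    """T(X): ids of the transactions that contain every item of X."""
    return [t for t, items in enumerate(transactions) if X <= items]

def closure(transactions, occ):
    """I(S): the items common to all transactions in S (occ non-empty)."""
    return frozenset(reduce(lambda a, b: a & b,
                            (transactions[t] for t in occ)))

def lcm(transactions, alpha, X, i_X, out):
    """Output the closed set X and recurse on its children (Lemma 1)."""
    out.append(X)
    for i in sorted(set().union(*transactions)):
        if i <= i_X or i in X:            # (cond1): only items i > i(X)
            continue
        occ = occurrences(transactions, X | {i})
        if len(occ) < alpha:              # (cond3): frequency
            continue
        clo = closure(transactions, occ)  # I(T(X ∪ {i}))
        Xi = X | frozenset(j for j in clo if j >= i)   # X[i]
        if clo == Xi:                     # (cond2): closedness
            lcm(transactions, alpha, Xi, i, out)

def all_frequent_closed(transactions, alpha):
    transactions = [frozenset(t) for t in transactions]
    root = closure(transactions, occurrences(transactions, frozenset()))
    out = []
    lcm(transactions, alpha, root, 0, out)
    return out
```
</p><p>For example, on the four transactions {1,2,3}, {1,2}, {2,3}, {1,2,3} with α = 2, the sketch outputs the four frequent closed item sets {2}, {1,2}, {1,2,3}, {2,3}, each exactly once and without storing previously found sets for duplicate checks.</p></div>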
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure 2: Illustration of occurrence deliver and the right-first sweep, where jQ[i] denotes T (X[i]): for each occurrence T of X, occurrence deliver inserts T to J [i] for each i ∈ T. When the algorithm generates a recursive call with respect to X[|E| − 2], the recursive calls with respect to X[|E| − 1] and X[|E|] have terminated and J [|E| − 1] and J [|E|] have been cleared; hence the algorithm re-uses their memory in the recursive call.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Reducing Computation Time</head><p>The computation time of LCM described in the previous section is linear in |C|, with a factor depending on T (X) for each closed item set X ∈ C. However, this still takes a long time if it is implemented in a straightforward way. In this section, we introduce some techniques based on the sparse and dense structures of the input data.</p><p>Occurrence Deliver. First, we introduce a technique called occurrence deliver for reducing the construction time for T (X[i]), which is needed to check (cond3). This technique is particularly efficient when |T (X[i])| is much smaller than |T (X)|. In the usual approach, T (X[i]) is obtained from T (X) in O(|T (X)|) time by removing all transactions not including i, based on the equality <formula xml:id="formula_9">T (X[i]) = T (X ∪ {i}) = T (X) ∩ T ({i})</formula> (this method is known as down-project). Thus, the total computation for all children takes |E| scans and O(|T (X)| • |E|) time.</p><p>Instead of this, we build the occurrence lists <formula xml:id="formula_10">J [i] def = T (X[i])</formula> for all i = i(X), . . . , |E| simultaneously by scanning the transactions in T (X) once, as follows. We initialize J [i] = ∅ for all i = i(X), . . . , |E|. For each T ∈ T (X) and for each i ∈ T (i &gt; i(X)), we insert T to J [i]. See Fig. <ref type="figure" target="#fig_1">2</ref> for an explanation, where we write jQ[i] for J [i]. This correctly computes J [i] for all i in total time O(|T (X)|). Furthermore, we need not make a recursive call of LCM for X[i] if T (X[i]) = ∅ (this is often called lookahead <ref type="bibr" target="#b3">[4]</ref>). In our experiments on BMS instances, occurrence deliver reduces the computation time to as little as 1/10 in some cases.</p><p>Right-first sweep. The occurrence deliver method needs eager computation of the occurrence sets J [i] = T (X[i]) for all children before expanding any of them. A simple implementation may require much more memory than the ordinary lazy computation of T (X[i]) as in <ref type="bibr" target="#b16">[17]</ref>. However, we can reduce the memory usage with a method called the right-first sweep, as follows. Given a parent X, we make the recursive call for X[i] in decreasing order of i = |E|, . . . , i(X) (see Fig. <ref type="figure" target="#fig_1">2</ref>). At each call of X[i], we collect the memory allocated before for J [i + 1], . . . , J [|E|] and re-use it for J [i]. After the call for X[i] terminates, the memory for J [i] is released for future use in J [j] for j &lt; i. <formula xml:id="formula_11">Since |J [i]| = |T (X[i])| ≤ |T ({i})|,</formula> the total memory for the occurrence lists never exceeds the input size.</p><p>Diffsets. In the case that |T (X[i])| is nearly equal to |T (X)|, we use the diffset technique proposed in <ref type="bibr" target="#b17">[18]</ref>. The diffset for index i is <formula xml:id="formula_12">DJ [i] = T (X)\T (X[i]), where T (X[i]) = T (X ∪ {i}). Then, the frequency of X[i] is obtained by |T (X[i])| = |T (X)| − |DJ [i]|.</formula> When we generate a recursive call with respect to X[i], we update DJ [j], j &gt; i by setting DJ [j] to <formula xml:id="formula_13">DJ [j] \ DJ [i], in O( Σ i&gt;i(X),X[i]∈F (|T (X)| − |T (X[i])|)) time.</formula> Diffsets are needed only for the i such that X[i] is frequent. With diffsets, the computation time for instances such as connect, chess, and pumsb, where |T (X[i])| is as large as |T (X)|, is reduced to 1/100.</p><p>Hybrid Computation. As we saw in the preceding subsections, occurrence deliver is fast when |T (X[i])| is much smaller than |T (X)|, while the diffset of <ref type="bibr" target="#b17">[18]</ref> is fast when |T (X[i])| is close to |T (X)|. Therefore, LCM dynamically decides which of occurrence deliver and diffsets to use. To do this, we compare two quantities on X: <formula xml:id="formula_14">A(X) = Σ i |T (X ∪ {i})| and B(X) = Σ i:X∪{i}∈F (|T (X)| − |T (X ∪ {i})|).</formula> For some fixed constant α &gt; 1, we use occurrence deliver if A(X) &lt; αB(X), and diffsets otherwise. We make this decision only at the child iterations of the root set ⊥, since the decision itself takes much time. Empirically, restricting the range i ∈ {1, . . . , |E|} of the index i in A(X) and B(X) to i ∈ {i(X) + 1, . . . , |E|} results in a significant speedup. By experiments on BMS instances, we observe that the hybrid technique reduces the computation time to as little as 1/3. The hybrid technique is also useful in reducing the memory space used by diffsets, as follows. 
Although the memory B(X) used by diffsets is not bounded by the input size ||T || in the worst case, the hybrid ensures that B(X) does not exceed A(X) ≤ ||T ||, because diffsets are chosen only when A(X) ≥ αB(X).</p><p>Checking the closedness in occurrence deliver. Another key point is to efficiently check the closedness <formula xml:id="formula_15">X[i] = I(T (X[i])) (cond2)</formula>. The straightforward computation of the closure I(T (X[i])) takes much time, since it requires access to the whole sets T (X[j]), j &lt; i, and i is usually as large as |E|.</p><p>By definition, (cond2) is violated iff there exists some j ∈ {1, . . . , i − 1} such that j ∈ T for every T ∈ T (X ∪ {i}). We first choose a transaction T * (X ∪ {i}) ∈ T (X ∪ {i}) of minimum size, and test whether j ∈ T for increasing j ∈ T * (X ∪ {i}). This results in an O( Σ j∈T * (X∪{i}) m(X[i], j)) time algorithm, where m(X ′, j) is the maximum index m such that all of the first m transactions of T (X ′) include j; this is much faster than the straightforward algorithm, which takes O( Σ j&lt;i |T (X ∪ {i} ∪ {j})|) time.</p><p>In fact, the efficient check requires the adjacency matrix (sometimes called a bitmap) representing the inclusion relation between items and transactions. However, the adjacency matrix requires O(|T | × |E|) memory, which is quite hard to store for large instances. Hence, we make columns of the adjacency matrix only for transactions of size larger than ( Σ T ∈T |T |)/δ, where δ is a constant. This uses at most O(δ × Σ T ∈T |T |) memory, linear in the input size.</p><p>Checking the closedness in diffsets. In the case that |T (X[i])| is nearly equal to |T (X)|, the above check cannot be done in a short time. In this case, we keep the diffsets DJ [j] for all j &lt; i, j ∉ X, such that X[i] is frequent. Maintaining DJ for all i is a heavy task, thus we discard unnecessary DJ 's as follows. 
If T (X ∪ {j}) includes an item included in no T (X[i ′]), i ′ &gt; i(X), then for any descendant X ′ of X, j ∉ I(T (X ′[j ′])) holds for any j ′ &gt; i(X ′). Hence, we no longer have to keep DJ [j] for such j. Let N C(X) be the set of items j such that X[j] is frequent and any item of T (X) \ T (X ∪ {j}) is included in some T (X[j ′]), j ′ &gt; i(X). Then, the computation time for checking (cond2) is written as O( Σ j∈NC(X),j&lt;i |T (X) \ T (X ∪ {j})|). By checking (cond2) in these ways, the computation time for the check is reduced to between 1/10 and 1/100.</p><p>Detailed Algorithm. We present below a description of the algorithm LCM, which recursively computes (X, T (X), i(X)) simultaneously.</p><formula xml:id="formula_16">LCM Iter( X, T (X), i(X) ) /* occurrence deliver */ 1. output X 2. For each t ∈ T (X) 3. For each j ∈ t, j &gt; i(X), insert t to J [j] 4. For each j, J [j] ≠ ∅, in the decreasing order 5. If |J [j]| ≥ α and (cond2) holds then LCM Iter( I(J [j]), J [j], j ) 6. Delete J [j] 7. End for LCM Iter2( X, T (X), i(X), DJ ) /* diffset */ 1. output X 2. For each i such that X[i] is frequent 3. If X[i] satisfies (cond2) then 4. For each j such that X[i] ∪ {j} is frequent, DJ [j] := DJ [j] \ DJ [i] 5. LCM Iter2( X[i], T (X[i]), i, DJ ) 6. End if 7. End for</formula><p>Theorem 2 Algorithm LCM enumerates all frequent closed item sets in O( Σ j&gt;i(X) |T (X[j])| + Σ j&gt;i(X),X[j]∈F Σ j ′∈T * (X) m(X[j], j ′)) time, or in O( Σ i&gt;i(X),X[i]∈F (|T (X)| − |T (X[i])|) + Σ j∈NC(X),j&lt;i |T (X) \ T (X ∪ {j})|) time, for each frequent closed item set X, with memory linear in the input size.</p></div>
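<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of occurrence deliver (our own rendering; the paper's implementation is in C and re-uses array memory via the right-first sweep, which is omitted here): one scan over the occurrence set T (X) fills the buckets J[i] = T (X ∪ {i}) for every item i greater than i(X) simultaneously, instead of intersecting T (X) with T ({i}) once per item.</p><p>
```python
from collections import defaultdict

def occurrence_deliver(occ, transactions, i_X):
    """occ: transaction ids in T(X); returns {i: T(X ∪ {i})} for i > i_X."""
    J = defaultdict(list)
    for t in occ:                  # single scan of the occurrences of X
        for i in transactions[t]:
            if i > i_X:            # only items that may extend X
                J[i].append(t)
    return dict(J)
```
</p><p>The total work is proportional to the total size of the transactions in T (X), versus |E| separate scans for the down-project method.</p></div>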
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Enumerating Maximal Frequent Sets</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this section, we explain an enumeration algorithm for maximal frequent sets that uses frequent closed item set enumeration. The main idea is very simple. Since any maximal frequent item set is a frequent closed item set, we enumerate frequent closed item sets and output only those that are maximal frequent sets. For a frequent closed item set X, X is a maximal frequent set if and only if X ∪ {i} is infrequent for any i ∉ X. By adding this check to LCM, we obtain LCMmax. This modification does not increase the memory complexity but increases the computation time. In the case of occurrence deliver, we generate T (X ∪ {j}) for all j in the same way as occurrence deliver, and check the maximality. This takes O( Σ j&lt;i(X) |T (X ∪ {j})|) time. In the case of diffsets, we do not discard the diffsets that are unnecessary for closed item set enumeration: we keep the diffsets DJ [j] for all j such that X ∪ {j} is frequent. To update and maintain these, we spend O( Σ j,X∪{j}∈F |T (X) \ T (X ∪ {j})|) time. Note that we need not check the maximality if X has a child.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theorem 3 Algorithm LCMmax enumerates all maximal frequent item sets in O( Σ i |T (X ∪ {i})|) time, or O( Σ i,X∪{i}∈F (|T (X)| − |T (X ∪ {i})|)) time</head><p>for each frequent closed item set X, with memory linear in the input size.</p></div>
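<div xmlns="http://www.tei-c.org/ns/1.0"><p>The maximality test that LCMmax adds on top of LCM can be sketched as follows (a naive Python illustration with our own names, recomputing occurrence sets instead of using the occurrence-deliver or diffset machinery): a frequent closed item set X is maximal iff X ∪ {i} is infrequent for every item i outside X.</p><p>
```python
def is_maximal(transactions, X, alpha):
    """True iff no single item can extend X while keeping it frequent."""
    occ = [t for t in transactions if X <= t]        # T(X)
    items = set().union(*transactions) - set(X)      # candidate extensions
    return all(sum(1 for t in occ if i in t) < alpha for i in items)
```
</p></div>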
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Enumerating Frequent Sets</head><p>In this section, we describe an enumeration algorithm for frequent item sets. The key idea of our algorithm is that we classify the frequent item sets into groups and enumerate a representative of each group. Each group is composed of the frequent item sets included in the class of a closed item set. This idea is based on the following lemma.</p><p>Lemma 2 Suppose that frequent item sets X and S ⊃ X satisfy T (X) = T (S). Then, for any item set X ′ including X, T (X ′) = T (X ′ ∪ S).</p><p>In particular, T (X ′) = T (R) holds for any X ′ ⊆ R ⊆ X ′ ∪ S, hence all such R are included in the same class of a closed item set. Hence, any frequent item set X ′ is generated from X ′ \ (S \ X). We call X ′ \ (S \ X) a representative.</p><p>Let us consider a backtracking algorithm finding frequent item sets which adds items one by one in lexicographical order. Suppose that we currently have a frequent item set X, and find another frequent item set X ∪ {i}. Let S = X[i]. Then, according to the above lemma, for any frequent item set X ′ including X and not intersecting S \ X, any item set including X ′ and included in X ′ ∪ S is also frequent. Conversely, any frequent item set including X is generated from an item set not intersecting S \ X. Hence, we enumerate only the representatives including X and not intersecting S \ X, and generate the other frequent item sets by adding each subset of S \ X. This method can be considered as "decomposing" the classes of closed item sets into several sublattices (hypercubes) whose maximal and minimal elements are X ′ ∪ S and X ′, respectively (see Fig. <ref type="figure" target="#fig_3">3</ref>). We call this technique hypercube decomposition.</p><p>Suppose that we are currently operating a representative X ′ including X, and going to generate a recursive call with respect to <formula xml:id="formula_17">X ′ ∪ {j}. 
Then, if (X ′[j] \ X ′) \ S ≠ ∅, X ′ and S ∪ (X ′[j] \ X ′) satisfy the condition of Lemma 2. Hence, we add X ′[j] \ X ′ to S.</formula><p>We describe LCMfreq as follows.</p><p>Algorithm LCMfreq ( X : representative,</p><formula xml:id="formula_18">S : item set, i : item ) 1. Output all item sets R, X ⊆ R ⊆ X ∪ S 2. For each j &gt; i, j ∉ X ∪ S 3. If X ∪ {j} is frequent then Call LCMfreq ( X ∪ {j}, S ∪ (X[j] \ (X ∪ {j})), j ) 4.</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>End for</head><p>For some synthetic instances in which the frequent closed item sets are fewer than the frequent item sets, the average size of S is up to 5. In these cases, the algorithm finds 2^|S| = 32 frequent item sets at once, hence the computation time is reduced considerably by this improvement.</p><p>To check the frequency of all X ∪ {j}, we can use the occurrence deliver and diffsets used in LCM. LCMfreq does not require the check of (cond2), hence the computation time of each iteration is O( Σ j&gt;i(X) |T (X[j])|) for occurrence deliver, and O( Σ j&gt;i(X),X[j]∈F |T (X) \ T (X[j])|) for diffsets.</p><p>Since the computation times change, we use different estimators for the hybrid. In almost all cases, once Σ j&gt;i(X),X[j]∈F |T (X) \ T (X[j])| becomes smaller than Σ j&gt;i(X) |T (X[j])|, the condition continues to hold in the iterations generated by recursive calls. Hence, the algorithm starts with occurrence deliver and compares the two quantities in each iteration. If Σ j&gt;i(X),X[j]∈F |T (X) \ T (X[j])| becomes smaller, then we switch to diffsets. Note that these estimators can be computed in a short time by using the result of occurrence deliver.</p></div>
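<div xmlns="http://www.tei-c.org/ns/1.0"><p>The hypercube decomposition of Lemma 2 can be sketched in Python as follows (our own illustration; function names are ours): a representative X together with its set S of items satisfying T (X) = T (X ∪ S) stands for every item set R with X ⊆ R ⊆ X ∪ S, so step 1 of LCMfreq expands it into 2^|S| frequent item sets at once, with no further frequency checks.</p><p>
```python
from itertools import chain, combinations

def expand_representative(X, S):
    """All R with X ⊆ R ⊆ X ∪ S, i.e. one hypercube of frequent sets."""
    X, S = set(X), list(S)
    subsets = chain.from_iterable(combinations(S, r)
                                  for r in range(len(S) + 1))
    return [X | set(sub) for sub in subsets]
```
</p><p>For example, the representative {1} with S = {2, 3} expands to the four item sets {1}, {1,2}, {1,3}, {1,2,3}, all sharing the same occurrence set.</p></div>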
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theorem 4 LCMfreq enumerates all frequent sets of</head><formula xml:id="formula_19">F in O( Σ j&gt;i(X) |T (X[j])|) time or O( Σ j&gt;i(X),X[j]∈F |T (X) \ T (X[j])|) time for each frequent set X, within O( Σ T ∈T |T |) space.</formula><p>In particular, LCMfreq requires one integer for each item of each transaction, which is required to store the input data. The other memory LCMfreq uses is bounded by O(|T | + |E|).</p><p>Experimentally, an iteration of LCMfreq on a frequent set X takes O(|T (X)| + |X|) or O((size of diffsets) + |X|) steps on average. In some sense, this is optimal, since we have to spend O(|X|) time for the output, and O(|T (X)|) time or O((size of diffsets)) time to check the frequency of X.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Implementation</head><p>In this section, we explain our implementation. First, we explain the data structures of our algorithms. A transaction T of the input data is stored in an array of length |T |. Each cell of the array stores the index of an item of T. For example, t = {4, 2, 7} is stored in an array with 3 cells, [2, 4, 7]. We sort the elements of the array so that we can compute {i, ..., |E|} ∩ T in time linear in |{i, ..., |E|} ∩ T |. J is also stored in arrays in the same way. We do not need doubly linked lists or binary trees, which take much time to operate.</p><p>To reduce the practical computation time, we sort the transactions by their sizes, and the items by the number of transactions including them. Experimentally, this reduces Σ j&gt;i(X) |T (X ∪ {j})|. In some cases, the computation time has been reduced to 1/3.</p></div>
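<div xmlns="http://www.tei-c.org/ns/1.0"><p>The array layout described above can be rendered in Python as follows (our own sketch of the C data structure, using a binary search on the sorted array; names are ours): each transaction is kept as a sorted array of item indices, so the suffix {i, ..., |E|} ∩ T is obtained with one binary search plus a slice.</p><p>
```python
import bisect

def store(transaction):
    """Store a transaction as a sorted array, e.g. {4, 2, 7} -> [2, 4, 7]."""
    return sorted(transaction)

def suffix_from(arr, i):
    """Items of the stored transaction that are >= i."""
    return arr[bisect.bisect_left(arr, i):]
```
</p></div>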
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Computational Experiments</head><p>To examine the practical efficiency of our algorithms, we ran experiments on real and synthetic datasets, which are made available on the FIMI'03 site. In the following, we report the results of the experiments. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Datasets and Methods</head><p>We implemented our three algorithms LCMfreq, LCM, and LCMmax in C and compiled them with gcc3.2. The algorithms were tested on the datasets shown in Table <ref type="table" target="#tab_2">1</ref>, available from the FIMI'03 homepage 1 , which include: T10I4D100K and T40I10D100K from the IBM Almaden Quest research group; chess, connect, mushroom, pumsb, and pumsb star from the UCI ML repository 2 and PUMSB; and BMS-WebView-1, BMS-WebView-2, and BMS-POS from KDD-CUP 2000 3 .</p><p>We compare our algorithms LCMfreq, LCM, and LCMmax with the following frequent item set mining algorithms: the implementations of FP-growth <ref type="bibr" target="#b6">[7]</ref>, Eclat <ref type="bibr" target="#b16">[17]</ref>, and Apriori <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> by Bart Goethals 4 . We also compare the LCM algorithms with the implementation of Mafia <ref type="bibr" target="#b5">[6]</ref>, a fast maximal frequent pattern miner, by the Database Group of Cornell University 5 . The versions of Mafia with the frequent item set, frequent closed item set, and maximal frequent item set options are denoted by mafia-fi, mafia-fci, and mafia-mfi, respectively. Although we had also planned a performance comparison with Charm, the state-of-the-art frequent closed item set miner, we gave up this comparison due to time constraints.</p><p>All experiments were run on a PC with a Pentium4 2.8GHz CPU, 1GB of memory, and a 7200 RPM, 180GB hard disk. In the experiments, LCMfreq, LCM, and LCMmax use at most 123MB, 300MB, and 300MB of memory, respectively. Note that LCM and LCMmax can reduce the memory use by decreasing δ. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Results</head><p>Figures 6 through <ref type="figure" target="#fig_2">12</ref> show the running times, for varying minimum supports, of the seven algorithms LCMfreq, LCM, LCMmax, FP-growth, Eclat, Apriori, and mafia-mfi on the nine datasets described in the previous subsection. In the following, we refer to the mining of all, maximal, and closed frequent item sets simply as all, maximal, and closed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results on Synthetic Data</head><p>Figure <ref type="figure">4</ref> shows the running times for minimum supports ranging from 0.15% to 0.025% on the IBM-Artificial T10I4D100K dataset. From this plot, we see that most algorithms finish within a few tens of minutes and behave quite similarly as the minimum support varies. In Figure <ref type="figure">4</ref>, LCMmax, LCM, and LCMfreq are all twice as fast as FP-growth on the IBM T10I4D100K dataset. On the other hand, mafia-mfi, mafia-fci, and mafia-fi are slower than every other algorithm. In Figure <ref type="figure">5</ref>, mafia-mfi is the fastest for maximal, and LCMfreq is the fastest for all, for minimum supports below 1% on the IBM T40I10D100K dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results on KDD-CUP datasets</head><p>Figures 6 through <ref type="figure">8</ref> show the running times for minimum supports ranging from 0.1% to 0.01% on the three real-world datasets BMS-WebView-1, BMS-WebView-2, and BMS-POS.</p><p>In the figures, we can observe that the LCM algorithms outperform the others in almost all cases, especially for lower minimum supports. In particular, LCM was the best of the seven algorithms over the whole range of minimum supports from 0.1% to 0.01% on all three datasets. For higher minimum supports, ranging from 0.1% to 0.06%, all algorithms perform similarly, with the LCM family slightly ahead. For lower minimum supports, ranging from 0.04% to 0.01%, Eclat and Apriori are much slower than every other algorithm, and LCM outperforms the rest. Some frequent item set miners, such as mafia-fi and mafia-fci, run out of the 1GB of main memory at these minimum supports on BMS-WebView-1, BMS-WebView-2, and BMS-POS. LCMfreq works quite well for higher minimum supports, but takes more than 30 minutes for minimum supports below 0.04% on BMS-WebView-1; in these cases, the number of frequent item sets is enormous, over 100,000,000,000. Interestingly, mafia-mfi's performance is stable over the whole range of minimum supports from 0.1% to 0.01%.</p><p>In summary, the LCM family of algorithms performs very well on the real-world datasets BMS-WebView-1, BMS-WebView-2, and BMS-POS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results on UCI-ML repository and PUMSB datasets</head><p>Figures 9 through <ref type="figure" target="#fig_2">12</ref> show the running times on the middle-sized datasets pumsb, pumsb*, and kosarak, and the small datasets connect, chess, and mushroom. These datasets, taken from machine learning domains, are small but hard for the frequent pattern mining task, since they contain many frequent patterns even at high minimum supports, e.g., from 50% to 90%. They were originally built for classification tasks and have somewhat different characteristics from large business datasets such as those above. In these figures, we see that mafia-mfi consistently outperforms every other maximal frequent item set mining algorithm over a wide range of minimum supports, except on pumsb*. On the other hand, Apriori is much slower than the other algorithms. For mining all frequent item sets, LCMfreq is faster than the other algorithms. For mining frequent closed item sets, there is no consistent tendency in the results. However, LCM does not store the obtained solutions in memory, while the other algorithms do; in terms of memory saving, LCM therefore has an advantage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Conclusion</head><p>In this paper, we presented LCM, an efficient algorithm for mining frequent closed item sets based on a parent-child relationship defined over the frequent closed item sets. The technique is taken from algorithms for enumerating maximal bipartite cliques <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref> based on reverse search <ref type="bibr" target="#b2">[3]</ref>. In theory, we proved that LCM enumerates exactly the set of frequent closed item sets, spending time polynomial in the total input size per closed item set. In practice, we showed by experiments that our algorithms run fast on several real-world datasets such as BMS-WebView-1. We also presented LCMfreq and LCMmax, variants of LCM for computing all and maximal frequent item sets, respectively. LCMfreq uses two new schemes, hybrid and hypercube decomposition, which work well on many problems. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Theorem 1</head><label>1</label><figDesc>Let 0 &lt; σ &lt; 1 be a minimum support. Algorithm LCM enumerates, given the root closed item set ⊥ = I(T (∅)), all frequent closed item sets in C in linear time in the number of frequent closed item sets.</figDesc></figure>
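The enumeration scheme summarized above can be sketched in a few lines. The following is a hypothetical, simplified Python illustration of prefix-preserving closure extension (the function names `occurrences`, `closure`, and `enumerate_closed` are ours), not the authors' optimized C implementation; it omits occurrence deliver and the other speed-up schemes:

```python
def occurrences(transactions, X):
    """T(X): the transactions that contain every item of X."""
    return [T for T in transactions if X.issubset(T)]

def closure(occ):
    """I(T(X)): the items common to every transaction in the occurrence list."""
    it = iter(occ)
    result = set(next(it))
    for T in it:
        result.intersection_update(T)
    return frozenset(result)

def enumerate_closed(transactions, items, min_sup):
    """Depth-first traversal of the frequent closed item sets."""
    out = []

    def rec(X, last_item):
        out.append(X)
        for i in sorted(items):
            if i in X or last_item >= i:
                continue
            occ = occurrences(transactions, X | {i})
            if min_sup > len(occ):
                continue
            Y = closure(occ)
            # prefix-preserving check: the closure must not add items below i,
            # so every closed set is generated from exactly one parent
            if any(i > j for j in Y - X):
                continue
            rec(Y, i)

    if len(transactions) >= min_sup:
        rec(closure(transactions), -1)  # start from the root, I(T(empty set))
    return out
```

Because each closed item set has a unique parent under this relation, the traversal visits every frequent closed item set exactly once and needs no storage of previously found solutions, which is the memory advantage noted in Section 7.2.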
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Occurrence deliver and right-first sweep: In the figure, J [i] is written as JQ[i]. For each occurrence T of X, occurrence deliver inserts T into J [i] for each i ∈ T. When the algorithm generates the recursive call with respect to X[|E| − 2], the recursive calls with respect to X[|E| − 1] and X[|E|] have already terminated, so J [|E| − 1] and J [|E|] have been cleared. The recursive call for X[|E|−2] uses only J [|E| − 1] and J [|E|], and hence the algorithm re-uses them in the recursive call.</figDesc></figure>
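The occurrence deliver step of Figure 2 can be sketched as follows; this is a hypothetical, simplified illustration (function and parameter names are ours) that omits the bucket re-use of the right-first sweep. One scan over T(X) builds the occurrence list of every one-item extension at once:

```python
def occurrence_deliver(occ_X, i_X, num_items):
    """Fill J[i] with the occurrences of (X plus item i) for every i beyond i(X),
    in a single pass over the occurrence list of X."""
    J = [[] for _ in range(num_items)]
    for T in occ_X:
        for i in T:
            if i > i_X:
                J[i].append(T)  # T contains X and i, so T is an occurrence of both
    return J
```

The support of each extension is then simply `len(J[i])`. The total work is the sum of |T| over T in T(X), which also bounds the memory needed for J, matching the observation in the paper that J can be allocated once as a global variable.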
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>2 .</head><label>2</label><figDesc>global: J , DJ /* Global sets of lists */ Algorithm LCM() 1. X := I(T (∅)) /* The root ⊥ */ 2. For i := 1 to |E| 3. If X[i] satisfies (cond2) and (cond3) then Call LCM Iter( X[i], T (X[i]), i ) or Call LCMd Iter2( X[i], T (X[i]), i, DJ ) based on the decision criteria 4. End for LCM Iter( X, T (X), i(X) ) /* occurrence deliver */ 1. output X For each T ∈ T (X)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Hypercube decomposition: LCMfreq decomposes a closed item set class into several sublattices (gray rectangles).</figDesc></figure>
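The hypercube decomposition of Figure 3 can also be sketched briefly. This is a hypothetical, simplified illustration (the name `hypercube_expand` is ours): items that appear in every occurrence of X can be added to X without changing its occurrence set, so all such supersets form a sublattice of item sets with identical support that can be output without further recursion:

```python
from itertools import chain, combinations

def hypercube_expand(X, occ_X):
    """All item sets formed by adding to X any subset of the items that occur
    in every occurrence of X; each has the same occurrence set as X."""
    common = set.intersection(*[set(T) for T in occ_X])
    extra = sorted(common - set(X))  # adding these leaves T(X) unchanged
    subsets = chain.from_iterable(
        combinations(extra, r) for r in range(len(extra) + 1))
    return [frozenset(X) | frozenset(S) for S in subsets]
```

With k such items, 2^k frequent item sets are emitted at the cost of one intersection, which is one reason LCMfreq does well when many frequent item sets share occurrences.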
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :Figure 5 :Figure 6 :Figure 7 :Figure 8 :</head><label>45678</label><figDesc>Figure 4: Running time of the algorithms on IBM-Artificial T10I4D100K</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 9 :Figure 10 :Figure 11 :Figure 12 :</head><label>9101112</label><figDesc>Figure 9: Running time of the algorithms on pumsb</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 13 :Figure 14 :</head><label>1314</label><figDesc>Figure 13: Running time of the algorithms on connect</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>For any i and X, the total memory Σ i |J [i]| is bounded by the input size ||T || = Σ T ∈T |T |; thus, it suffices to allocate the memory for J once, as a global variable.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1 :</head><label>1</label><figDesc>The datasets. AvTrSz means the average transaction size</figDesc><table><row><cell cols="4">Dataset #items #Trans AvTrSz</cell><cell>#FI</cell><cell>#FCI</cell><cell>#MFI</cell><cell>Minsup (%)</cell></row><row><cell>BMS-Web-View1</cell><cell cols="2">497 59,602</cell><cell cols="2">2.51 3.9K-NA</cell><cell cols="3">3.9K-1241K 2.1K-129.4K 0.1-0.01</cell></row><row><cell cols="3">BMS-Web-View2 3,340 77,512</cell><cell cols="2">4.62 24K-9897K</cell><cell>23K-755K</cell><cell>3.9K-118K</cell><cell>0.1-0.01</cell></row><row><cell cols="3">BMS-POS 1,657 517,255</cell><cell cols="4">6.5 122K-33400K 122K-21885K 30K-4280K</cell><cell>0.1-0.01</cell></row><row><cell cols="3">T10I4D100K 1,000 100,000</cell><cell cols="2">10.0 15K-335K</cell><cell>14K-229K</cell><cell cols="2">7.9K-114K 0.15-0.025</cell></row><row><cell cols="3">T40I10D100K 1,000 100,000</cell><cell>39.6</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>2-0.5</cell></row><row><cell cols="3">pumsb 7,117 49,046</cell><cell>74.0</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>95-60</cell></row><row><cell cols="3">pumsb star 7,117 49,046</cell><cell>50.0</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>50-10</cell></row><row><cell>mushroom</cell><cell cols="2">120 8,124</cell><cell>23.0</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>20-0.1</cell></row><row><cell>connect</cell><cell cols="2">130 67,577</cell><cell>43.0</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>95-40</cell></row><row><cell>chess</cell><cell>76</cell><cell>3196</cell><cell>37.0</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>90-30</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>1 http://fimi.cs.helsinki.fi/testdata.html 2 http://www.ics.uci.edu/ mlearn/MLRepository.html 3 http://www.ecn.purdue.edu/KDDCUP/ 4 http://www.cs.helsinki.fi/u/goethals/software/ 5 University of Cornell Database group, Himalaya Data Mining Tools, http://himalaya-tools.sourceforge.net/</figDesc><table /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>We gratefully thank Prof. Ken Satoh of the National Institute of Informatics. This research was supported by a group research fund of the National Institute of Informatics, Japan.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Fast Algorithms for Mining Association Rules in Large Databases</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB &apos;94</title>
				<meeting>VLDB &apos;94</meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="487" to="499" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Fast Discovery of Association Rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Toivonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Verkamo</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Reverse Search for Enumeration</title>
		<author>
			<persName><forename type="first">D</forename><surname>Avis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Fukuda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Discrete Applied Mathematics</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="page" from="21" to="46" />
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Efficiently Mining Long Patterns from Databases</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Bayardo</surname><genName>Jr</genName></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGMOD&apos;98</title>
				<meeting>SIGMOD&apos;98</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="85" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets</title>
		<author>
			<persName><forename type="first">E</forename><surname>Boros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Gurvich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Khachiyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Makino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. STACS 2002</title>
				<meeting>STACS 2002</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="133" to="141" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Burdick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Calimlim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gehrke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ICDE 2001</title>
				<meeting>ICDE 2001</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="443" to="452" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Mining Frequent Patterns without Candidate Generation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGMOD&apos;00</title>
				<meeting>SIGMOD&apos;00</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="1" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">KDD-Cup 2000 Organizers&apos; Report: Peeling the Onion</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Brodley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Frasca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mason</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="86" to="98" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Multiple Uses of Frequent Sets and Condensed Representations</title>
		<author>
			<persName><forename type="first">H</forename><surname>Mannila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Toivonen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. KDD&apos;96</title>
				<meeting>KDD&apos;96</meeting>
		<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="189" to="194" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Discovering frequent closed itemsets for association rules</title>
		<author>
			<persName><forename type="first">N</forename><surname>Pasquier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bastide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Taouil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lakhal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ICDT&apos;99</title>
				<meeting>ICDT&apos;99</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="398" to="416" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery</title>
				<imprint>
			<date type="published" when="2000">2000. 2000</date>
			<biblScope unit="page" from="21" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Set-based model: a new approach for information retrieval</title>
		<author>
			<persName><forename type="first">B</forename><surname>Possas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ziviani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Meira</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">A</forename><surname>Ribeiro-Neto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGIR&apos;02</title>
				<meeting>SIGIR&apos;02</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="230" to="237" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A New Algorithm for Generating All the Maximum Independent Sets</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tsukiyama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ariyoshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Shirakawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Computing</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="505" to="517" />
			<date type="published" when="1977">1977</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A Practical Fast Algorithm for Enumerating Cliques in Huge Bipartite Graphs and Its Implementation</title>
		<author>
			<persName><forename type="first">Takeaki</forename><surname>Uno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">89th Special Interest Group of Algorithms</title>
				<meeting><address><addrLine>Japan</addrLine></address></meeting>
		<imprint>
			<publisher>Information Processing Society</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Fast Algorithms for Enumerating Cliques in Huge Graphs</title>
		<author>
			<persName><forename type="first">Takeaki</forename><surname>Uno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Research Group of Computation</title>
		<imprint>
			<biblScope unit="page" from="55" to="62" />
			<date type="published" when="2003">2003</date>
		</imprint>
		<respStmt>
			<orgName>IEICE, Kyoto University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A New Approach for Speeding Up Enumeration Algorithms</title>
		<author>
			<persName><forename type="first">Takeaki</forename><surname>Uno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ISAAC&apos;98</title>
				<meeting>ISAAC&apos;98</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="287" to="296" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Scalable algorithms for association mining</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="372" to="390" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">CHARM: An Efficient Algorithm for Closed Itemset Mining</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Zaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hsiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SDM&apos;02, SIAM</title>
				<meeting>SDM&apos;02, SIAM</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="457" to="473" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Real World Performance of Association Rule Algorithms</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kohavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Mason</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. SIGKDD-01</title>
				<meeting>SIGKDD-01</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="401" to="406" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
