ABS: Adaptive Borders Search of frequent itemsets

Frédéric Flouvat (1), Fabien De Marchi (2), Jean-Marc Petit (1)

(1) Laboratoire LIMOS, UMR CNRS 6158, Université Blaise Pascal - Clermont-Ferrand II, 24 avenue des Landais, 63 177 Aubière cedex, France. flouvat@isima.fr, jmpetit@math.univ-bpclermont.fr
(2) Laboratoire LIRIS, FRE CNRS 2672, Université Claude Bernard Lyon 1, 8 boulevard Niels Bohr, 69 622 Villeurbanne cedex, France. fabien.demarchi@liris.cnrs.fr

Abstract

In this paper, we present an ongoing work to discover maximal frequent itemsets in a transactional database. We propose an algorithm called ABS, for Adaptive Borders Search, which is in the spirit of algorithms based on the concept of dualization. From an abstract point of view, our contribution can be seen as an improvement of the basic APRIORI algorithm for mining maximal frequent itemsets. The key point is to decide dynamically at which iteration, if any, the dualization has to be made to avoid the enumeration of all subsets of large maximal itemsets. Once the first dualization has been done from the current negative border, APRIORI is no longer used; instead, another dualization is carried out from the positive border known so far. The process is repeated until no change occurs anymore in the positive border under construction.

Experiments have been done on FIMI datasets, from which tradeoffs on adaptive behavior have been proposed to guess the best iteration for the first dualization. Far from being the best implementation wrt FIMI'03 contributions, performance evaluations of ABS exhibit better performance than IBE, the only public implementation based on the concept of dualization.

1 Introduction

In this paper, we present an ongoing work to discover maximal frequent itemsets in a transactional database. We propose to adapt an algorithm originally devised for mining inclusion dependencies in databases [8]. This algorithm is called ABS, for Adaptive Borders Search, and is in the spirit of algorithms based on the concept of dualization [14, 21].

The basic idea of our proposition is to combine the strengths of both the levelwise algorithm [1, 18] and the Dualize and Advance algorithm [14] in such a way that:

• "small" maximal frequent itemsets are efficiently generated with levelwise strategies;

• "large" maximal frequent itemsets may be found efficiently by dualization.

The dualization performed is quite similar to that proposed in the Dualize and Advance algorithm. Nevertheless, instead of starting from some subset of maximal frequent itemsets as Dualize and Advance does, we use infrequent itemsets to perform the dualization. As a consequence, we obtain the so-called optimistic positive border of maximal frequent itemsets. The set of such candidates corresponds exactly to the k-uniform hypergraph cliques proposed in [22]. As a consequence, our proposition contributes to clarify some related contributions [22, 17, 3, 12, 7], since it gives an exact characterization of the optimistic positive border of maximal frequent itemsets from some subset of infrequent itemsets.

From an abstract point of view, our contribution can be seen as an improvement of the basic APRIORI algorithm for mining maximal frequent itemsets. The key point is to decide dynamically at which iteration, if any, the dualization has to be made to avoid the enumeration of all subsets of large maximal itemsets. Once the first dualization has been done from the current negative border available at that iteration, APRIORI is no longer used; instead, another dualization is carried out from the positive border known so far. The process is repeated until no change occurs anymore in the positive border under construction.

Experiments have been done on FIMI datasets [10]. The adaptive behavior of our algorithm has been tuned from results gathered from these experiments. For the tested datasets, we were able to guess dynamically the best iteration for the first dualization, a key parameter of our algorithm.

Far from being the best implementation wrt FIMI'03 contributions [11], performance evaluations of ABS exhibit better performance than IBE [21], the only public implementation based on the concept of dualization.

2 Preliminaries

Let R be a set of symbols called items; a line is a subset of R, and a binary relation r over R is a multiset of lines. We suppose the reader is familiar with the notions of itemsets, support, and with the main aspects of the frequent itemsets mining problem in a binary relation, given a threshold minsup (see e.g. [1] for details).

We recall the notion of borders of a set of itemsets [18]. Given F a set of itemsets over R, the positive border of F, denoted by Bd+(F), is defined by Bd+(F) = max_⊆ {X ∈ F}. The negative border of F is defined by Bd−(F) = min_⊆ {Y ⊆ R | ∀X ∈ F, Y ⊄ X}. If FI is the set of all itemsets frequent in r, then Bd+(FI) is called the set of maximal frequent itemsets in r.

We will use the concepts of hypergraph and minimal transversal of a hypergraph, whose definitions are recalled here (see for example [4] for more details). Let V be a finite set of elements. A set E of subsets of V defines a hypergraph H = (V, E), where the elements of V are called the vertices of H and the elements of E its edges. A transversal T of H = (V, E) is a subset of V that intersects all the elements of E. T is minimal if no other transversal of H is included in T. The set of all minimal transversals of H is denoted Tr(H).

The relationship between the notion of borders and minimal transversals of a hypergraph has been exhibited in [18]. Indeed, any set of itemsets can be seen as a hypergraph; if FI is the set of frequent itemsets in a binary relation r, we have Tr(co(FI)) = Bd−(FI), where co(G) = {R − X | X ∈ G} denotes the set of complements of the itemsets in G.

3 Method description

3.1 Starting with a levelwise algorithm

The algorithm Apriori [1] was initially devoted to frequent itemsets mining; nevertheless, it has been proved to be still competitive for maximal frequent itemsets mining in many cases [11], when the size of the elements to discover remains small.

Our goal is to exploit the efficiency of Apriori, but to automatically detect when it will fall into trouble and stop its execution. Then we propose to exploit the knowledge mined so far to initialize a different search, based on the concept of dualization between positive and negative borders; each border is updated and used to compute the corresponding dual border, until a fix point is reached.

3.2 From negative to positive border

In the sequel, let r be a binary database over a set of items R, minsup a minimum support, and FI the set of frequent itemsets in r. After the levelwise part, our method is still iterative; at each iteration i, new elements of the positive and negative borders are expected to be discovered. We denote by Bd+_i (resp. Bd−_i) the subset of Bd+(FI) (resp. Bd−(FI)) discovered until the i-th iteration. In other words, ∀i < j, Bd+_i ⊆ Bd+_j and Bd−_i ⊆ Bd−_j. Roughly speaking, candidates for Bd+_i are obtained from elements of Bd−_i, and candidates for Bd−_{i+1} are obtained from elements of Bd+_i.

The following definitions and results have been proposed in [8] for the inclusion dependency discovery problem in relational databases. We recall them in the context of maximal frequent itemsets mining; only the proofs are omitted.

We first define the notion of optimistic positive border.

Definition 1 (Optimistic positive border) Given a set F of itemsets, the optimistic positive border of F is: Fopt(F) = max_⊆ {X ⊆ R | ∀Y ∈ F, Y ⊄ X}.

The next theorem gives a constructive characterization of Fopt(F).

Theorem 1 [8] Fopt(F) = co(Tr(F)).

Therefore, the idea is to compute the optimistic positive border of Bd−_i to obtain exactly the largest itemsets which do not contain any infrequent itemset discovered so far.

Proposition 1 Let X ∈ Fopt(Bd−_i). If sup(X) ≥ minsup, then X ∈ Bd+(FI).

Proof. Since X is maximal in the definition of Fopt(Bd−_i), each of its supersets contains at least one element of Bd−_i, and is thus infrequent by anti-monotonicity. □

Then, Bd+_i is exactly made up of all the frequent itemsets in Fopt(Bd−_i).
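The definitions above lend themselves to a direct, brute-force illustration. The following Python sketch (ours, for illustration only; it is not the paper's C++ implementation and only scales to toy universes, and all names are hypothetical) computes minimal transversals by enumeration and derives the optimistic positive border both via Theorem 1, as complements of the minimal transversals of F, and directly from Definition 1:

```python
from itertools import combinations

def minimal(sets):
    """Keep only the subset-minimal sets (min with respect to inclusion)."""
    return [s for s in sets if not any(t < s for t in sets)]

def maximal(sets):
    """Keep only the subset-maximal sets (max with respect to inclusion)."""
    return [s for s in sets if not any(s < t for t in sets)]

def minimal_transversals(V, edges):
    """Tr(H): all minimal sets of vertices meeting every edge (brute force)."""
    hits = [set(c) for k in range(len(V) + 1)
            for c in combinations(sorted(V), k)
            if all(set(c) & e for e in edges)]
    return minimal(hits)

def fopt(R, F):
    """Theorem 1: Fopt(F) is the set of complements of Tr(F)."""
    return [R - T for T in minimal_transversals(R, F)]

def fopt_direct(R, F):
    """Definition 1: maximal X subset of R containing no Y in F."""
    candidates = [set(c) for k in range(len(R) + 1)
                  for c in combinations(sorted(R), k)
                  if not any(Y <= set(c) for Y in F)]
    return maximal(candidates)
```

On R = {A, B, C, D} with F = {AB, C}, both routes yield {AD, BD}: the largest itemsets containing neither AB nor C.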
3.3 From positive to negative border

In a dual way, the set Bd+_i is then used to compute its negative border Bd−(Bd+_i), to finally update the negative border under construction and obtain Bd−_{i+1}.

The next theorem gives a constructive characterization of Bd−(F), for any set F of frequent itemsets.

Theorem 2 [18] Bd−(F) = Tr(co(F)), where co(F) = {R − X | X ∈ F}.

Proposition 2 Let X ∈ Bd−(Bd+_i). If sup(X) < minsup, then X ∈ Bd−(FI).

Proof. Let X be an element of Bd−(Bd+_i). By the definition of the negative border of a set, each proper subset of X is included in an element of Bd+(FI), and then is frequent. □

Then, Bd−_{i+1} is exactly made up of all the infrequent itemsets in Bd−(Bd+_i).

3.4 The algorithm ABS

Algorithm 1 computes the positive and negative borders of frequent itemsets in a given binary database. Within the framework of levelwise algorithms, ABS decides at each level whether or not the levelwise approach has to be stopped. In that case, the levelwise approach is halted, and the two borders are incrementally updated as described previously. The functions GenPosBorder and GenNegBorder compute respectively the optimistic positive and negative borders, using the characterizations of Theorems 1 and 2. The algorithm terminates when all elements of the optimistic positive border currently computed are frequent. It is worth noting that no dualization may occur at all: in this case, ABS reduces to APRIORI. Proposition 3 ensures the correctness of ABS. The behavior of the function IsDualizationRelevant is described in Section 3.5.

Algorithm 1 ABS: Adaptive Borders Search
Require: a binary database r, an integer minsup
Ensure: Bd+(FI) and Bd−(FI)
1: F1 = {A ∈ R | sup(A) ≥ minsup}
2: C2 = AprioriGen(F1)
3: i = 2; Bd−_1 = R − F1; Bd+_0 = ∅
4: while Ci ≠ ∅ do
5:   Fi = {X ∈ Ci | sup(X) ≥ minsup}
6:   Bd−_i = Bd−_{i−1} ∪ (Ci − Fi)
7:   Bd+_{i−1} = Bd+_{i−2} ∪ {X ∈ F_{i−1} | ∀Y ∈ Fi, X ⊄ Y}
8:   if IsDualizationRelevant(i, |Bd−_i|, |Fi|, |Ci|) = TRUE then
9:     Bd+_i = {X ∈ GenPosBorder(Bd−_i) | sup(X) ≥ minsup}
10:    while Bd+_i ≠ Bd+_{i−1} do
11:      Bd−_i = {X ∈ GenNegBorder(Bd+_i) | sup(X) < minsup}
12:      Bd+_{i+1} = {X ∈ GenPosBorder(Bd−_i) | sup(X) ≥ minsup}
13:      i = i + 1
14:    end while
15:    return Bd+_i and Bd−_{i−1} and exit
16:  end if
17:  Ci+1 = AprioriGen(Fi)
18:  i = i + 1
19: end while
20: Bd+_{i−1} = Bd+_{i−2} ∪ F_{i−1}
21: return Bd+_{i−1} and Bd−_{i−1}

Proposition 3 The algorithm ABS returns Bd+(FI) and Bd−(FI).

Proof. If the test performed by IsDualizationRelevant() is never true, the demonstration is obvious. If not, at line 15, from Propositions 1 and 2, we have Bd+_i ⊆ Bd+(FI) and Bd−_{i−1} ⊆ Bd−(FI). Moreover, the termination condition ensures that Bd+_i = GenPosBorder(Bd−_{i−1}); all elements in Bd+_i are frequent and all elements in Bd−_{i−1} are infrequent. Suppose that there exists X ∈ Bd−(FI) with X ∉ Bd−_{i−1}. Then:

• if there exists Y ∈ Bd−_{i−1} with Y ⊂ X, then since Y is infrequent, X ∉ Bd−(FI), a contradiction;

• if there exists no Y ∈ Bd−_{i−1} with Y ⊆ X, then from the definition of the optimistic positive border, there exists Z ∈ Bd+_i with X ⊆ Z, which contradicts the fact that X is infrequent.

Thus Bd−_{i−1} = Bd−(FI). An identical reasoning leads to Bd+_i = Bd+(FI). □

3.5 Adaptive aspects of ABS

The main adaptive aspect of ABS is conveyed by the function IsDualizationRelevant, line 8 of Algorithm 1. As mentioned, its goal is to estimate whether it is interesting to dualize the current negative border to the optimistic positive border.

We have identified four parameters, specific to a given iteration of the levelwise algorithm, which can be obtained dynamically without any overhead:

• The current level i. No jump is allowed until a given integer threshold; we set the threshold to 4, since Apriori is very efficient in practice to explore levels 1 to 4. In our experiments, dualizing before this level incurs no improvement.

• |Bd−_i|, the size of the current negative border. A simple remark can be made here: if this parameter is very large (more than 100000), the minimal transversals computation becomes prohibitive. We are not aware of existing implementations of minimal transversals computation able to handle such input hypergraphs (experiments conducted in [16, 2] only consider hypergraphs with no more than 32000 edges). Moreover, such cases are likely to correspond to the best scenario for Apriori.

• |Fi|, the number of frequent i-itemsets, and |Bd−_i| have to be compared. Indeed, a small value of |Bd−_i| wrt |Fi| is likely to give a successful dualization.

• |Fi| and |Ci|, the number of candidates at level i, can also be compared. If |Fi|/|Ci| is close to 1, we can suspect being in a "dense" part of the search space, and thus the levelwise search should be stopped.

Table 1. Example database

3.6 Practical aspects

3.6.1 Candidate generation from the current positive border

From [18], candidate generation of a levelwise algorithm for a problem representable as sets can be formulated using dualization: at the i-th iteration, we have

Ci+1 = Tr(co(∪_{j≤i} Fj)) − ∪_{j≤i} Cj
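Dually, Theorem 2 can be checked with the same brute-force machinery. The sketch below (illustrative Python, not the paper's implementation; the toy positive border is an assumption of ours) computes Bd−(F) as the minimal transversals of the complemented positive border co(Bd+):

```python
from itertools import combinations

def minimal_transversals(V, edges):
    """Tr(H) by brute-force enumeration (toy-sized V only)."""
    hits = [set(c) for k in range(len(V) + 1)
            for c in combinations(sorted(V), k)
            if all(set(c) & e for e in edges)]
    return [t for t in hits if not any(u < t for u in hits)]

# Toy positive border over R = {A, B, C, D}: the maximal frequent
# itemsets are assumed to be ABC and ABD.
R = set("ABCD")
bd_pos = [set("ABC"), set("ABD")]

# Theorem 2: Bd-(F) = Tr(co(F)), with co(F) = {R - X | X in F}.
co_bd_pos = [R - X for X in bd_pos]        # complements: {D} and {C}
bd_neg = minimal_transversals(R, co_bd_pos)
```

Here Bd− = {CD}: C and D are each frequent (each lies in a maximal itemset), but CD is contained in neither ABC nor ABD, and it is the unique minimal such itemset.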
It is shown in [18] that candidate itemsets of Ci+1 are exactly of size i + 1, which allows to improve candidate generation.

In the setting of this paper, we can see Ci+1 as the set Bd−_{i+1} − Bd−_i, and thus we get:

Ci+1 = Tr(co(Bd+_i)) − ∪_{j≤i} Cj

Here, the major difference with a pure levelwise approach is that Bd+_i may contain some elements of size greater than i + 1. One may wonder about the size of the largest elements of Ci+1: do there exist elements of size strictly greater than i + 1? The answer is yes, as shown in the following non-trivial example.

Example 1 Let r be the binary relation over a schema R = {A, B, C, D, E, F, G, H, I} represented in Table 1. For a minsup equal to 1, the borders of frequent itemsets in r are Bd− = {AE, BF, CG, DH, ABCDI} and Bd+ = {ABCHI, ABDGI, ABGHI, ACDFI, ACFHI, ADFGI, AFGHI, BCDEI, BCEHI, BDEGI, BEGHI, CDEFI, CEFHI, DEFGI, EFGHI, ABCD}.

After a levelwise pass until level two, the four non-frequent itemsets of size two have been discovered, i.e. Bd−_2 = {AE, BF, CG, DH}. Suppose the algorithm decides here to stop the pure levelwise search. Then, these sets are used to compute the optimistic positive border from level 2. It is made up of 16 itemsets of size 5, among which the only non-frequent itemset is ABCDI. Thus, at this time, Bd+_2 = {ABCHI, ABDGI, ABGHI, ACDFI, ACFHI, ADFGI, AFGHI, BCDEI, BCEHI, BDEGI, BEGHI, CDEFI, CEFHI, DEFGI, EFGHI}. We obtain Bd−(Bd+_2) = {ABCD}, of size 4, it being understood that no element of size 3 exists.

In our first implementation, computing the set Bd−(Bd+_i) using minimal transversals had a quite prohibitive cost on large hypergraph instances. Therefore, we made the choice to restrict Bd−(Bd+_i) to its (i + 1)-itemsets for efficiency reasons. This choice has no effect on the correctness of the algorithm, since the termination condition remains the same. (As an aside, we suspect the algorithm PincerSearch [17] not to be complete. Indeed, the search strategy of PincerSearch is very close to our proposition: if we only consider (i + 1)-itemsets in Bd−(Bd+_i), they correspond exactly to the candidate set Ci+1 of PincerSearch. Since PincerSearch stops as soon as Ci+1 = ∅, some elements could be forgotten. From Example 1, after level 2, C3 is empty, and therefore the maximal set ABCD seems to be never generated by PincerSearch.)

3.6.2 Dealing with "almost frequent" large candidate itemsets

Let us consider the case of a candidate itemset obtained after a dualization from the current negative border. Let X be this candidate. Two main cases exist: either X is frequent, or X is infrequent. In the latter case, we propose to estimate a degree of error in order to "qualify the jump".

Given a new user-defined threshold δ and the minimal support minsup, an error measure, noted error(X, minsup), can be defined from the ratio between the support of the infrequent itemset X and minsup, i.e. error(X, minsup) = 1 − support(X)/minsup.

Two sub-cases are worth considering:

• either error(X, minsup) ≤ δ: the "jump" was not successful, but solutions should exist among the nearest subsets of X;

• or error(X, minsup) > δ: in that case, the jump was over-optimistic and, probably, no solution exists among the nearest generalizations of X.

Note that this error measure is decreasing, i.e. X ⊂ Y ⇒ error(X, minsup) ≤ error(Y, minsup).
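The error measure and the δ test can be sketched in a few lines of Python (the support values below are hypothetical and the function names are ours):

```python
def error(support_x, minsup):
    """error(X, minsup) = 1 - support(X)/minsup, which lies in (0, 1]
    for an infrequent candidate X (i.e. support(X) < minsup)."""
    return 1.0 - support_x / minsup

def qualify_jump(support_x, minsup, delta):
    """The two sub-cases of Section 3.6.2: a small error suggests maximal
    frequent itemsets among the nearest subsets of X; a large error
    indicates an over-optimistic jump."""
    if error(support_x, minsup) <= delta:
        return "explore nearest subsets of X"
    return "over-optimistic jump"
```

Since support is anti-monotone (X ⊂ Y implies support(X) ≥ support(Y)), we get error(X, minsup) ≤ error(Y, minsup), which is exactly the decreasing property noted above.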
In our current implementation, these almost frequent large itemsets are first considered as frequent, to enable more pruning in subsequent passes. Afterward, they are considered at the very end of our algorithm: a pure top-down levelwise approach has been implemented to find out their subsets which can be maximal frequent itemsets.

4 Implementation details and experimental results

4.1 Implementation details

An implementation of the algorithm has been performed in C++/STL. Two existing implementations available from the FIMI repository website [10] were borrowed: the Apriori code of C. Borgelt [5] and the prefix-tree implementation of B. Goethals using C++/STL [10]. To keep coherence with this implementation, we use a similar data structure for the new parts of the algorithm. The itemsets and the transactions are stored in a prefix-tree [1, 6].

Minimal transversals computation. For the minimal transversals computation, we implemented the algorithm proposed in [9] using a prefix-tree, in order to handle relatively large hypergraph instances. Its incremental aspect is very interesting in our case, since the negative border is itself incremental. Note that improvements have been performed by exploiting the knowledge of previous dualizations. We do not give more details here.

4.2 Experimental results

We conducted experiments on a Pentium 4 3GHz processor, with 1GB of memory. The operating system was Redhat Linux 7.3 and we used gcc 2.96 for the compilation. We used four datasets available on the FIMI'03 repository.

We first evaluate the influence, on the performances of ABS, of the level from which the levelwise approach is stopped. Then, the impact of "almost frequent" large itemsets is studied for different threshold values of the error measure. Finally, we compare ABS with four maximal frequent itemsets mining algorithms: Apriori and Eclat [12] implemented by C. Borgelt [5], FPmax [13] based on FP-trees [15], and IBE [21].

In Figure 1, we forced the first dualization at different levels (from 3 to 8) on the connect dataset with a minsup of 80% and the pumsb* dataset with a minsup of 30%. The results confirm the necessity to fix this parameter dynamically, and thus justify an adaptive approach. Second, for all tested datasets, our function IsDualizationRelevant has dynamically determined the best level to begin the dualization.

Figure 1. Forcing the first dualization at level k for connect (top) and pumsb* (bottom)

The optimization based on the error measure is evaluated in Figure 2. On the pumsb dataset (top), this optimization appears to be interesting with a threshold value near 0.002. Nevertheless, on the connect dataset (bottom), no improvement is achieved. This comes from the fact that the proposed error measure is not strictly decreasing, and the equivalence classes induced by closed frequent itemsets are large. Our top-down levelwise approach is prone to fail on this kind of databases.

Figure 2. Execution times for pumsb (top) and connect (bottom) wrt different error measure thresholds

From Figures 3, 5 and 6, ABS is far from competing with the best known implementations, but tends to outperform IBE for most of our experimentations. Recall that IBE is the unique implementation based on the concept of dualization available from FIMI'03. We believe that this is due to the number of dualizations performed by IBE, which is in the size of the positive border.

Figure 3. Execution times for database Pumsb

From Figure 4, IBE exhibits better performances than ABS for low support thresholds (less than 20%). This is due to the fact that, while the size of the positive border remains small (less than 5000 elements), the size of the negative border exceeds 10^6 elements, where some elements appear to have a very large size. This seems to be the worst case for ABS.

Figure 4. Execution times for database Pumsb*

From Figure 5, ABS behaves like Apriori, as expected. Indeed, the positive border of retail is made up of "small" itemsets, and Apriori turns out to be the best implementation for this kind of datasets.

Figure 5. Execution times for database Retail

From Figure 6, ABS is not as efficient as the best known implementations (e.g. FPmax), but improves Apriori by a factor of ten and beats Eclat and IBE.

Figure 6. Execution times for database Connect

To sum up, two main reasons explain our mitigated results: 1) the cost of dualization remains high on very large hypergraph instances, and 2) candidate generation and support counting seem to be not efficient enough in our current implementation.

The main parameter influencing the performance of ABS turns out to be the negative border. If, for a given minsup, the negative border does not become too huge and its largest element remains "small" with respect to the largest maximal frequent itemset, ABS should have good performance.

5 Related works

Several algorithms exist for discovering maximal frequent itemsets in a transactional database (see FIMI'03). The goal is always to avoid an exponential search by characterizing as fast as possible the largest frequent itemsets without exploring their subsets.

MaxMiner [3] uses a levelwise approach to explore the candidate itemsets, using Rymon's enumeration system [20], in which itemsets are arranged in a non-redundant tree. But when a candidate X is counted over the database, the greatest candidate in the subtree of X is also counted; if it is frequent, then all the subtree can be pruned by anti-monotonicity of the "is frequent" property. Jumps done by MaxMiner depend on the ordering of items used to build the tree and are therefore quite different from the jumps proposed in this paper. The algorithms Mafia [7] and GenMax [12] use the same principle as MaxMiner with efficient optimizations, e.g. vertical bitmaps.

The Pincer-Search algorithm [17] uses a search strategy very close to ours. After a levelwise initialization, the principle is also to look at the largest not yet eliminated candidates. However, these large candidates are not characterized in a formal way.

In [14], the authors propose the Dualize and Advance algorithm. In their approach, the positive border under construction is always a subset of the positive border to be discovered. At each step, from some elements of the positive border already discovered, they generate the corresponding negative border. If one element of the negative border appears to be satisfied, they generate a specialization of it which belongs to the positive border, and they re-iterate the process until each element of the negative border is indeed not satisfied. An implementation of a variant of Dualize and Advance has been proposed in [21], with an irredundant dualization. Their code is available from the FIMI'03 website.

Some algorithms like Mafia [7] or DCI [19] can adapt themselves to mine frequent itemsets, with respect to the dataset density and some architectural characteristics (e.g. available memory). Even if these aspects improve performances, they only concern choices for data structures; the mentioned algorithms do not really adapt their strategy to explore the search space.

6 Conclusion and future works

In this paper, we have proposed an ongoing effort toward the discovery of maximal frequent itemsets. Our contribution takes its roots from the algorithm ZigZag devised for inclusion dependency discovery in databases. Even if these two data mining problems fit into the same theoretical framework [18], they widely differ in practice, which is not a surprise. Indeed, while ZigZag performed very well in our experiments (note that the dynamic parameters were quite different, e.g. the first dualization was always performed at the second level), ABS does not exhibit such a good behavior for maximal frequent itemsets mining. Many reasons explain this result, for instance the availability of public datasets allowing thorough experimentations, the intrinsic properties of each problem, and maybe the most important reason lies in the cost of a database access: in-memory resident data vs data stored in a DBMS.

Many improvements can be brought to our current implementation. Some are specific to our algorithm, like for instance minimal transversal computation on large hypergraph instances, or taking into account large equivalence classes induced by closed frequent itemsets during candidate generation from "almost frequent" itemsets. Some others belong to the "state of the art" of maximal frequent itemsets implementation: managing huge sets of sets, support counting... Complexity issues also need to be addressed.

To end up, we want to quote a personal note on the main objective of the FIMI workshop. We believe that frequent, closed and maximal itemsets mining are key data mining tasks, since algorithms devised to solve these tasks are likely to be used in other contexts under some conditions [18]. Roughly speaking, for every problem representable as sets
with an anti-monotone predicate, as for instance functional dependency inference, or simply anti-monotone predicates on itemsets other than "is frequent", the algorithms devised for FIMI should be useful to answer these tasks. Nevertheless, it seems rather optimistic to envision the application of many FIMI'03 [11] implementations to another data mining problem representable as sets. Indeed, even if the development of efficient data structures for managing huge sets of sets is definitely useful, loading the database in main memory using sophisticated data structures specially devised for the anti-monotone predicate to be mined turns out to give very efficient algorithms, but hardly serves other data mining tasks.

References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, International Conference on Very Large Data Bases (VLDB'94), Santiago de Chile, Chile, pages 487–499. Morgan Kaufmann, 1994.
[2] J. Bailey, T. Manoukian, and K. Ramamohanarao. A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In International Conference on Data Mining (ICDM'03), Florida, USA, pages 485–488, 2003.
[3] R. J. Bayardo. Efficiently mining long patterns from databases. In L. M. Haas and A. Tiwary, editors, ACM SIGMOD Conference, Seattle, USA, pages 85–93, 1998.
[4] C. Berge. Graphs and Hypergraphs. North Holland, Amsterdam, 1973.
[5] C. Borgelt. Efficient implementations of apriori and eclat. In FIMI'03 Workshop on Frequent Itemset Mining Implementations, November 2003.
[6] C. Borgelt and R. Kruse. Induction of association rules: Apriori implementation. 2002.
[7] D. Burdick, M. Calimlim, and J. Gehrke. Mafia: A maximal frequent itemset algorithm for transactional databases. In International Conference on Data Engineering (ICDE'01), Heidelberg, Germany, pages 443–452. IEEE CS, 2001.
[8] F. De Marchi and J.-M. Petit. Zigzag: a new algorithm for discovering large inclusion dependencies in relational databases. In International Conference on Data Mining (ICDM'03), Melbourne, Florida, USA, pages 27–34. IEEE Computer Society, 2003.
[9] J. Demetrovics and V. Thi. Some remarks on generating armstrong and inferring functional dependencies relation. Acta Cybernetica, 12(2):167–180, 1995.
[10] B. Goethals. Frequent itemset mining implementations repository, http://fimi.cs.helsinki.fi/, 2003.
[11] B. Goethals and M. Zaki, editors. Workshop on Frequent Itemset Mining Implementations. CEUR Workshop Proceedings, 2003.
[12] K. Gouda and M. J. Zaki. Efficiently mining of maximal frequent itemsets. In N. Cercone, T. Y. Lin, and X. Wu, editors, International Conference on Data Mining (ICDM'01), San Jose, USA. IEEE Computer Society, 2001.
[13] G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In FIMI'03 Workshop on Frequent Itemset Mining Implementations, November 2003.
[14] D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharma. Discovering all most specific sentences. ACM Transactions on Database Systems, 28(2):140–174, 2003.
[15] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD'00, Dallas, Texas, USA, 2000.
[16] D. J. Kavvadias and E. C. Stavropoulos. Evaluation of an algorithm for the transversal hypergraph problem. In J. S. Vitter and C. D. Zaroliagis, editors, Algorithm Engineering, International Workshop, WAE '99, London, UK, volume 1668, 1999.
[17] D.-I. Lin and Z. M. Kedem. Pincer search: A new algorithm for discovering the maximum frequent set. In H.-J. Schek, F. Saltor, I. Ramos, and G. Alonso, editors, Extending Database Technology (EDBT'98), Valencia, Spain, volume 1377 of Lecture Notes in Computer Science, pages 105–119. Springer, 1998.
[18] H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1(1):241–258, 1997.
[19] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and resource-aware mining of frequent sets. In International Conference on Data Mining (ICDM'02), Maebashi City, Japan, pages 338–345, 2002.
[20] R. Rymon. Search through systematic set enumeration. In B. Nebel, C. Rich, and W. R. Swartout, editors, International Conference on Principles of Knowledge Representation and Reasoning (KR'92), Cambridge, USA, pages 539–550. Morgan Kaufmann, 1992.
[21] T. Uno and K. Satoh. Detailed description of an algorithm for enumeration of maximal frequent sets with irredundant dualization. In B. Goethals and M. Zaki, editors, ICDM 2003 Workshop on Frequent Itemset Mining Implementations (FIMI '03), Melbourne, Florida, USA, volume 90 of CEUR Workshop Proceedings, 2003.
[22] M.-J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, California, USA, pages 283–286. AAAI Press, 1997.