GARM: Generalized Association Rule Mining

T. Hamrouni1,2, S. Ben Yahia1 and E. Mephu Nguifo2

1 Department of Computer Science, Faculty of Sciences of Tunis, Tunis, Tunisia. {tarek.hamrouni, sadok.benyahia}@fst.rnu.tn
2 CRIL-CNRS, IUT de Lens, Lens, France. {hamrouni, mephu}@cril.univ-artois.fr

Abstract. A close look at the literature on association rule mining shows that most effort so far has gone into mining co-occurrence relations between items, i.e., conjunctive patterns. Disjunctive patterns, which convey knowledge about complementary occurrences of items, have been largely neglected. Recently, however, a growing number of works has highlighted their importance for providing users with richer knowledge. To this end, we propose in this paper a new tool, called GARM, that builds a partially ordered structure over particular disjunctive patterns, namely the disjunctive closed ones. Starting from this structure, deriving generalized association rules, i.e., rules offering conjunctive, disjunctive and negative connectors between items, becomes straightforward. Our experimental study focuses on the mining performance as well as on the number of extracted rules, and supports the utility of the proposed approach.

Keywords: Data mining, disjunctive closed pattern, frequent essential pattern, disjunctive support, equivalence class, partially ordered structure, generalized association rules.

1 Introduction and Motivations

Association rule mining is a fundamental topic in data mining [1]. It has been extensively investigated since its inception. Its key idea consists in looking for causal relationships between sets of items, commonly called itemsets, where the presence of some items suggests that others follow from them.
A typical example of a successful application of association rules is market basket analysis, where the discovered rules can lead to important strategic decisions in marketing and management. Recently, association rule mining was extended to various pattern classes such as sequential patterns, graphs, etc. Nevertheless, the main criticism that can be addressed to the contributions on association rules is their focus on co-occurrences between items [2], probably as a heritage of the market basket analysis framework. Indeed, almost all related works neglect the other kinds of relations, such as mutually exclusive occurrences [3], that can also bring valuable information to users.

In this paper, we propose a new tool, called GARM 1, covering the whole process of extracting generalized association rules. These rules generalize classical (positive) rules by offering disjunctive and negative connectors between items, in addition to the conjunctive one [4]. Our tool includes a first component that extracts a concise representation of frequent patterns based on disjunctive patterns. A second component then partially orders these patterns w.r.t. set inclusion. Once the partially ordered structure is obtained, generalized association rules can be easily derived by the last component of our tool.

1 GARM is the acronym of generalized association rule miner.

© Radim Belohlavek, Sergei O. Kuznetsov (Eds.): CLA 2008, pp. 145–156, ISBN 978–80–244–2111–7, Palacký University, Olomouc, 2008.

Noteworthily, extracting an exact concise representation of frequent patterns in the first component of the process makes it possible to derive exactly the different supports of each frequent pattern. This in turn allows the exact values of quality measures to be computed.
Indeed, it was shown in [5] that almost all interestingness measures for association rules are expressed in terms of the support of the rule and those of its premise and conclusion. In addition, using disjunctive patterns, in particular closed and essential patterns [6], provides an interesting starting point for mining association rules that convey complementary occurrences between items, rather than co-occurrences. Indeed, the latter relationships (co-occurrences within literals 2) were explored in depth in the literature through association rules having conjunctions of literals, called literalsets, in premise and conclusion. This leads to what is commonly known as positive and negative association rules. Disjunctive association rules, in contrast, have only recently begun to attract the interest of researchers.

In general, generalized association rules are useful in many applications. In particular, disjunctive association rules, which have a disjunction of items either in premise or in conclusion, were considered for two main purposes. On the one hand, they were used as an intermediate step for defining some concise representations of frequent patterns [1]. On the other hand, they were exploited to provide users with new forms of association rules [7, 8]. For example, the added value of such association rules has recently been highlighted in [2]. It is however important to note that generalized association rules can be considered as particular GUHA rules [9].

Note that we restrict ourselves in this work to disjunctive closed patterns whose smallest seeds, i.e., essential patterns, are frequent with respect to a minimum conjunctive support threshold. The reason is that we aim at retaining the spirit of association rule mining, where this threshold, as well as the confidence-based one, is used to dramatically limit the number of extracted association rules.
In addition, the use of a partially ordered structure will make it possible to select representative subsets of rules to be extracted. This nucleus of rules helps to avoid overwhelming users with very large rule lists.

The remainder of the paper is organized as follows. The next section discusses the related work. Section 3 recalls the key notions used throughout this paper. The structural properties of the disjunctive search space are explored in Section 4, followed in Section 5 by a detailed description of the GARM tool, whose purpose is to offer a complete process for the extraction of generalized association rules. Experimental results focusing on the mining time as well as on the number of extracted rules are reported and discussed in Section 6. Section 7 concludes the paper and points out future work.

2 A literal is an item or the negation of an item.

2 Related Work

Contributions related to association rule mining mainly concentrated on the classical rule form, namely that presenting a conjunction of items in both premise and conclusion parts. In this respect, many concise representations for such rules were proposed in the literature [10]. Recently, some works focused on introducing negative items. Nevertheless, most items are absent from any given transaction, which leads to an explosive number of association rules with negation. Thus, existing approaches have tried to address this problem through the use of additional background information about the data, attribute correlations, additional rule interestingness measures, etc. Here we will mainly detail the small number of related works on association rules relying on the disjunctive connector between items.

Some works [7, 8] were interested in using the disjunction connector in association rule mining to define what is called generalized association rules.
These rules attracted the interest of many researchers since they offer richer types of knowledge in many applications. In addition to the inclusive disjunction operator, i.e., the operator ∨, Nanavati et al. [8] were also interested in the exclusive disjunction operator, denoted ⊕. The authors hence proposed two kinds of rules: simple disjunctive rules and generalized disjunctive ones. Simple disjunctive rules are those having either the premise or the conclusion (but not both) composed of a disjunction of items. This disjunction can be inclusive (the simultaneous occurrence of items is possible) or exclusive (two distinct items cannot occur together). Generalized disjunctive rules, on the other hand, are disjunctive rules whose premises or conclusions contain a conjunction of disjunctions. These disjunctions can be either inclusive or exclusive. In [7], the author mainly focuses on extracting association rules whose conclusions contain mutually exclusive items, i.e., the presence of one of them leads to the absence of the others, which is expressed in [8] using the operator ⊕. Other forms of generalized association rules were also described in [11]. In [12], Shima et al. extract what they call disjunctive closed rules. In their work, a disjunctive closed rule simply stands for a clause in disjunctive normal form (DNF) whose disjuncts are frequent closed patterns. Elble et al. used disjunctive rules to handle numerical attributes by considering disjunctions between intervals [13]. This work extends earlier ones that also take categorical attributes into account (see [13] for references).

Finally, it is worth noting that the disjunction connector has also been used to define some concise representations of frequent patterns through so-called disjunctive rules (see for example [1] for references).
3 Basic Concepts

In this section, we briefly sketch the key notions that will be of use throughout the paper.

Definition 1. An extraction context is a triplet $\mathcal{K} = (\mathcal{O}, \mathcal{I}, \mathcal{R})$ where $\mathcal{O}$ and $\mathcal{I}$ are, respectively, a finite set of objects (or transactions) and a finite set of items (or attributes), and $\mathcal{R} \subseteq \mathcal{O} \times \mathcal{I}$ is a binary relation between the objects and the items. A couple $(o, i) \in \mathcal{R}$ denotes that the object $o \in \mathcal{O}$ contains the item $i \in \mathcal{I}$.

Example 1. We will consider in the remainder a context that consists of the transactions (1, AB), (2, ACD), (3, CDE), (4, DEF), (5, ABCDE), and (6, ABC) 3.

Definition 2. (Supports of a pattern) Let $\mathcal{K} = (\mathcal{O}, \mathcal{I}, \mathcal{R})$ be a context and $I \subseteq \mathcal{I}$ be a pattern. We mainly distinguish three kinds of supports related to $I$:

$$Supp(\wedge I) = |\{o \in \mathcal{O} \mid \forall i \in I,\ (o, i) \in \mathcal{R}\}|$$
$$Supp(\vee I) = |\{o \in \mathcal{O} \mid \exists i \in I,\ (o, i) \in \mathcal{R}\}|$$
$$Supp(\overline{I}) = |\{o \in \mathcal{O} \mid \forall i \in I,\ (o, i) \notin \mathcal{R}\}|$$

Roughly speaking, the semantics of the aforementioned supports is as follows:
• $Supp(\wedge I)$ is the number of objects containing all items of $I$.
• $Supp(\vee I)$ is the number of objects containing at least one item of $I$.
• $Supp(\overline{I})$ is the number of objects that do not contain any item of $I$.

Note also that $Supp(\vee I)$ and $Supp(\overline{I})$ are two complementary quantities w.r.t. $|\mathcal{O}|$ in the sense that $Supp(\vee I) + Supp(\overline{I}) = |\mathcal{O}|$.

Example 2. Consider our running context. We have $Supp(\wedge CDE) = |\{3, 5\}| = 2$, $Supp(\vee CDE) = |\{2, 3, 4, 5, 6\}| = 5$ and $Supp(\overline{CDE}) = |\{1\}| = 1$.

Hereafter, $Supp(\wedge I)$ will simply be denoted $Supp(I)$. In addition, if there is no risk of confusion, the conjunctive support will simply be called support. A pattern $I$ is said to be frequent if $Supp(I)$ is greater than or equal to a minimum support threshold, denoted minsupp. Since the set of frequent patterns is an order ideal, the set of items $\mathcal{I}$ will be considered as only containing frequent items. Lemma 1 states that conjunctive supports can be derived starting from disjunctive ones.
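As an illustration of Definition 2, the three supports can be computed by a direct scan of the running context. The following is only a minimal Python sketch: the names `CONTEXT`, `conj_supp`, `disj_supp` and `neg_supp` are ours and are not taken from the GARM implementation, and a real miner would not rescan the context for every pattern.

```python
# Running context of Example 1: object identifier -> set of items it contains.
CONTEXT = {
    1: set("AB"), 2: set("ACD"), 3: set("CDE"),
    4: set("DEF"), 5: set("ABCDE"), 6: set("ABC"),
}


def conj_supp(pattern, context=CONTEXT):
    """Conjunctive support: objects containing all items of the pattern."""
    return sum(1 for items in context.values()
               if all(i in items for i in pattern))


def disj_supp(pattern, context=CONTEXT):
    """Disjunctive support: objects containing at least one item of the pattern."""
    return sum(1 for items in context.values()
               if any(i in items for i in pattern))


def neg_supp(pattern, context=CONTEXT):
    """Negative support: objects containing no item of the pattern."""
    return sum(1 for items in context.values()
               if not any(i in items for i in pattern))


# Values of Example 2 for the pattern CDE.
print(conj_supp("CDE"), disj_supp("CDE"), neg_supp("CDE"))  # prints: 2 5 1
```

The disjunctive and negative supports always add up to the number of objects, which is the complementarity property stated above.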
Lemma 1. [14] Let $I \subseteq \mathcal{I}$. The following equality holds:

$$Supp(I) = \sum_{\emptyset \subset I' \subseteq I} (-1)^{|I'| - 1}\, Supp(\vee I')$$

4 Structural Properties of the Disjunctive Search Space

In this section, we will characterize disjunctive patterns through the associated equivalence classes induced by the following closure operator:

Definition 3. Let $\mathcal{K} = (\mathcal{O}, \mathcal{I}, \mathcal{R})$ be an extraction context. The disjunctive closure operator $h$ is defined as follows [6]:

$$h : \mathcal{P}(\mathcal{I}) \rightarrow \mathcal{P}(\mathcal{I}), \qquad I \mapsto h(I) = \{i \in \mathcal{I} \mid (\forall o \in \mathcal{O})\ \big((o, i) \in \mathcal{R} \Rightarrow (\exists i_1 \in I)\ ((o, i_1) \in \mathcal{R})\big)\}.$$

The disjunctive closure $h(I)$ of a pattern $I$ is equal to the maximal set of items which only appear in the transactions that contain at least one item of $I$. The closure operator $h$ induces an equivalence relation on the power set of $\mathcal{I}$, which partitions it into so-called disjunctive equivalence classes. In each class, all the elements have the same disjunctive support. The smallest incomparable elements, w.r.t. set inclusion, of a disjunctive equivalence class are called essential patterns, while the disjunctive closed pattern is the largest one [6]. These particular patterns are defined as follows.

3 We use a separator-free form for the sets, e.g., ABC stands for the set of items {A, B, C}.

Definition 4.
• A pattern $I \subseteq \mathcal{I}$ is a disjunctive closed pattern if $I = h(I)$ or, equivalently, $Supp(\vee I) < \min\{Supp(\vee I') \mid I' \subseteq \mathcal{I} \text{ s.t. } I \subset I'\}$.
• A pattern $I \subseteq \mathcal{I}$ is an essential pattern if $\forall I' \subset I$, $I \nsubseteq h(I')$ or, equivalently, $Supp(\vee I) > \max\{Supp(\vee I') \mid I' \subseteq \mathcal{I} \text{ s.t. } I' \subset I\}$.

Example 3. Consider our running context. The pattern CDEF is disjunctively closed, while BE is not, since $Supp(\vee BE) = Supp(\vee BEF)$. On the other hand, the pattern AC is essential, while DE is not, since $Supp(\vee DE) = Supp(\vee D)$.

In the remainder, FEP_K 4 denotes the set of frequent essential patterns associated to a given context K and a fixed minsupp value.
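Lemma 1, the closure operator h and the notion of essential pattern can be checked by brute force on the running context. The Python sketch below is only an illustration under our own assumptions: all names (`cover`, `closure`, `is_essential`, `frequent_essential_patterns`, ...) are hypothetical, and the exhaustive enumeration of subsets is only practical for a toy context, unlike a dedicated mining algorithm.

```python
from itertools import combinations

# Running context of Example 1 and its item set.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}
ITEMS = "ABCDEF"


def cover(pattern):
    """Objects containing at least one item of the pattern."""
    return {o for o, items in CONTEXT.items()
            if any(i in items for i in pattern)}


def disj_supp(pattern):
    return len(cover(pattern))


def conj_supp(pattern):
    return sum(1 for items in CONTEXT.values()
               if all(i in items for i in pattern))


def conj_supp_from_disj(pattern):
    """Lemma 1: inclusion-exclusion over the non-empty subsets of the pattern."""
    return sum((-1) ** (k - 1) * disj_supp(sub)
               for k in range(1, len(pattern) + 1)
               for sub in combinations(pattern, k))


def closure(pattern):
    """Definition 3: items whose objects all belong to the cover of the pattern."""
    objs = cover(pattern)
    return frozenset(i for i in ITEMS if cover(i) <= objs)


def is_essential(pattern):
    """Definition 4: removing any single item strictly lowers the disjunctive
    support (checking immediate subsets suffices, the support being monotone)."""
    return all(disj_supp(sub) < disj_supp(pattern)
               for sub in combinations(pattern, len(pattern) - 1))


def frequent_essential_patterns(minsupp):
    """Essential patterns whose conjunctive support reaches minsupp."""
    return [frozenset(p)
            for k in range(1, len(ITEMS) + 1)
            for p in combinations(ITEMS, k)
            if is_essential(p) and conj_supp(p) >= minsupp]


FEP = frequent_essential_patterns(minsupp=1)
EDCP = {closure(e) for e in FEP}  # closures of the frequent essential patterns
```

For minsupp = 1, `FEP` holds 15 patterns and `EDCP` holds the 10 disjunctive closed patterns of the running context, e.g., `closure("BE")` is BEF and `closure("AD")` is ABCDEF.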
The associated set of disjunctive closures will further be denoted EDCP_K 5. This set is hence equal to {h(I) | I ∈ FEP_K}.

To establish the link with conjunctive equivalence classes, which gather patterns having the same Galois closure [15], we notice that essential patterns (resp. disjunctive closed patterns) play the same role as minimal generators, aka free-sets (resp. closed patterns) (see [1] for references). The latter patterns were at the basis of the main concise representations of association rules that were proposed in the literature [10]. This clearly motivates the use of their counterparts within the disjunctive search space.

4 Stands for frequent essential patterns.
5 Stands for essential disjunctive closed patterns.

5 Detailed Description of the GARM Tool

As mentioned in the first section, the GARM tool is composed of three complementary components, which are as follows:
(i) Extracting an exact concise representation of frequent patterns based on disjunctive closed patterns and frequent essential ones.
(ii) Building a partially ordered structure w.r.t. set inclusion within disjunctive closed patterns. Each of these patterns is accompanied by its set of frequent essential patterns.
(iii) Deriving generalized association rules from the built structure.

5.1 Extracting a New Concise Representation based on Disjunctive Patterns

Our representation is based on the sets FEP_K and EDCP_K, as stated by Theorem 1.

Theorem 1. The set EDCP_K ∪ FEP_K is an exact concise representation of the set of frequent patterns FP_K [16].

Example 4. Figure 1 (Left) lists the set of disjunctive closed patterns associated to the running context. For each closed pattern, its associated disjunctive support and frequent essential patterns, for minsupp = 1, are also given.

This representation will be denoted DSSR_K 6. It is extracted by an adaptation of our DCPR_MINER 7 algorithm [17], which constitutes the first component of the
GARM tool. Starting from DSSR_K, the conjunctive and negative supports of frequent patterns can thus be deduced using disjunctive supports. This representation also allows the derivation of the support of each literalset whose positive variation is based on a frequent pattern. This is carried out using the following formula [4]:

$$Supp(x_1 \wedge x_2 \wedge \ldots \wedge x_n \wedge \overline{y_1} \wedge \overline{y_2} \wedge \ldots \wedge \overline{y_m}) = \sum_{S \subseteq \{y_1, \ldots, y_m\}} (-1)^{|S|}\, Supp(x_1 \wedge x_2 \wedge \ldots \wedge x_n \wedge S),$$

such that its positive variation, namely $\{x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_m\}$, belongs to FP_K.

6 Stands for disjunctive search space-based representation.
7 DCPR_MINER is the acronym of disjunctive closed pattern-based representation miner.

EDCP_K    Disj. Supp.    FEP_K
B         3              B
C         4              C
F         1              F
AB        4              A
EF        3              E
ABC       5              AC, BC
BEF       5              BE
DEF       4              D
CDEF      5              CD, CE
ABCDEF    6              AD, AE, BD, BCE

Nodes of the partially ordered structure, written (frequent essential patterns: disjunctive closed pattern, disjunctive support):
({AD, AE, BD, BCE}: ABCDEF, 6)
({AC, BC}: ABC, 5)   ({BE}: BEF, 5)   ({CD, CE}: CDEF, 5)
({A}: AB, 4)   ({D}: DEF, 4)   ({E}: EF, 3)
({B}: B, 3)   ({C}: C, 4)   ({F}: F, 1)
∅

Fig. 1. (Left) The set EDCP_K and the associated disjunctive support and frequent essential patterns for minsupp = 1. (Right) The equivalence classes partially ordered w.r.t. set inclusion.

5.2 Building the Partially Ordered Structure

In this section, we propose a new algorithm, called POSB 8, for partially sorting disjunctive closed patterns w.r.t. set inclusion. The POSB algorithm takes as input the representation DSSR_K, in which each disjunctive closed pattern is associated with its set of frequent essential patterns and its disjunctive support. A node in the partially ordered structure is associated with each disjunctive closed pattern. The pseudo-code of POSB is shown by Algorithm 1. Our algorithm inherits two main optimizations used in the algorithm proposed by Valtchev et al. [18], namely the sorting of disjunctive closed patterns, and the use of a border. Indeed, the set of disjunctive closed patterns EDCP_K is sorted w.r.t.
the increasing pattern size. Since distinct closures of equal size cannot be comparable w.r.t. set inclusion, this sorting avoids unnecessary comparisons. In addition, it guarantees that the closure f being processed is at least as large as every closure already processed. Thus, it suffices to find its lower cover among the nodes already inserted in the structure. This lower cover is composed of those closures which are immediately covered by f.

On the other hand, the border B is an anti-chain w.r.t. set inclusion containing the maximal closures among those already treated. In fact, the Valtchev et al. algorithm constructs the Hasse diagram representing the subset-superset relationship among concepts in the Galois lattice. It begins at the top of the lattice and then recursively identifies the lower neighbors of each concept. Nevertheless, it is not directly adapted to our situation. Indeed, although the intersection of two disjunctive closed patterns is obviously a disjunctive closed pattern, this intersection does not necessarily belong to EDCP_K. This is due to the fact that all its essential patterns may be infrequent, in which case it has already been pruned. For its part, the algorithm proposed in [18] relies on the fact that the intersection of two concepts was already treated, so that it suffices to locate the corresponding node within the Hasse diagram.

8 POSB is the acronym of partially ordered structure builder.

Algorithm 1: POSB
Input: The set EDCP_K of disjunctive closed patterns.
Output: The disjunctive closed patterns ordered by set inclusion.
Begin
    B := ∅;
    Foreach (f ∈ EDCP_K) do
        Prohibited_List := ∅;
        Foreach (b ∈ B) do
            inter := b ∩ f;
            If (inter = b) then
                LOWER_COVER_INSERTION(f, b);
                B := B \ b;
            Else If (inter ≠ ∅) then
                LOWER_COVER_MANAGEMENT(f, b);
        B := B ∪ f;
End

In Algorithm 1, disjunctive closed patterns are inserted one at a time into a structure which is only partially built, so as to obtain the entire structure at the end.
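As a rough illustration of this incremental insertion, the loop of Algorithm 1 and the two procedures it calls (Algorithms 2 and 3) can be sketched in Python as follows. This is a sketch under our own assumptions rather than the authors' C++ implementation: closures are represented as `frozenset` objects, the function names `posb`, `insert_in_lower_cover` and `manage_lower_cover` are hypothetical, and ties between closures of equal size are broken alphabetically only to make the run deterministic.

```python
def posb(closed_patterns):
    """Return a dict mapping each closure to the set of closures in its lower cover."""
    # Process closures by increasing size (ties broken alphabetically).
    ordered = sorted(set(closed_patterns), key=lambda p: (len(p), sorted(p)))
    lower_cover = {f: set() for f in ordered}
    border = []  # maximal closures among those already inserted

    def insert_in_lower_cover(f, pred):
        # Counterpart of Algorithm 2: keep only maximal predecessors of f.
        for l in list(lower_cover[f]):
            if pred <= l:        # pred lies below an existing predecessor
                return
            if l <= pred:        # l is no longer an immediate predecessor
                lower_cover[f].discard(l)
        lower_cover[f].add(pred)

    def manage_lower_cover(f, b, prohibited):
        # Counterpart of Algorithm 3: look below b for predecessors of f.
        for pred_b in list(lower_cover[b]):
            if pred_b in prohibited:
                continue
            inter = pred_b & f
            if inter == pred_b:
                insert_in_lower_cover(f, pred_b)
            elif inter:
                manage_lower_cover(f, pred_b, prohibited)
            prohibited.add(pred_b)

    for f in ordered:
        prohibited = set()       # nodes already visited while inserting f
        for b in list(border):   # iterate over a copy: the border is updated
            inter = b & f
            if inter == b:
                insert_in_lower_cover(f, b)
                border.remove(b)
            elif inter:
                manage_lower_cover(f, b, prohibited)
        border.append(f)
    return lower_cover


# Disjunctive closed patterns of the running context (Figure 1, Left).
EDCP = [frozenset(s) for s in
        ["B", "C", "F", "AB", "EF", "ABC", "BEF", "DEF", "CDEF", "ABCDEF"]]
COVER = posb(EDCP)
```

On the running context, `COVER[frozenset("ABCDEF")]` contains ABC, BEF and CDEF, and `COVER[frozenset("CDEF")]` contains C and DEF, while EF is discarded there because it already lies below DEF.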
Let f be the current disjunctive closed pattern to be inserted in the partially ordered structure. f is compared to the elements of the border B. If an element b ∈ B is included in f, then it is an element of its lower cover. A link between the node representing b and that representing f is then constructed by the LOWER_COVER_INSERTION procedure (cf. Algorithm 2), and b is deleted from the border. If b is not included in f but their intersection is not empty, then the LOWER_COVER_MANAGEMENT procedure identifies the common immediate predecessors of b and f (cf. Algorithm 3). Finally, f is added to the border. It is important to note that in the LOWER_COVER_MANAGEMENT procedure, a prohibited list is associated to each disjunctive closed pattern to be inserted in the partially ordered structure. Indeed, when updating the precedence links between disjunctive closed patterns, a node can be visited more than once since it can be an immediate predecessor of many other nodes. This list avoids such useless treatments by only allowing the visit of nodes that do not belong to it.

Example 5. The structure associated to our running context is given by Figure 1 (Right).

5.3 Deriving Generalized Association Rules

Once the partially ordered structure is built, deriving (subsets of) generalized association rules can be easily done. An association rule R: X ⇒ Y based on a pattern Z, denoted Z-based rule, is such that $X = \{x_1, x_2, \ldots, x_n\} \subseteq \mathcal{I}$ and $Y = \{y_1, y_2, \ldots, y_m\} \subseteq \mathcal{I}$ are two patterns with X ∩ Y = ∅ and X ∪ Y = Z. An association rule is usually considered interesting w.r.t. two statistical measures, namely the support and the confidence. The formulae of these measures for an arbitrary rule are as follows:

Algorithm 2: LOWER_COVER_INSERTION
Input: A disjunctive closure f, and an element pred to be inserted in its lower cover.
Output: The updated lower cover of f.
Begin
    Foreach (l ∈ Lower_Cover(f)) do
        inter := l ∩ pred;
        If (inter = pred) then
            return;
        Else If (inter = l) then
            Lower_Cover(f) := Lower_Cover(f) \ l;
    Lower_Cover(f) := Lower_Cover(f) ∪ pred;
End

Algorithm 3: LOWER_COVER_MANAGEMENT
Input: A disjunctive closed pattern f, and an element b of the border B.
Output: The updated lower cover of f.
Begin
    Foreach (pred_b ∈ Lower_Cover(b)) do
        If (pred_b ∉ Prohibited_List) then
            inter := pred_b ∩ f;
            If (inter = pred_b) then
                LOWER_COVER_INSERTION(f, pred_b);
            Else If (inter ≠ ∅) then
                LOWER_COVER_MANAGEMENT(f, pred_b);
            Prohibited_List := Prohibited_List ∪ pred_b;
End

$$Supp(X \Rightarrow Y) = Supp(X \wedge Y), \quad \text{and} \quad Conf(X \Rightarrow Y) = \frac{Supp(X \wedge Y)}{Supp(X)}$$

A rule is said to be exact if its confidence is equal to 1. Otherwise, it is said to be approximate. In addition, it is said to be interesting or valid if its support and confidence values are greater than or equal to their respective minimum thresholds minsupp and minconf. It is clear that whenever we are able to evaluate Supp(X ⇒ Y), the derivation of the confidence value is straightforward.

Let us now adapt the association rule framework to our context. As shown in Subsection 5.1, the DSSR_K representation allows deriving the disjunctive, conjunctive and negative supports of each set of positive and negative items whose positive variation is based on a frequent pattern. In the sequel, we present an overview of the process by which we retrieve generalized association rules and evaluate their associated supports by traversing the partially ordered structure. Rules can be classified according to the number of nodes required for their extraction. We then distinguish two cases:
1. An intra-node rule: it requires a single node and highlights relationships between a frequent essential pattern and its disjunctive closure f (here Z = f).
2. An inter-nodes rule: it is extracted using two nodes N1 and N2 s.t.
the associated disjunctive closure of N1, denoted f1, is one of the immediate predecessors of that of N2, denoted f2. Let e1 be a frequent essential pattern of f1. An inter-nodes rule describes relationships between either f1 and f2 or e1 and f2 (here Z = f2).

Both kinds of rules, intra-node and inter-nodes, can be either exact or approximate.

Different forms of generalized association rules can be extracted starting from our representation (cf. [16] for a detailed description). To limit the number of possible extracted rule forms, we mainly focus here on the following ones:
1. Form 1: disjunction of items in premise and conclusion, $\vee X \Rightarrow \vee Y$:
   $Supp(\vee X \Rightarrow \vee Y) = Supp(\vee X \wedge \vee Y) = Supp(\vee X) + Supp(\vee Y) - Supp((\vee X) \vee (\vee Y)) = Supp(\vee X) + Supp(\vee Y) - Supp(\vee Z)$,
2. Form 2: negation of items in premise and conclusion, $\overline{X} \Rightarrow \overline{Y}$:
   $Supp(\overline{X} \Rightarrow \overline{Y}) = Supp(\overline{X} \wedge \overline{Y}) = Supp(\overline{(\vee X) \vee (\vee Y)}) = Supp(\overline{Z}) = |\mathcal{O}| - Supp(\vee Z)$,
3. Form 3: disjunction of items in premise and negation of items in conclusion, $\vee X \Rightarrow \overline{Y}$:
   $Supp(\vee X \Rightarrow \overline{Y}) = Supp(\vee X \wedge \overline{Y}) = Supp((\vee X) \vee (\vee Y)) - Supp(\vee Y) = Supp(\vee Z) - Supp(\vee Y)$, and,
4. Form 4: negation of items in premise and disjunction of items in conclusion, $\overline{X} \Rightarrow \vee Y$:
   $Supp(\overline{X} \Rightarrow \vee Y) = Supp(\overline{X} \wedge \vee Y) = Supp((\vee X) \vee (\vee Y)) - Supp(\vee X) = Supp(\vee Z) - Supp(\vee X)$,
where either X or Y is a frequent essential pattern or a disjunctive closed one, and Z = X ∪ Y is a disjunctive closed pattern (as described above).

For each rule, the support of Z is known. The same holds for either X or Y, since one of them is assumed to be a frequent essential pattern or a disjunctive closed pattern. For the sake of simplicity, we assume in the remainder that X is a frequent essential pattern or a disjunctive closed pattern. Since Y = Z\X, Y does not necessarily belong to DSSR_K and may even not be a frequent pattern. Nevertheless, its disjunctive support is required to evaluate that of the associated rule.
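On a small context, the four support formulas can be evaluated directly from disjunctive supports. The Python sketch below is only an illustration under our own assumptions: the names `rule_measures` and `disj_supp` are hypothetical, the disjunctive supports are obtained by scanning the running context rather than from the DSSR_K representation, and the confidence is taken as the rule support divided by the support of the premise, which is Supp(∨X) for Forms 1 and 3 and |O| − Supp(∨X) for Forms 2 and 4.

```python
# Running context of Example 1.
CONTEXT = {1: set("AB"), 2: set("ACD"), 3: set("CDE"),
           4: set("DEF"), 5: set("ABCDE"), 6: set("ABC")}


def disj_supp(pattern):
    """Number of objects containing at least one item of the pattern."""
    return sum(1 for items in CONTEXT.values()
               if any(i in items for i in pattern))


def rule_measures(x, y):
    """Return {form: (support, confidence)} for the four rule forms built on
    the disjoint patterns x (premise) and y (conclusion), with Z = x ∪ y."""
    n = len(CONTEXT)
    sx, sy = disj_supp(x), disj_supp(y)
    sz = disj_supp(set(x) | set(y))
    supports = {
        "form1": sx + sy - sz,   # ∨X ⇒ ∨Y
        "form2": n - sz,         # not-X ⇒ not-Y
        "form3": sz - sy,        # ∨X ⇒ not-Y
        "form4": sz - sx,        # not-X ⇒ ∨Y
    }
    premises = {"form1": sx, "form2": n - sx, "form3": sx, "form4": n - sx}
    # The confidence is undefined (None) when no object satisfies the premise.
    return {form: (supp, supp / premises[form] if premises[form] else None)
            for form, supp in supports.items()}


# Form 1 rules between ABC and DEF, both based on Z = ABCDEF.
print(rule_measures("ABC", "DEF")["form1"])  # prints: (3, 0.6)
print(rule_measures("DEF", "ABC")["form1"])  # prints: (3, 0.75)
```

Swapping premise and conclusion leaves the support of a Form 1 rule unchanged but modifies its confidence, since only the support of the premise changes.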
To this end, we bound the support of Y using a lower bound, denoted lb_Supp, and an upper bound, denoted ub_Supp, computed as follows:
• $lb\_Supp(\vee Y) = \max\{Supp(\vee e) \mid e \in FEP_K \text{ and } e \subseteq Y\}$,
• $ub\_Supp(\vee Y) = \min\{Supp(\vee f) \mid f \in EDCP_K \text{ and } Y \subseteq f\}$.

In this respect, if Y lies between a frequent essential pattern and its disjunctive closure, then $lb\_Supp(\vee Y) = ub\_Supp(\vee Y)$. Hence, the support and confidence of the associated rule are exactly computed. Otherwise, these measures are bounded by a minimal and a maximal possible value using the bounds associated to Y. Such rules, further denoted approximated rules, are defined as follows:

Definition 5. An association rule is said to be approximated if either its support or its confidence is not exactly determined.

Then, only valid rules, whose minimum possible values of support and confidence are greater than or equal to minsupp and minconf, respectively, are retained. Note that an approximated rule is different from an approximate rule in the sense that the latter has its support and confidence exactly computed (with a confidence not equal to 1), which is not the case for the former. In this respect, approximated rules were shown to convey interesting knowledge in the case of positive rules (see for example [19]).

Noteworthily, the bounds $lb\_Supp(\vee Y)$ and $ub\_Supp(\vee Y)$ always exist. Indeed, on the one hand, since the set of items $\mathcal{I}$ is pruned w.r.t. minsupp, Y is composed of frequent items even if it is itself infrequent. These items obviously belong to FEP_K, which ensures the existence of the lower bound. On the other hand, Y is covered by at least one disjunctive closed pattern, namely Z, which ensures the existence of the upper bound.

Example 6. Let minsupp = 1 and let minconf = 0.7.
Consider the intra-node rule R1 of Form 1 based on the disjunctive closed pattern ABCDEF and its frequent essential pattern BCE: $\vee BCE \Rightarrow \vee ADF$. We have $Supp(R_1) = Supp(\vee BCE) + Supp(\vee ADF) - Supp(\vee ABCDEF) = Supp(\vee ADF)$, since h(BCE) = ABCDEF and hence $Supp(\vee BCE) = Supp(\vee ABCDEF) = 6$. Since ADF ∉ DSSR_K, we need to evaluate its support. Since AD ⊆ ADF ⊆ h(AD) = ABCDEF (cf. Figure 1 (Left)), $lb\_Supp(\vee ADF) = ub\_Supp(\vee ADF) = 6$. Hence, $Supp(R_1) = 6$ and $Conf(R_1) = 6/6 = 1$. R1 is hence a valid rule.

Now, consider the inter-nodes rule R2 of Form 1 based on ABCDEF and one of its immediate predecessors, namely ABC (cf. Figure 1 (Right)): $\vee ABC \Rightarrow \vee DEF$. In this case, DEF ∈ EDCP_K. Hence, $Supp(R_2) = Supp(\vee ABC) + Supp(\vee DEF) - Supp(\vee ABCDEF) = 5 + 4 - 6 = 3$, and $Conf(R_2) = 3/5 = 0.6$. Here, we took X = ABC. If we set Y = ABC, then the associated rule R3: $\vee DEF \Rightarrow \vee ABC$ has the same support as R2. Nevertheless, its confidence is equal to 3/4 = 0.75. Hence, R3 is a valid rule while R2 is not.

6 Experimental Results

Our experiments 9 focused on the mining time as well as on the number of extracted valid rules w.r.t. their associated type, i.e., exact, approximate or approximated. They were carried out on a PC equipped with a 3 GHz Pentium (R) processor and 1.75 GB of main memory, running the GNU/Linux distribution Fedora Core 7 (with 2 GB of swap memory). The executable code was generated from our C++ implementation using the gcc 4.1.2 compiler.

Table 1. Mining time of generalized association rules on benchmark contexts.
Context    minsupp (%)   Component 1 (s)   Component 2 (s)   Component 3 (s)   Total time (s)
CONNECT    80.00          2.1530            0.0068            0.0380            2.1978
CONNECT    60.00          2.2807            0.0402            0.1618            2.4827
CONNECT    40.00          2.5571            1.0443            0.9813            4.5827
PUMSB      90.00          3.1875            0.0403            0.1015            3.3293
PUMSB      80.00          3.1581            2.9364            1.9693            8.0638
PUMSB      70.00          3.6630           19.5460            8.7276           31.9366
KOSARAK     0.90         12.4551            0.1645            0.2239           12.8435
KOSARAK     0.70         16.2936            0.6825            0.3794           17.3555
KOSARAK     0.50         26.4491            5.6164            0.8738           32.9393
RETAIL      2.00          0.8471            0.0039            0.0135            0.8645
RETAIL      1.00          1.0803            0.0113            0.0334            1.1250
RETAIL      0.50          2.3909            0.1127            0.1331            2.6367

In the proposed experiments, the minconf value is set to the relative minimum support value, i.e., minsupp / |O|. Table 1 presents the mining time in seconds of the three components of GARM. This table shows the efficiency of our tool in extracting generalized association rules. Indeed, even for the lowest tested minsupp values, the total mining time stays below 33 seconds. In this respect, the time consumed by each component, w.r.t. the total time, closely depends on the context characteristics. Nevertheless, the second and third components are in general faster than the first one, the only exception in Table 1 being PUMSB for minsupp = 70%.

9 Test contexts are available at: http://fimi.cs.helsinki.fi/data.

Table 2. Number of extracted generalized association rules on benchmark contexts.

Context    minsupp (%)   Exact    Approximate   Approximated   Total number
CONNECT    80.00           620        316           152           1,088
CONNECT    60.00         1,533      1,337           354           3,224
CONNECT    40.00         3,319      5,813         3,130          12,262
PUMSB      90.00           566      1,322           730           2,618
PUMSB      80.00         4,376     13,426         5,002          22,804
PUMSB      70.00         9,409     26,747        14,870          51,026
KOSARAK     0.90             0      7,586             0           7,586
KOSARAK     0.70             0     13,046             0          13,046
KOSARAK     0.50             0     29,648             0          29,648
RETAIL      2.00             0        464             0             464
RETAIL      1.00             0      1,160             0           1,160
RETAIL      0.50             0      4,622             0           4,622

On the other hand, Table 2 highlights that the number of extracted rules closely depends on the context density. Indeed, the denser the context, the larger the associated equivalence classes are, and the greater the number of frequent essential patterns and closed ones is.
This fact increases the number of rules for dense contexts, even for high minsupp values. Interestingly enough, the number of exact and approximated rules for RETAIL and KOSARAK is equal to 0 for the tested minsupp values. This is due to the fact that for both contexts, each essential pattern is equal to its disjunctive closure, which is not the case for the CONNECT and PUMSB contexts. Please note that the mining time and the number of extracted rules when minconf varies are omitted here, due to space limitations.

7 Conclusion and Perspectives

In this paper, we presented a complete tool, called GARM, allowing the extraction of generalized association rules. Our tool is composed of three components. The first one extracts a concise representation of frequent patterns based on disjunctive closed ones. The second component partially orders these closures w.r.t. set inclusion. Once the structure is built, extracting subsets of generalized association rules becomes a straightforward task for the last component. The experiments we carried out show the effectiveness of the proposed tool. It is also important to mention that our GARM tool is easily adaptable to the case where the input is composed of conjunctive (closed) patterns instead of disjunctive ones.

Other avenues for future work mainly address the following points. First, a detailed comparison of our approach to the general GUHA approach [9] will be carried out. Second, the relationships between the various rule forms will be studied. The purpose is to retain only a lossless subset of rules while being able to derive the remaining redundant ones. Adequate axiomatic systems thus need to be set up.

Acknowledgments: We would like to thank the anonymous reviewers for their helpful comments and suggestions. We are also grateful to Mrs. Nassima Ben Younes for fruitful discussions and help in the implementation of the tool. This work is supported by the French-Tunisian project CMCU-Utique 05G1412.
References

1. Ceglar, A., Roddick, J.F.: Association mining. ACM Computing Surveys, volume 38(2) (2006)
2. Steinbach, M., Kumar, V.: Generalizing the notion of confidence. Knowledge and Information Systems, volume 12(3) (2007) 279–299
3. Tzanis, G., Berberidis, C.: Mining for mutually exclusive items in transaction databases. International Journal of Data Warehousing and Mining, volume 3(3) (2007) 45–59
4. Toivonen, H.: Discovering of frequent patterns in large data collections. PhD thesis, University of Helsinki, Helsinki, Finland (1996)
5. Hébert, C., Crémilleux, B.: A unified view of objective interestingness measures. In: Proceedings of the 5th International Conference Machine Learning and Data Mining in Pattern Recognition, Springer-Verlag, LNCS, volume 4571. (2007) 533–547
6. Hamrouni, T., Denden, I., Ben Yahia, S., Mephu Nguifo, E.: A new concise representation of frequent patterns through disjunctive search space. In: Proceedings of the 5th International Conference on Concept Lattices and their Applications. (2007) 50–61
7. Kim, H.D.: Complementary occurrence and disjunctive rules for market basket analysis in data mining. In: Proceedings of the 2nd IASTED International Conference Information and Knowledge Sharing. (2003) 155–157
8. Nanavati, A.A., Chitrapura, K.P., Joshi, S., Krishnapuram, R.: Mining generalised disjunctive association rules. In: Proceedings of the 10th International Conference on Information and Knowledge Management. (2001) 482–489
9. Hájek, P., Havránek, T.: Mechanizing Hypothesis Formation: Mathematical Foundations for a General Theory. Springer-Verlag (1978)
10. Kryszkiewicz, M.: Concise representations of association rules. In: Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery in Data Mining, Springer-Verlag, LNCS, volume 2447. (2002) 92–109
11. Grün, G.A.: New forms of association rules. Technical Report TR 1998-15, School of Computing Science, Simon Fraser University, Burnaby, BC, Canada (1998)
12. Shima, Y., Hirata, K., Harao, M., Yokoyama, S., Matsuoka, K., Izumi, T.: Extracting disjunctive closed rules from MRSA data. In: Proceedings of the 1st International Conference on Complex Medical Engineering. (2005) 321–325
13. Elble, J., Heeren, C., Pitt, L.: Optimized disjunctive association rules via sampling. In: Proceedings of the 3rd IEEE International Conference on Data Mining. (2003) 43–50
14. Galambos, J., Simonelli, I.: Bonferroni-type inequalities with applications. Springer (2000)
15. Ganter, B., Wille, R.: Formal Concept Analysis. Springer (1999)
16. Hamrouni, T., Denden, I., Ben Yahia, S., Mephu Nguifo, E.: Exploring the disjunctive search space towards discovering new exact concise representations for frequent patterns. Technical report, CRIL-CNRS of Lens, Lens, France (2007)
17. Denden, I., Hamrouni, T., Ben Yahia, S.: Efficient exploration of the disjunctive lattice towards extracting concise representations of frequent patterns. To appear in the Proceedings of the 9th African Conference on Research in Computer Science and Applied Mathematics (in French). (2008)
18. Valtchev, P., Missaoui, R., Lebrun, P.: A fast algorithm for building the Hasse diagram of a Galois lattice. In: Proceedings of the Conference on Combinatorics, Computer Science and Applications. (2000) 293–306
19. Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: A condensed representation of Boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery, volume 7(1) (2003) 5–22