VIE VIE MGB: MGB: AA Visual Visual Interactive Interactive Exploration Exploration of of Minimal Generic Basis of Assopciation Rules Chiraz Latiri Cherif1 , Wissem Bellegha1 , Sadok Ben Yahia2 , and 1 Chiraz LatiriGhada CherifGuesmi , Wissem 2 Bellegha1 Sadok Ben Yahia and Ghada Guesmi2 2 1 1 École Supérieure de Commerce de Tunis, École Supérieure de Commerce de Tunis, Computer Computer Sciences Sciences Department, Department, Campus Universitaire de La Manouba, 2010, Tunisia chiraz.latiri@gnet.tn chiraz.latiri@gnet.tn wissembellegha@gmail.com 2 wissembellegha@gmail.com 2 Faculty of Sciences of Tunis, Computer Sciences Department Faculty of Sciences of Tunis, Research Computer Team URPAH Sciences Department Research Campus Universitaire ELTeam URPAH Manar, 1060 Tunis, Tunisia sadok.benyahia@fst.rnu.tn Campus Universitaire EL Manar, 1060 Tunis, Tunisia ghada gasmi@yahoo.fr sadok.benyahia@fst.rnu.tn ghada gasmi@yahoo.fr Abstract. Mining association rules is an important task, even though the number of rules discovered is often huge. A possible solution to this problem, is to use the Formal Concept Analysis (FCA) mathematical settings to restrict rules extraction to a generic basis of association rules. This one is considered as a reduced set to which we can apply appropriate inference mechanisms to derive redundant rules. In this paper, we intro- duce a new minimal generic basis MGB of non-redundant association rules based on the augmented Iceberg Galois lattice. The proposed ap- proach involves the inference mechanisms used and a set of experiments applied to several real and synthetic databases. Carried out experiments showed important benefits in terms of reduction in the number of generic rules extracted. We present also a new framework for generating and vi- sually exploring the minimal generic basis MGB. 1 Introduction Discovering association rules in large datasets is an interesting problem of re- search in data mining. Its principal objective is to find out correlations between frequents itemsets. An association rule is a strong one when its support (i.e., frequency of the represented pattern) and confidence (i.e., strength of the de- pendency between the premise and the conclusion) are higher than minimum Radim Bělohlávek, Václav Snášel (Eds.): CLA 2005, pp. 179–196, ISBN 80–248–0863–3. 180 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi thresholds fixed by the user (minsup and minconf ). However the number of association rules generated is often huge because many of them are redundant. So, a possible solution to this problem, is to restrict rules extraction to a reduced set containing only non redundant ones which are strictly related to the user’s need [1]. This reduced set is known as generic basis for association rules, to which we can apply appropriate inference mechanisms to derive all strong redundant association rules. A generic basis can be evaluated by three properties such as informativity, compacity and inference mechanism which will be detailed later. This paper is organized as follows. Section 2 presents the mathematical back- ground of Formal Concept Analysis (FCA) [2] and describes the association rules mining task. Section 3 deals with the problem of eliminating redundant rules by surveying some of the proposed approaches for constructing generic bases of association rules. In section 4, we present a new minimal generic basis of asso- ciation rules denoted as MGB. Section 5 consolidates the proposed approach by a set of experiments applied to several real synthetic databases. Finally, we will present VIE MGB as a new framework for visually generating and exploring MGB and the Iceberg Galois lattice. We were inspired by MIRAGE [3] which allows interactive graphical visualization and exploration of minimal association rules. 2 Mathematical background In this section, we introduce the mathematical notions based on the FCA. 2.1 Basic notions Formal context: A formal context is a triplet K = (O, A, R), where O rep- resents a finite set of objects, A a finite set of attributes, and R a binary re- lation(i.e., R ⊆ O × A). Each couple (o, a) ∈ R expresses that the transaction o ∈ O contains the attribute a ∈ A. We define two functions, summarizing links between subsets of objects and subsets of attributes induced by R, that map sets of objects to sets of attributes and vice versa. Thus, for a set O ⊆ O, we define φ(O) = {a | ∀o, o ∈ O =⇒ (o, a) ∈ R} ; and for A ⊆ A, ψ(A) = {o | ∀a, a ∈ A =⇒ (o, a) ∈ R}. Both functions φ and ψ form a Galois connection between the sets P(A) and P(O) [4]. Consequently, both compound operators of φ and ψ are closure operators, in particular, ω = φ ◦ ψ is a closure operator. VIE MGB: A Visual Interactive Exploration 181 Frequent closed itemset: An itemset A ⊆ A is said to be closed if A = ω(A), and is said to be frequent with respect to minsup threshold if support(A) = |ψ(A)| |O| ≥ minsup. Formal concept: A formal concept is a couple c = (O; A), where O is called extent, and A is a closed itemset, called intent. Furthermore, both O and A are related through the Galois connection, i.e., φ(O) = A and ψ(A) = O. Minimal generator: An itemset g ⊆ A is called minimal generator of a closed itemset A, if and only if ω(g) = A and  g ⊆ g such that ω(g) = A [5]. Galois lattice: Given a formal context K, the set of formal concepts CK is a complete lattice Lc = (C, ≤), called the Galois (concept) lattice, when CK is considered with inclusion between itemsets [2] [4]. A partial order on formal concepts is defined as follows ∀c1 , c2 ∈ CK , c1 ≤ c2 iff intent(c2 ) ⊆ intent(c1 ), or equivalently extent(c1 ) ⊆ extent(c2 ). The partial order is used to generate the lattice graph, called Hasse diagram, in the following manner: there is an arc (c1 , c2 ), if c1 c2 where is the transitive reduction of ≤, i.e., ∀c3 ∈ CK , c1 ≤ c3 ≤ c2 implies either c1 = c3 or c2 = c3 . Property 1 Lattice operator join provides the least upper bound (LUB) in the concept lattice, and it is defined as follows: Let S ⊆ Lc   Join(S) = ω ∪ c c∈S Iceberg Galois lattice: When only frequent closed itemsets are considered with set inclusion, the resulting structure (Lc , ⊆) only preserves the LUBS , i.e., the join operator. This is called a join semi-lattice or upper semi-lattice. In the remaining of the paper, such structure is referred to as ”Iceberg Galois Lattice”. Example 1 Let us consider the extraction context given by table 1. The asso- ciated Iceberg Galois lattice, for minsup = 12 , is depicted by figure 1. Each node in the Iceberg is represented as couple (closed itemset,support) and is decorated with its associated minimal generators list. 2.2 Association rules An association rule represents an implication between frequent itemsets. It is an expression of the form: R : X =⇒ Y [6], in which X and XY are frequent itemsets, and X ⊂ XY . Itemsets X and Y are called, respectively, premise and conclusion of the rule R. The support of R denoted supp(R), is equal to the support of XY , and its confidence denoted by conf (R) is determined as follows: 182 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi RACDTW 1 ×× × × 2 ×× × 3 ×× × × 4 ××× × 5 ×××× × 6 ××× Table 1. Extraction context k Fig. 1. Iceberg Galois lattice for minsup = 12 conf (R) = supp(XY ) supp(X) . If conf (R) = 1, then R is an exact association rule (ER), otherwise it is called approximate association rule (AR). The process of association rules extraction can be split into two steps [6]: 1. Finding out all frequent itemsets from a given context extraction k for a specified minsup value. 2. Generating all association rules from the frequent itemsets obtained and restricting extraction to rules which confidence satisfies the minconf value. 3 Extraction of non redundant association rules The problem of the relevance and usefulness of extracted association rules is of primary importance. Indeed, in most real life databases, the set of association rules can rapidly grow to be unwieldy, especially as we lower the minsup value, and among which many are redundant. So, a possible solution to this problem VIE MGB: A Visual Interactive Exploration 183 is to restrict extraction of rules to a generic basis of non redundant association rules. Once this generic basis is obtained, all the remaining (redundant) rules can be derived using an appropriate inference mechanism. In an ideal case a generic basis should satisfy the following conditions [7]: - Informativity: to exactly determine the support and confidence of redundant association rules. - Deriving redundant rules: the derivation axioms should be lossless (should enable derivation of all strong rules) and sound (should forbid derivation of rules that are not strong). - Compacity: to have the most reduced set of generic rules allowing the deriva- tion of all strong remaining ones(redundant rules). In this section, we will overview some representations of strong association rules. 3.1 Minimal non redundant association rules (MNR) In [5], Bastide et al. present a couple of generic bases, one for the exact rules denoted GB, and the second for approximate ones denoted IB. Bastide et al. consider the following rule-redundancy definition: Definition 1 An association rule R : X =⇒ Y is a minimal non redundant association rule iff  an association rule R1 : X1 =⇒ Y1 fulfilling the following constraints: 1. supp(R) = supp(R1 ) and conf (R) = conf (R1 ) 2. X1 ⊂ X and Y ⊂ Y1 Exact association rules, of the form R : X =⇒ Y , are implications between two itemsets X and XY whose closures are identical, i.e., ω(X) = ω(XY ). Indeed, from the aforementioned equality, we deduce that X and XY belong to the same class of the equivalence relation induced by the closure operator ω on P (A) and then supp(X) = supp(XY ) (i.e., conf (R) = 1). Bastide et al. characterized what they called “the generic basis for exact association rules” (adapting the global implications base of Duquenne and Guigues [8]). The generic basis for exact association rules, that has been proved to be lossless information, is defined as follows: Definition 2 Let F C be the set of frequent closed itemsets extracted from a context and, for each frequent closed itemset c, let us denote Gc the set of minimal generators of c. The generic basis for exact association rules is as follows: 184 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi Association rules Support Conf idence 2 Association rules Support Conf idence D =⇒ C 3 1 1 3 2 T =⇒ ACW 2 4 T =⇒ C 3 1 1 3 5 A =⇒ CT W 2 4 W =⇒ C 6 1 1 3 2 D =⇒ CW 2 4 A =⇒ CW 3 1 5 5 1 C =⇒ W 6 6 DW =⇒ C 2 1 2 2 1 C =⇒ T 3 3 T W =⇒ AC 2 1 2 2 1 C =⇒ D 3 3 AT =⇒ CW 2 1 Table 2. Non redundant exact and approximate association rules extracted from the context extraction k for minsup = 12 and minconf = 23 GB = {R : g =⇒ c − g | c ∈ F C ∧ g ∈ Gc ∧ g = c}. The authors also characterized the informative basis for approximate asso- ciation rules based on extending the family of bases of partial implications of Luxemburger [8], e.g., cover, structural and arborescent bases. The informative basis is defined as follows: Definition 3 The informative basis for approximate association rules is given by: IB = {R : g =⇒ c − g, c ∈ F C ∧ g ∈ G ∧ ω(g) ⊂ c}. With respect to Definitions 2 and 3, we consider that given an Iceberg Galois lattice, representing precedence-relation-based ordered closed itemsets, generic bases of association rules can be derived in a straightforward manner. We assume that in such structure, each closed itemset is ”decorated” with its associated list of minimal generators. Hence, AR represent ”inter-node” implications, assorted with a statistical information, i.e., the confidence, from a sub-closed-itemset to a super-closed-itemset while starting from a given node in an ordered structure. Inversely, ER are ”intra-node” implications extracted from each node in the ordered structure. Example 2 Generic bases of exact and approximate association rules that can be drawn from the context k, are respectively depicted in table 2. As shown in [7] the couple (GB,IB) form a sound and informative generic basis. However, it produces a very important number of association rules, so this generic basis is not compact. VIE MGB: A Visual Interactive Exploration 185 Association rules Support Conf idence 1 3 A =⇒ ACT W 2 4 1 3 T =⇒ ACT W 2 4 1 3 D =⇒ CDW 2 4 2 2 ∅ =⇒ CD 3 3 2 2 ∅ =⇒ CT 3 3 2 2 ∅ =⇒ ACW 3 3 Table 3. Representative basis RBP han extracted from context extraction k for minsup = 12 and minconf = 23 . 3.2 Representative basis (RBP han ) In [9], the author proposes a new generic basis for association rules denoted RBP han , and presents its associated inference mechanism. This approach takes as input all frequent itemsets obtained from a given context extraction k by applying an appropriate algorithm such as Apriori [10]. Phan defines the rule- redundancy as follows: Definition 4 Let E be the set of all strong association rules extracted from a context extraction k. The rule r : l1 =⇒ l2 ∈ E is a redundant one if and    only if there is a rule r : l1 =⇒ l2 ∈ E, and the two following conditions are satisfied:  1. l1 ⊆ l1  2. l2 ⊆ l2 So the representative basis RBP han is defined as follows: Definition 5 Let minsup and minconf be the support and confidence thresholds for interesting association rules.        RBP han = r : I =⇒ J | I ⊂ J ∧ r : I =⇒ J | I ⊆ I ∧ J ⊆ J Remark 1. RBP han consists in a redefinition of representative rules denoted as RR and defined in [7]. However, the difference between them is that for RBP han the premise and the conclusion of a given rule are not necessarily disjoint. Example 3 Given the context extraction k depicted by table 1, we present the representative basis extracted for minsup = 12 and minconf = 23 in table 3. 186 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi Once the representative basis is generated, we can derive all strong redundant association rules by applying the inference mechanism defined in [9] as follows: Definition 6 Let I, J, K be frequent itemsets and having RBP han , we consider two derivation axioms: 1. Left augmentation: if I =⇒ JK is interesting, then IJ =⇒ JK is interesting. 2. Decomposition: if I =⇒ I1 I2 is interesting, then I =⇒ I1 and I =⇒ I2 are interesting. Moreover, Phan showed that the representative basis is the most reduced set of association rules, from which all remaining rules (redundant) can be derived, so the property of compacity is satisfied. RBP han is sound and lossless, therefore, this approach presents some drawbacks such as: 1. The representative basis is not informative. The support and confidence of derived rules can not be found.    2. For each rule R : X =⇒ Y , we must verify if there is no rule R : X =⇒ Y   such as X ⊆ X ∧Y ⊆ Y . 3.3 Informative Generic Basis (IGB) In [11], the authors propose a new informative generic basis for association rules denoted IGB, which takes as input all the frequent closed itemsets. IGB is defined in [11] as follows: Definition 7 Let F C be the set of frequent closed itemsets and Gc , the list of minimal generators for a given frequent closed itemset c. IGB={R : gs =⇒ I − gs | I ∈ F C ∧ gs ∈ Gc , c ∈ F C ∧ c ⊆ I     ∧ conf (R)≥minconf ∧ g | g ⊂ gs ∧ conf (g =⇒ A − g ) ≥ minconf } Example 4 Given the context extraction k of table 1, we present the basis IGB extracted for minsup = 12 and minconf = 23 in table 4. In [11], the authors prove that IGB is informative. In fact, IGB contains all frequent closed itemsets, and the support of a frequent itemset is equal to the support of the smallest frequent closed itemset which contains it. So, the support and the confidence of redundant rules can easily be determined. In other way, IGB is sound and lossless, and the inference mechanism is composed by three derivation axioms which are [11]: VIE MGB: A Visual Interactive Exploration 187 Association rules Support Conf idence 1 3 A =⇒ CT W 2 4 1 3 T =⇒ ACW 2 4 1 3 D =⇒ CW 2 4 2 2 ∅ =⇒ CD 3 3 2 2 ∅ =⇒ CT 3 3 2 2 ∅ =⇒ ACW 3 3 5 5 ∅ =⇒ CW 6 6 ∅ =⇒ C 1 1 Table 4. IGB extracted from the context extraction k for minsup = 12 and minconf = 2 3 . 1. Conditional Reflexivity: if X =⇒ Y ∈ IGB∧ X = ∅, then X =⇒ Y is a strong association rule. 2. Augmentation: if X =⇒ Y ∈ IGB, then X ∪ Z =⇒ Y − {Z} is a strong association rule, Z ⊂ Y . 3. Conditional Decomposition: if R : X =⇒ Y ∈ IGB∧ X = ∅, then X =⇒ Z is a strong association rule, Z ⊂ Y . Certainly, IGB is informative, therefore this property is satisfied by the presence of all frequent closed itemsets in the generic basis. So, one can imagine the num- ber of frequent closed itemsets extracted from real databases and consequently the number of generic rules that can be extracted. 4 MGB: A new Minimal Generic Basis In the previous section, we have studied the properties satisfied by each approach presented, and so, we deduce that, the problem of the construction of generic bases is to find a compromise among the compacity and the informativity of such basis. Then, we propose a new minimal generic basis of association rules that guarantees a partial informativity and which means that we can exactly determine the support and confidence of some derived rules, and for others, this approach allows to delimit this measures in an interval. MGB is defined as follows: Definition 8 Given 1. Lc : Iceberg Galois lattice decorated by minimal generators and their supports. 188 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi 2. ci : frequent closed itemset. 3. S: set of immediate successors of a given concept c. 4. Gci : list of minimal generators of a concept ci .   R : g =⇒ ci − g | g ∈ Gci ∧ ci ∈ Lc ∧ conf (R) ≥ minconf MGB= support(s) ∧ s ∈ S | support(g) ≥ minconf 4.1 Algorithm for discovery MGB from frequent closed itemsets In this section, we present the algorithm that extracts the MGB basis directly from Iceberg Galois lattice decorated with its associated list of minimal genera- tors and their supports. It is assumed that for every frequent closed itemset we determine the set of its immediate successors and the list of generators satisfying a minconf constraint. Algorithm 1 MGB Input: Lc : Iceberg Galois lattice decorated by minimal generators and their supports. Output: MGB begin for each (ci ∈ F C | |ci | > 1) do Gci = {minimal generators of concept c | c ⊆ c∧ support(c i)   support(c ) ≥ minconf } for each generator g ∈ Gci do support(s) if (s ∈ Sci | support(g) ≥ minconf ) then MGB =MGB ∪R : g → ci − g   Gci = Gci − {g | g ⊆ g } else Gci = Gci − {g} end Example 5 Given the context extraction shown by table 1, we will illustrate in table 5 the MGB generic basis for minsup = 12 and minconf = 23 . As we have shown, MGB requires to determine the list of all immediate suc- cessors of a given concept c, thus, we will present how to construct this list. So, we will explore the Iceberg Galois lattice considering the set inclusion and the partial order defined between the different frequent closed itemsets. In fact, this process is composed of two steps, which are: 1. Generating the list of all successors of a given concept c. 2. Using this list, we determine the immediate successors of c. VIE MGB: A Visual Interactive Exploration 189 Association rules Support Conf idence 1 3 A =⇒ CT W 2 4 1 3 T =⇒ ACW 2 4 2 4 W =⇒ AC 3 5 2 2 C =⇒ AW 3 3 2 2 C =⇒ T 3 3 2 2 C =⇒ D 3 3 1 3 D =⇒ CW 2 4 Table 5. The generic basis MGB obtained from the context extraction k for minsup = 1 2 and minconf = 23 4.2 Deriving redundant association rules For deriving all strong redundant association rules, we propose to adopt two derivation axioms of the inference mechanism defined in [11]. 1. Augmentation: if X =⇒ Y ∈ MGB, then X ∪ Z =⇒ Y − {Z} is a strong association rule, Z ⊂ Y . 2. Conditional Decomposition: if R : X =⇒ Y ∈ MGB, then X =⇒ Z is a strong association rule, Z ⊂ Y . This inference mechanism has been proved in [11] to be sound and complete. Remark 2. It is important to mention that for deriving all approximate asso- ciation rules, the order of applying the inference axioms is important. Indeed, we have to apply first the Augmentation axiom and next apply the Conditional Decomposition axiom on the resulting set of association rules, i.e., set of approx- imate generic rules and those derived by the Augmentation axiom. In fact, MGB is not informative. The support and confidence of derived rules cannot be found. For example we cannot find the confidence of the rule AT =⇒ CW derived from A =⇒ CT W because the support of AT is unknown. Nevertheless, the non-informativity problem can be partially resolved by in- troducing a new concept denoted ”Partial informativity”. In fact, it means that we can exactly determine the support and confidence of some derived rules, and for others, this approach allows to delimit this measures in an interval. Proposition 1. Let R : X =⇒ Y ∈MGB, A and B two integers, if we apply the inference mechanism, we obtain rules of the following forms: 1. Augmentation: R1 : XZ =⇒ Y − Z 190 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi A=B A=B Augmentation Augmentation R1.supp=R.supp R1.supp=R.supp R1.conf=R.supp/supp(XZ) R1.conf∈[R.conf,1] Conditional Decomposition Conditional Decomposition R2.supp=supp(XZ) B ≤ R2 .supp ≤ A R2.conf=supp(XZ)/supp(X) R2.conf∈ [R.conf,1] Table 6. Support an confidence of derived rules 2. Conditional Decomposition: R2 : X =⇒ Z (a) support(XZ) ≤ min {support(i1 ), support(i2 ), ..., support(ik ) | ii ⊂ XZ}, A, where ii is a frequent subset of XZ. (b) support(XZ) ≥ max{support(I1 ), support(I2 ), ..., support(Ik )}, B, where Ii is a frequent closed itemset containing XZ and existing in MGB. Remark 3. if A = B, then we have supp(XZ) = A. So, table 6 illustrates how we determine support and confidence of these rules. Proof. a: Since the support of a frequent itemset is less than the support of all its subsets, then we deduce that support(XZ) is less than the smallest support of all its subsets. b: By same, the support of a frequent itemset is higher than the support of all its supersets, so support(XZ) is higher than the support of the smallest frequent closed itemset which contains it and existing in MGB. c: If we apply the augmentation axiom, the generic rule’s base and the re- dundant rule’s base are similar, so support(R1 ) = support(R). d : If we apply the conditional decomposition axiom, then the derived rule’s base is XZ, and so support(R2 ) = support(XZ) Remark 4. Before deriving the redundant rules, we determine the support of all frequent itemsets which are the antecedent of generic rules. Example 6 Given the generic rule R : C =⇒ AW , supp(R) = 23 and conf (R) = 2 supp(R) 3 , so, supp(C) = conf (R) = 1. In table 7, we summarize the properties of the reviewed association rules representations. These ones contain only rules whose bases are frequent closed itemsets and whose antecedent are frequent generators. Let us consider the fol- lowing notations: VIE MGB: A Visual Interactive Exploration 191 Generic basis Intended derivable rules Infer. mechanism Lossless Sound Informative RBP han AR LA + D yes yes no IGB AR CR + A + CD yes yes yes (GB,IB) certainAR+approxAR A + CD yes yes yes scMGB AR A + CD yes yes no Table 7. Summary of Association Rules Representations Database Nature #transactions #items |f req − clos| |max − clos| Retail sparse 88162 16470 580 284 Mushrooms dense 8124 120 427 63 Connect dense 67557 130 810 98 Chess dense 3196 76 1194 71 Table 8. Dataset characteristics – AR: strong association rules – LA: left augmentation – D: decomposition – CR: conditional reflexivity – A: augmentation – CD: conditional decomposition 5 Visual implementation and experimental evaluation All experiments described below were performed on a 3,06GHz PC with 512MB of memory, running Windows XP. The algorithm MGB was coded in JAVA. Table 8 shows characteristics of real and synthetic datasets used in our eval- uation. Table 9 shows the experimental results. The last column consists of the number of all strong association rules extracted. 5.1 Experimental results We compare MGB’results with those concerning RBP han , IGB and (GB,IB) which are detailed in [11]. 192 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi Database Minsup Minconf MGB RBP han IGB (GB,IB) AR Retail 0.5% 0.5% 435 284 580 1382 1382 1% 402 459 757 1334 1334 10% 232 214 427 770 770 50% 304 305 353 438 438 100% 0 0 0 0 0 Connect 95% 95% 635 97 809 25336 77816 96% 852 1003 2140 23780 73869 97% 1403 1178 2438 18470 60101 98% 1033 1404 2598 11717 41138 99% 1386 1667 2284 5250 19967 100% 682 682 682 682 2260 Mushrooms 30% 30% 332 63 427 7623 94894 50% 366 318 967 5761 79437 70% 364 429 966 4520 58010 90% 498 519 799 2159 24408 100% 557 558 558 557 8450 Chess 87% 87% 440 71 1194 31538 42740 89% 519 430 2423 29704 40451 91% 627 486 2655 26147 36098 93% 793 616 2766 21350 29866 95% 671 768 2754 14373 20312 100% 342 342 342 342 342 Table 9. Number of generic association rules VIE MGB: A Visual Interactive Exploration 193 – For some minconf values, MGB is the most reduced set of generic associa- tion rules: - In fact, concerning the representative basis RBP han , for each frequent closed itemset c, if support(c) ≥ minconf , then we have a factual rule of the form ∅ =⇒ c. - However, for MGB and for the same minconf value, if the set of generators of c is empty, the algorithm do not generate any rule. – If we consider the database Retail, one can observe that for minconf = 1 the number of association rules is equal to 0, so we can say that all the frequent closed itemsets have only one minimal generator which overlaps the associated closed itemset. – By setting minsup = minconf - For IGB, the number of association rules is equal to the number of frequent closed itemsets. In fact, the support of all the frequent closed itemsets is higher or equal than the minconf value. - The number of association rules in RBP han is equal to the number of maximal frequent closed itemsets, – For Mushrooms dataset we can observe that the bases IGB and RBP han have one more association rule than MGB and (GB,IB), because of the presence of the item ”85” in all transactions which produces a factual rule in IGB and RBP han of the form ∅ =⇒ 85. 5.2 VIE MGB: Visual Interactive Exploration of Minimal Generic Basis In this section, we present VIE MGB, a new framework for an interactive graphical visualization and exploration of MGB and the Iceberg Galois lattice. VIE MGB allows to display generic rules and to derive all strong redundant association rules with their supports and confidences. The database of inter- est has previously been mined, using an efficient algorithm for mining frequent closed itemsets and their minimal generators. Once these closed itemsets are mined, they are taken as input. In fact, we have two independent processes executed by VIE MGB, which are: 1. Generating MGB. 2. Constructing the Iceberg Galois lattice and to visualize redundant rules. The visual interactive framework for exploring and visualizing minimal generic basis is depicted in figure 2. 194 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi To generate MGB As we have shown below, this tool takes as input a pre- viously stored file that contains all the frequent closed itemsets, their minimal generators and supports extracted from a given dataset, then the user must specify a minconf value, thus VIE MGB generates a text file which represents MGB containing all the generic association rules with their supports and confi- dences. Fig. 2. Interactive lattice and rule exploration In other way, from the closed item- sets, VIE MGB creates the Iceberg Galois lattice as shown in figure 2. Each closed itemset is represented by a node, and there is a link connecting two nodes if they are related by a set inclusion relation and there is no intermediate set between them. Once the lattice is displayed on the panel, one can generate the redundant association rules as follows: The user can click on a chosen node on the lattice, then VIE MGB dis- plays on the text area the generic rules corresponding to this closed itemset and all strong redundant association rules derived from the generic ones with their supports and confidences. VIE MGB: A Visual Interactive Exploration 195 Remark 5. When generating the redundant association rules, each frequent item- set is mapped to the smallest frequent closed itemset which contains it to deter- mine the support of the derived rule and its confidence. So, the problem of the informativity of the generic basis is completely resolved. 6 Conclusion In this paper, we have presented theoretical foundations of generic bases of asso- ciation rules and we have pointed out the properties satisfied by some approaches. We have proposed MGB which is a new minimal generic basis of associations rules and its associated inference mechanism. We have also introduced the con- cept of ”partial informativity”. This new approach is consolidated by a set of experiments using real and synthetic databases showing important benefits in terms of reduction in the number of association rules presented to the user. Finally, we have proposed a new framework denoted VIE MGB allowing a visual interactive exploration of MGB. References 1. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Intelligent structur- ing and reducing of association rules with formal concept analysis. In: Proceedings of KI’2001 conference, Lecture Notes in Artificial Intelligence 2174, Springer-verlag. (2001) 335–350 2. Ganter, B., Wille, R.: Formal Concept Analysis. Springer-Verlag, Heidelberg (1999) 3. Zaki, M., Phoophakdee, B.: MIRAGE: A framework for mining, exploring and visualizing minimal association rules. Rpi cs dept technical report (2003) 4. Barbut, M., Monjardet, B.: Ordre et classification. Algèbre et Combinatoire. Ha- chette, Tome II (1970) 5. Bastide, Y., Pasquier, N., Taouil, R., Lakhal, L., Stumme, G.: Mining minimal non-redundant association rules using frequent closed itemsets. In: Proceedings of the International Conference DOOD’2000, Lecture Notes in Computer Sciences, Springer-verlag. (2000) 972–986 6. Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between sets of items in large Databases. ACM SIGMOD Records (1993) 207–216 7. Kryszkiewicz, M.: Concise representations of association rules. In: Proceedings of Pattern Detection and Discovery, ESF Exploratory Workshop, London, UK. (2002) 92–109 196 Chiraz Latiri Cherif, Wissem Bellegha, Sadok Ben Yahia, Ghada Guesmi 8. Duquenne, J., Guigues, J.: Famille minimale d’implications informatives résultant d’un tableau de données binaires. Mathématiques et Sciences Humaines 95 (1986) 5–18 9. Luong, V.P.: Raisonnement sur les rgles d’association. In: In Proceedings 17me Journe Bases de Donnes Avances BDA’2001, Agadir(Maroc), Cpadus Edition. (2001) 299–310 10. Agrawal, R., Skirant, R.: Fast algorithms for mining association rules. In: Pro- ceedings of the 20th International Conference on Very Large Databases. (1994) 478–499 11. Gasmi, G., BenYahia, S., Nguifo, E.M., Slimani, Y.: A new informative generic base of association rules. In: In Proceedings of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-05),Hanoi,Vietnam. (2005) 81–90