Boolean factors as a means of clustering of interestingness measures of association rules⋆

Radim Belohlavek1, Dhouha Grissa2,4,5, Sylvie Guillaume3,4, Engelbert Mephu Nguifo2,4, Jan Outrata1

1 Data Analysis and Modeling Lab, Department of Computer Science, Palacky University, Olomouc, 17. listopadu 12, CZ-77146 Olomouc, Czech Republic
radim.belohlavek@acm.org, jan.outrata@upol.cz
2 Clermont Université, Université Blaise Pascal, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France
3 Clermont Université, Université d'Auvergne, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France
4 CNRS, UMR 6158, LIMOS, F-63173 Aubière, France
5 URPAH, Département d'Informatique, Faculté des Sciences de Tunis, Campus Universitaire, 1060 Tunis, Tunisie
dgrissa@isima.fr, guillaum@isima.fr, mephu@isima.fr

⋆ We acknowledge support by the ESF project No. CZ.1.07/2.3.00/20.0059, which is co-financed by the European Social Fund and the state budget of the Czech Republic (R. Belohlavek); by Grant No. 202/10/P360 of the Czech Science Foundation (J. Outrata); and by Grant No. 11G1417 of the French-Tunisian cooperation PHC Utique (D. Grissa).

Abstract. Measures of interestingness play a crucial role in association rule mining. An important methodological problem is to provide a reasonable classification of the measures. Several papers have appeared on this topic. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of classification, and compare the results to the previous approaches.

1 Introduction

An important problem in extracting association rules, well known since the early stage of association rule mining [32], is the possibly huge number of rules extracted from data. A general way of dealing with this problem is to define the concept of rule interestingness: only association rules that are considered interesting according to some measure are presented to the user. The most widely used measures of interestingness are based on the concepts of support and confidence. However, the suitability of these measures for extracting interesting rules was challenged by several studies, see e.g. [34]. Consequently, several other interestingness measures of association rules were proposed, see e.g. [35], [23], [12], [38]. With the many existing measures of interestingness arises the problem of selecting an appropriate one.

To better understand the behavior of the various measures, several studies of the properties of measures of interestingness have appeared, see e.g. [12], [27], [23], [16]. Those studies explore various properties of the measures that are considered important. For example, Vaillant et al. [37] evaluated twenty interestingness measures according to eight properties. To facilitate the choice of a user-adapted interestingness measure, the authors applied clustering methods to the decision matrix and obtained five clusters. Tan et al. [35] studied twenty-one interestingness measures through eight properties and showed that no measure is adapted to all cases. To select the best interestingness measure, they used both support-based pruning and standardization methods. By applying a new clustering approach, Huynh et al. [21] classified thirty-four interestingness measures using correlation analysis. Geng and Hamilton [12] made a survey of thirty-eight interestingness measures for rules and summaries with eleven properties and gave strategies for selecting the appropriate measures.
D. R. Feno [10] evaluated fifteen interestingness measures with thirteen properties to describe their behavior. Delgado et al. [9] provided a new study of the interestingness measures by means of a logical model. In addition, the authors proposed and justified the addition of two new principles to the three proposed by Piatetsky-Shapiro [32]. Finally, Heravi and Zaïane [22] studied fifty-three objective measures for associative classification rules according to sixteen properties and explained that no single measure can be introduced as an obvious winner.

The assessment of measures according to their properties results in a measure-property binary matrix. Two studies of this matrix were conducted. Namely, [17] describes how formal concept analysis (FCA) can highlight interestingness measures with similar behavior in order to help the user with this choice. [16] and [14] attempted to find natural clusters of measures using widely used clustering methods, the agglomerative hierarchical method (AHC) and the K-means method. A common feature of these methods is that they only produce disjoint clusters of measures. On the other hand, one could naturally expect overlapping clusters. The aim of this paper is to explore the possibility of obtaining overlapping clusters of measures using factor analysis of binary data and to compare the results with the results of other studies. In particular, we use the recently developed method from [3] and take the discovered factors as clusters. The method uses formal concepts as factors, which makes it possible to interpret the factors easily.

2 Preliminaries

2.1 Binary (Boolean) data

Let X be a set of objects (such as a set of customers, a set of functions or the like) and Y be a set of attributes (such as a set of products that customers may buy, or a set of properties of functions). The information about which objects have which attributes may formally be represented by a binary relation I between X and Y, i.e. I ⊆ X × Y, and may be visualized by a table (matrix) that contains 1s and 0s, according to whether the object corresponding to a row has the attribute corresponding to a column (for this we suppose some orders of objects and attributes are fixed). We denote the entries of such a matrix by I_xy. Data of this type are called binary data (or Boolean data). The triplet ⟨X, Y, I⟩ is called a formal context in FCA, but other terms are used in other areas. This type of data appears in two roles in our paper. First, association rules, whose interestingness measures we analyze, are certain dependencies over binary data. Second, the information we have about the interestingness measures of association rules is in the form of binary data: the objects are interestingness measures and the attributes are their properties.

2.2 Association rules

An association rule [36] over a set Y of attributes is a formula

  A ⇒ B    (1)

where A and B are sets of attributes from Y, i.e. A, B ⊆ Y. Let ⟨X, Y, I⟩ be a formal context. A natural measure of interestingness of association rules is based on the notions of confidence and support. The confidence and support of an association rule A ⇒ B in ⟨X, Y, I⟩ are defined by

  conf(A ⇒ B) = |A↓ ∩ B↓| / |A↓|   and   supp(A ⇒ B) = |A↓ ∩ B↓| / |X|,

where C↓, for C ⊆ Y, is defined by C↓ = {x ∈ X | for each y ∈ C: ⟨x, y⟩ ∈ I}. An association rule is considered interesting if its confidence and support exceed some user-specified thresholds.
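To make these definitions concrete, the following minimal sketch (ours, not from the paper) computes support and confidence over a toy formal context stored as a dictionary mapping each object to its set of attributes; the context and the attribute names are made up for illustration.

```python
def down(C, I):
    """C-down: the set of objects that have all attributes in C."""
    return {x for x, attrs in I.items() if C <= attrs}

# Toy context: objects 0..3 with their attribute sets.
I = {0: {"a", "b", "c"}, 1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}}

def supp(A, B, I):
    return len(down(A, I) & down(B, I)) / len(I)

def conf(A, B, I):
    return len(down(A, I) & down(B, I)) / len(down(A, I))

print(supp({"a"}, {"b"}, I))  # 0.5: objects 0 and 1, out of 4 objects
print(conf({"a"}, {"b"}, I))  # 0.666...: objects 0 and 1, out of {0, 1, 3}
```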
However, the support-confidence approach reveals some weaknesses. Often, this approach, as well as algorithms based on it, leads to the extraction of an exponential number of rules, which makes validation by an expert impossible. In addition, a disadvantage of support is that many potentially interesting rules have a low support value and are therefore eliminated by the pruning threshold minsupp. To address this problem, many other measures of interestingness have been proposed in the literature [13], mainly because they are effective for mining potentially interesting rules and capture some aspects of user interest. The most important of those measures are the subject of our analysis and are surveyed in Section 3.1. Note that association rules are attributed to [1]. However, the concept of association rule itself, as well as various measures of interestingness, are particular cases of what is investigated in depth in [18], a book that develops logico-statistical foundations of the GUHA method [19].

2.3 Factor analysis of binary (Boolean) data

Let I be an n × m binary matrix. The aim in Boolean factor analysis is to find a decomposition

  I = A ◦ B    (2)

of I into an n × k binary matrix A and a k × m binary matrix B, with ◦ denoting the Boolean product of matrices, i.e.

  (A ◦ B)_ij = max_{l=1,...,k} min(A_il, B_lj).

The inner dimension, k, in the decomposition may be interpreted as the number of factors that may be used to describe the original data. Namely, A_il = 1 if and only if the lth factor applies to the ith object, and B_lj = 1 if and only if the jth attribute is one of the manifestations of the lth factor. The factor model behind (2) therefore has the following meaning: the object i has the attribute j if and only if there exists a factor l that applies to i and for which j is one of its particular manifestations. We refer to [3] for further information and references to papers that deal with the problem of factor analysis and decompositions of binary matrices.

In [3], the following method for finding decompositions (2) with the number k of factors as small as possible was presented. The method utilizes formal concepts of the formal context ⟨X, Y, I⟩ as factors, where X = {1, ..., n}, Y = {1, ..., m} (objects and attributes correspond to the rows and columns of I). Let F = {⟨C1, D1⟩, ..., ⟨Ck, Dk⟩} be a set of formal concepts of ⟨X, Y, I⟩, i.e. the ⟨Cl, Dl⟩ are elements of the concept lattice B(X, Y, I) [11]. Consider the n × k binary matrix A_F and the k × m binary matrix B_F defined by

  (A_F)_il = 1 iff i ∈ Cl   and   (B_F)_lj = 1 iff j ∈ Dl.    (3)

Denote by ρ(I) the smallest number k, the so-called Schein rank of I, for which a decomposition of I with k factors exists. The following theorem shows that using formal concepts as factors as in (3) enables us to reach the Schein rank, i.e. is optimal [3]:

Theorem 1. For every binary matrix I, there exists F ⊆ B(X, Y, I) such that I = A_F ◦ B_F and |F| = ρ(I).

As has been demonstrated in [3], a useful feature of using formal concepts as factors is that formal concepts may easily be interpreted. Namely, every factor, i.e. a formal concept ⟨Cl, Dl⟩, consists of a set Cl of objects (objects are measures of interestingness in our case) and a set Dl of attributes (properties of measures in our case). Cl contains just the objects to which all the attributes from Dl apply, and Dl contains all attributes shared by all objects from Cl. From a clustering point of view, the factors ⟨Cl, Dl⟩ may thus be seen as clusters Cl together with their descriptions by attributes from Dl.
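For illustration, here is a self-contained sketch of the factor model; it is ours, not the authors' implementation. It enumerates the formal concepts of a tiny matrix by brute force (feasible only for very small numbers of attributes), covers the 1s of I greedily, which is a simplification of the greedy approach of [3], and then verifies the decomposition I = A_F ◦ B_F built as in (3).

```python
import numpy as np
from itertools import combinations

def bool_product(A, B):
    """Boolean matrix product: (A o B)_ij = max_l min(A_il, B_lj)."""
    return (A[:, :, None] & B[None, :, :]).any(axis=1).astype(int)

def concepts(I):
    """All formal concepts (extent, intent) of a small binary matrix, by brute force."""
    n, m = I.shape
    found = set()
    for r in range(m + 1):
        for D in combinations(range(m), r):
            # extent = D-down; intent = the closure of D
            C = np.where(I[:, list(D)].all(axis=1))[0] if D else np.arange(n)
            D2 = tuple(np.where(I[C].all(axis=0))[0])
            found.add((tuple(C), D2))
    return [(set(map(int, c)), set(map(int, d))) for c, d in found]

def greedy_factors(I):
    """Greedily cover the 1s of I with formal concepts (simplified variant of [3])."""
    def covers(cd, todo):
        C, D = cd
        return {(i, j) for i in C for j in D} & todo

    uncovered = {(int(i), int(j)) for i, j in zip(*np.where(I == 1))}
    candidates = concepts(I)
    factors = []
    while uncovered:
        best = max(candidates, key=lambda cd: len(covers(cd, uncovered)))
        uncovered -= covers(best, uncovered)
        factors.append(best)
    return factors

I = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
F = greedy_factors(I)  # this 3x3 example decomposes into 2 factors
A_F = np.array([[int(i in C) for C, _ in F] for i in range(I.shape[0])])
B_F = np.array([[int(j in D) for j in range(I.shape[1])] for _, D in F])
assert (bool_product(A_F, B_F) == I).all()  # the factors decompose I
```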
The factors thus have a natural, easy-to-understand meaning. Since the problem of computing the smallest set of factors is NP-hard, a greedy approximation algorithm was proposed in [3, Algorithm 2]. This algorithm is utilized below in our paper.

3 Clustering interestingness measures using Boolean factors

3.1 Measures of interestingness

In the following, we present the interestingness measures reported in the literature and recall nineteen of their most important properties proposed in the literature. To identify interesting association rules and to enable the user to focus on what is interesting for him, about sixty interestingness measures [20], [35], [10] were proposed in the literature. All of them are defined using the following parameters: p(XY), p(X̄Y), p(XȲ) and p(X̄Ȳ), where p(XY) = n_XY / n is the proportion of objects satisfying both X and Y, and X̄ denotes the negation of X. The following are important examples of interestingness measures:

Lift [6]: Given a rule X → Y, lift is the ratio of the probability that X and Y occur together to the product of the two individual probabilities of X and Y, i.e.

  Lift(X → Y) = p(XY) / (p(X) × p(Y)).

If this value is 1, then X and Y are independent. The higher this value, the more likely it is that the co-occurrence of X and Y in a transaction is not merely random but due to some relationship between them.

Correlation coefficient [31]: Correlation is a symmetric measure evaluating the strength of the itemsets' connection. It is defined by

  Correlation = (p(XY) − p(X)p(Y)) / √(p(X)p(Y)p(X̄)p(Ȳ)).

A correlation around 0 indicates that X and Y are not correlated. The lower its value, the more negatively correlated X and Y are; the higher its value, the more positively correlated they are.

Conviction [6]: Conviction is one of the measures that favor counter-examples. It is defined by

  Conviction = p(X)p(Ȳ) / p(XȲ).

Conviction, which is not a symmetric measure, is used to quantify the deviation from independence. If its value is 1, then X and Y are independent.

MGK [15]: MGK is an interesting measure which allows the extraction of negative rules. It is defined by

  MGK = (p(Y/X) − p(Y)) / (1 − p(Y)), if X favors Y,
  MGK = (p(Y/X) − p(Y)) / p(Y), if X disfavors Y.

It takes into account several reference situations: in the case where the rule is situated in the attractive zone (i.e. p(Y/X) > p(Y)), this measure evaluates the distance between independence and logical implication. Thus, the closer the value of MGK is to 1, the closer the rule is to logical implication, and the closer the value of MGK is to 0, the closer the rule is to independence. In the case where the rule is located in the repulsive zone (i.e. p(Y/X) < p(Y)), MGK instead evaluates the distance between independence and incompatibility. Thus, the closer the value of MGK is to −1, the more similar to incompatibility the rule is; and the closer the value of MGK is to 0, the closer to independence the rule is.
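A small sketch (ours) expressing the four example measures directly in terms of p(XY), p(X) and p(Y); the toy probability values are invented, and the division-by-zero case of conviction (exact rules, p(XȲ) = 0) is deliberately left unhandled.

```python
from math import sqrt

def lift(pxy, px, py):
    return pxy / (px * py)

def correlation(pxy, px, py):
    return (pxy - px * py) / sqrt(px * py * (1 - px) * (1 - py))

def conviction(pxy, px, py):
    # p(X not-Y) = p(X) - p(XY); divides by zero for exact rules
    return px * (1 - py) / (px - pxy)

def mgk(pxy, px, py):
    py_given_x = pxy / px
    if py_given_x >= py:                 # attractive zone: X favors Y
        return (py_given_x - py) / (1 - py)
    return (py_given_x - py) / py        # repulsive zone: X disfavors Y

# Toy values: p(XY) = 0.30, p(X) = 0.40, p(Y) = 0.50.
print(lift(0.30, 0.40, 0.50))         # 1.5   -> positive dependence
print(correlation(0.30, 0.40, 0.50))  # ~0.41
print(conviction(0.30, 0.40, 0.50))   # 2.0
print(mgk(0.30, 0.40, 0.50))          # 0.5   -> halfway to logical implication
```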
As was mentioned above, several studies [35], [23], [25], [13] on the various properties of interestingness measures were reported in the literature, in order to characterize and evaluate the interestingness measures. The main goal of researchers in the domain is then to provide user assistance in choosing the interestingness measure that best meets the user's needs. For that, formal properties have been developed [32], [24], [35], [12], [4] in order to evaluate the interestingness measures and to help users understand their behavior. In the following, we present nineteen properties reported in the literature.

3.2 Properties of the measures

Figure 1 lists 19 properties of interestingness measures. The properties are described in detail in [16]; we omit the details due to lack of space.

No.  Property  Ref.
P1   Intelligibility or comprehensibility of the measure.  [25]
P2   Easiness to fix a threshold to the rule.  [23]
P3   Asymmetric measure.  [35], [23]
P4   Asymmetric measure in the sense of the conclusion negation.  [23], [35]
P5   Measure assessing X → Y and Ȳ → X̄ in the same way in the logical implication case.  [23]
P6   Measure increasing with the number of examples, or decreasing with the number of counter-examples.  [32], [23]
P7   Measure increasing with the data size.  [12], [35]
P8   Measure decreasing with the consequent/antecedent size.  [23], [32]
P9   Fixed value a in the independence case.  [23], [32]
P10  Fixed value b in the logical implication case.  [23]
P11  Fixed value c in the equilibrium case.  [5]
P12  Identified values in the attraction case between X and Y.  [32]
P13  Identified values in the repulsion case between X and Y.  [32]
P14  Tolerance to the first counter-example.  [23], [38]
P15  Invariance in case of expansion of certain quantities.  [35]
P16  Desired relationship between X → Y and X̄ → Y rules.  [35]
P17  Desired relationship between X → Y and X → Ȳ antinomic rules.  [35]
P18  Desired relationship between X → Y and X̄ → Ȳ rules.  [35]
P19  Antecedent size is fixed or random.  [23]
P20  Descriptive or statistical measure.  [23]
P21  Discriminant measure.  [23]

Fig. 1. Interestingness measures properties.

The authors in [14] proposed an evaluation of 61 interestingness measures according to the 19 properties P3 to P21. Properties P1 and P2 were not taken into account in this study because of their subjective character. The measures and their properties result in a binary measure-property matrix that is used for clustering the measures according to their properties. The clustering performed in [14] using the agglomerative hierarchical method and the K-means method revealed 7 clusters of measures, which will be used in the next section in a comparison with the results obtained by Boolean factor analysis applied to the same measure-property matrix.

3.3 Clustering using Boolean factors

The measure-property matrix describing interestingness measures by their properties is depicted in Figure 2. It consists of 62 measures (61 measures from [14] plus one more that has been studied recently) described by 21 properties, because the three-valued property P14 is represented by three yes-no properties P14.1, P14.2, and P14.3. We computed the decomposition of the matrix using Algorithm 2 from [3] and obtained 28 factors (as in the case below, several of them may be disregarded as not very important; we leave the details for a full version of this paper). In addition, we extended the original 62 × 21 binary matrix by adding, for every property, its negation, and obtained a 62 × 42 binary matrix. The reason for adding negated properties is our goal to compare the results with the two clustering methods mentioned above, given the particular role of the properties and their negations in these clustering methods. From the 62 × 42 matrix, we obtained 38 factors, denoted F1, ..., F38. The factors are presented in Figures 3 and 4.
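Before turning to the figures, note that the extension by negated properties is mechanical. A minimal numpy sketch (ours; a random matrix stands in for the real 62 × 21 measure-property matrix):

```python
import numpy as np

M = np.random.randint(0, 2, size=(62, 21))  # stand-in for the measure-property matrix
M_ext = np.hstack([M, 1 - M])               # append the negation of every property column
assert M_ext.shape == (62, 42)
```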
Figure 3 depicts the object-factor matrix describing the interestingness measures by factors; Figure 4 depicts the factor-property matrix explaining the factors by properties of measures. Factors are sorted from the most important to the least important, where the importance is determined by the number of 1s in the input measure-property matrix covered by the factor [3]. The first factors cover a large part of the matrix, while the last ones cover only a small part and may thus be omitted [3]; see the graph of cumulative cover of the matrix by the factors in Figure 5.

[Figure 2 is a 62 × 21 binary matrix; its rows are the 62 measures (correlation, Cohen, confidence, ..., causal support) and its columns are the 21 properties (P3–P13, P14.1–P14.3, P15–P21). The individual entries are omitted from this version.]

Fig. 2. Input binary matrix describing interestingness measures by their properties.

[Figure 3 is a 62 × 38 binary matrix; its rows are the 62 measures and its columns are the factors F1–F38. The individual entries are omitted from this version.]

Fig. 3. Interestingness measures described by factors obtained by decomposition of the input matrix from Figure 2 extended by negated properties.

[Figure 4 is a 38 × 42 binary matrix; its rows are the factors F1–F38 and its columns are the 21 properties and their negations. The individual entries are omitted from this version.]

Fig. 4. Factors obtained by decomposition of the input matrix from Figure 2 extended by negated properties. The factors are described in terms of the original and negated properties.
[Figure 5 is a line graph; the x-axis shows the number of factors (0–38) and the y-axis the cumulative cover in % (0–100).]

Fig. 5. Cumulative cover of the input matrix from Figure 2 extended by negated properties by factors obtained by decomposition of the matrix.

4 Interpretation and comparison to other approaches

The aim of this section is to provide an interpretation of the results described in the previous section and to compare them to the results already reported in the literature, focusing mainly on [14]. As was described in the previous section, 38 factors were obtained. The first 21 of them cover 94 % of the input measure-property matrix (1s in the matrix), the first nine cover 72 %, and the first five cover 52.4 %. Another remark is that the first ten factors cover the whole set of measures.

Note first that the Boolean factors represent overlapping clusters, contrary to the clustering using the agglomerative hierarchical method and the K-means method performed in [14]. Namely, the clusterings are depicted in Figure 6, which shows the Venn diagram of the first five Boolean factors (plus the eighth and part of the sixth and tenth to cover the whole set of measures), and in Figure 7, which is borrowed from [14] and describes the consensus on the classification obtained by the hierarchical and K-means clusterings. This consensus recovers the classes C1 to C7 of the extracted measures, which are common to both techniques.

Fig. 6. Venn diagram of the first five factors (plus the eighth and part of the sixth and tenth to cover the whole set of measures) obtained by decomposition of the input matrix from Figure 2 extended by negated properties.

Fig. 7. Classes of measures obtained by the hierarchical and K-means clusterings.

Due to lack of space, we focus on the first four factors, since they cover nearly half of the matrix (45.1 %) and also because most of the measures appear at least once in the four factors.

Factor 1. The first factor F1 applies to 20 measures, see Figure 3, namely: correlation, Cohen, Pavillon, conviction, Bayes factor, Loevinger, collective strength, information gain, Goodman, interest, Klosgen, Mgk, YuleQ, relative risk, one way support, two way support, YuleY, Zhang, novelty, and odds ratio. These measures share the following 9 properties: P4, P7, P9, not P11, P12, P13, not P19, not P20, P21, see Figure 4.

Interpretation. The factor applies to measures whose evolutionary curve increases with the number of examples and which have a fixed value in the case of independence (this allows one to identify the attractive and repulsive areas of a rule). The factor also applies only to descriptive and discriminant measures that are not based on a probabilistic model.

Comparison. When looking at the classification results reported in [14], F1 covers two classes from [14]: C6 and C7, which together contain 15 measures. Those classes are closely related within the dendrogram obtained with the agglomerative hierarchical clustering method used in [14]. The 5 missing measures form a class obtained with the K-means method in [14] with Euclidean distance.

Factor 2. F2 applies to 18 measures, namely: confidence, causal confidence, Ganascia, causal confirmation, descriptive confirmation, cosine, causal dependency, Laplace, least contradiction, precision, recall, support, causal confirmed confidence, Czekanowski, negative reliability, Leverage, specificity, and causal support. These measures share the following 11 properties: P4, P6, not P9, not P12, not P13, P14.2, not P15, not P16, not P19, not P20, P21.
Interpretation. The factor applies to measures whose evolutionary curve increases with the number of examples but which have a variable value in the case of independence, which implies that the attractive and repulsive areas of a rule are not identifiable. The factor also applies only to measures that are not discriminant, are indifferent to the first counter-examples, and are not based on a probabilistic model.

Comparison. F2 corresponds to two classes, C4 and C5, reported in [14]. C4 ∪ C5 contains 22 measures. The missing measures are: Jaccard, Kulczynski, examples and counter-examples rate, and Sebag. Those measures are not covered by F2 since they are not indifferent to the first counter-examples.

Factor 3. F3 applies to 10 measures, namely: coverage, dependency, weighted dependency, implication index, Jmeasure, Pearl, prevalence, Gini, variation support, and mutual information. These measures share the following 10 properties: not P6, not P8, not P10, not P11, not P13, not P14.1, not P15, not P16, not P17, not P19.

Interpretation. The factor applies to measures whose evolutionary curve does not increase with the number of examples.

Comparison. F3 corresponds to class C3 reported in [14], which contains 8 measures. The two missing measures, variation support and Pearl, belong to the same classes obtained by both the K-means and the hierarchical method. Moreover, these two missing measures are similar to those from C3 obtained by the hierarchical method, since they merge with the measures in C3 at the next level of the generated dendrogram. Here, there is a strong correspondence between the results obtained using Boolean factors and the ones reported in [14].

Factor 4. F4 applies to 9 measures, namely: confidence, Ganascia, descriptive confirmation, IPEE, IP3E, Laplace, least contradiction, Sebag, and examples and counter-examples rate. These measures share the following 12 properties: P3, P4, P6, P11, not P7, not P8, not P9, not P12, not P13, not P15, not P16, not P18.

Interpretation. The factor applies to measures whose evolutionary curve increases with the number of examples and which have a fixed value in the equilibrium case. As there is no fixed value in the independence case, we cannot get an identifiable area in the case of attraction or repulsion.

Comparison. F4 mainly applies to measures of class C5 obtained in [14]. The two missing measures, IPEE and IP3E, belong to a different class.
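The comparisons above reduce to set overlaps between a factor's extent (its set of measures) and the reference classes of [14]. A tiny sketch (ours) of such a check; the extent below is truncated and the class memberships are hypothetical, chosen only to illustrate the computation:

```python
def coverage(factor, cls):
    """Fraction of a reference class contained in the factor's extent."""
    return len(factor & cls) / len(cls)

F1 = {"correlation", "Cohen", "Pavillon", "conviction", "Loevinger"}  # truncated extent
C6 = {"correlation", "Cohen", "Loevinger"}                            # hypothetical class
C7 = {"Pavillon", "conviction"}                                       # hypothetical class

print(coverage(F1, C6 | C7))  # 1.0 here; F1 would fully cover C6 and C7
```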
5 Conclusions and further issues

We demonstrated that Boolean factors provide us with clearly interpretable, meaningful clusters of measures, among which the first ones are highly similar to clusters of measures reported in the literature. Contrary to other clustering methods, Boolean factors represent overlapping clusters. We consider this an advantage because overlapping clusters are a natural phenomenon in human classification. We presented preliminary results on clustering the measures using Boolean factors. Due to limited scope, we presented only part of the results obtained and leave the other results for a full version of this paper. An interesting feature of the presented method, to be explored in the future, is that the method need not start from scratch. Rather, one or more clusters that are considered important classes of measures may be supplied at the start, and the method may be asked to complete the clustering. Another issue left for future research is the benefit of the clustering of measures for a user who is interested in selecting a type of measure, rather than a particular measure of interestingness of association rules. In the intended scenario, a user may use various interestingness measures that belong to different classes of measures.

References

1. Agrawal R., Imielinski T., Swami A.: Mining association rules between sets of items in large databases. Proc. ACM SIGMOD 1993, 207–216.
2. Agrawal R., Srikant R.: Fast algorithms for mining association rules. Proc. VLDB Conf. 1994, 478–499.
3. Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. of Computer and System Sciences 76(1)(2010), 3–20.
4. Blanchard J., Guillet F., Briand H., Gras R.: Assessing rule interestingness with a probabilistic measure of deviation from equilibrium. In Proc. of 11th International Symposium on Applied Stochastic Models and Data Analysis ASMDA 2005, Brest, France, 191–200.
5. Blanchard J., Guillet F., Briand H., Gras R.: IPEE: Indice Probabiliste d'Écart à l'Équilibre pour l'évaluation de la qualité des règles. Dans l'Atelier Qualité des Données et des Connaissances 2005, 26–34.
6. Brin S., Motwani R., Silverstein C.: Beyond Market Baskets: Generalizing Association Rules to Correlations. In Proc. of the ACM SIGMOD Conference, Tucson, Arizona, 1997, 265–276.
7. Carpineto C., Romano G.: Concept Data Analysis. Theory and Applications. J. Wiley, 2004.
8. Davey B. A., Priestley H.: Introduction to Lattices and Order. Cambridge University Press, Cambridge, 1990.
9. Delgado M., Ruiz D.-L., Sanchez D.: Studying interest measures for association rules through a logical model. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 18(1)(2010), World Scientific, 87–106.
10. Feno D. R.: Mesures de qualité des règles d'association: normalisation et caractérisation des bases. PhD thesis, Université de La Réunion, 2007.
11. Ganter B., Wille R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin, 1999.
12. Geng L., Hamilton H. J.: Choosing the Right Lens: Finding What is Interesting in Data Mining. Quality Measures in Data Mining 2007, ISBN 978-3-540-44911-9, 3–24.
13. Geng L., Hamilton H. J.: Interestingness measures for data mining: A Survey. ACM Comput. Surveys 38(3)(2006), 1–31.
14. Guillaume S., Grissa D., Mephu Nguifo E.: Catégorisation des mesures d'intérêt pour l'extraction des connaissances. Revue des Nouvelles Technologies de l'Information, 2011, to appear (previously available as Technical Report RR-10-14, LIMOS, ISIMA, 2010).
15. Guillaume S.: Traitement des données volumineuses. Mesures et algorithmes d'extraction des règles d'association et règles ordinales. PhD thesis, Université de Nantes, France, 2000.
16. Guillaume S., Grissa D., Mephu Nguifo E.: Propriétés des mesures d'intérêt pour l'extraction des règles. Dans l'Atelier Qualité des Données et des Connaissances, EGC'2010, 2010, Hammamet, Tunisie, http://qdc2010.lri.fr/fr/actes.php, 15–28.
17. Grissa D., Guillaume S., Mephu Nguifo E.: Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures. CoRR abs/1008.3629, 2010.
18. Hájek P., Havránek T.: Mechanizing Hypothesis Formation. Springer, 1978.
19. Hájek P., Holeňa M., Rauch J.: The GUHA method and its meaning for data mining. J. Computer and System Sciences 76(2010), 34–48.
20. Hilderman R. J., Hamilton H. J.: Knowledge Discovery and Measures of Interest, Volume 638 of The International Series in Engineering and Computer Science 81(2)(2001), Kluwer.
21. Huynh X.-H., Guillet F., Briand H.: Clustering Interestingness Measures with Positive Correlation. ICEIS (2) (2005), 248–253.
22. Heravi M. J., Zaïane O. R.: A study on interestingness measures for associative classifiers. SAC (2010), 1039–1046.
23. Lallich S., Teytaud O.: Évaluation et validation de mesures d'intérêt des règles d'association. RNTI-E-1, numéro spécial 2004, 193–217.
24. Lenca P., Meyer P., Picouet P., Vaillant B., Lallich S.: Critères d'évaluation des mesures de qualité en ecd. Revue des Nouvelles Technologies de l'Information (Entreposage et Fouille de données) (1)(2003), 123–134.
25. Lenca P., Meyer P., Vaillant B., Lallich S.: A multicriteria decision aid for interestingness measure selection. Technical Report LUSSI-TR-2004-01-EN, Dpt. LUSSI, ENST Bretagne, 2004 (chapter 1).
26. Liu J., Mi J.-S.: A novel approach to attribute reduction in formal concept lattices. RSKT 2006, Lecture Notes in Artificial Intelligence 4062 (2006), 522–529.
27. Maddouri M., Gammoudi J.: On Semantic Properties of Interestingness Measures for Extracting Rules from Data. Lecture Notes in Computer Science 4431 (2007), 148–158.
28. Maier D.: The Theory of Relational Databases. Computer Science Press, Rockville, 1983.
29. Pawlak Z.: Rough sets. Int. J. Computer and Information Sciences 11(5)(1982), 341–356.
30. Pawlak Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht, 1991.
31. Pearson K.: Mathematical contributions to the theory of evolution, regression, heredity and panmixia. Philosophical Trans. of the Royal Society A (1896).
32. Piatetsky-Shapiro G.: Discovery, Analysis and Presentation of Strong Rules. In G. Piatetsky-Shapiro & W. J. Frawley, editors: Knowledge Discovery in Databases. AAAI Press, 1991, 229–248.
33. Polkowski L.: Rough Sets: Mathematical Foundations. Springer, 2002.
34. Sese J., Morishita S.: Answering the most correlated n association rules efficiently. In Proceedings of the 6th European Conf. on Principles of Data Mining and Knowledge Discovery 2002, Springer-Verlag, 410–422.
35. Tan P.-N., Kumar V., Srivastava J.: Selecting the right objective measure for association analysis. Information Systems 29(4)(2004), 293–313.
36. Tan P.-N., Steinbach M., Kumar V.: Introduction to Data Mining. Addison-Wesley, 2005.
37. Vaillant B., Lenca P., Lallich S.: A Clustering of Interestingness Measures. DS'04, the 7th International Conference on Discovery Science, LNAI 3245 (2004), 290–297.
38. Vaillant B.: Mesurer la qualité des règles d'association: études formelles et expérimentales. PhD thesis, ENST Bretagne, 2006.
39. Wang X., Ma J.: A novel approach to attribute reduction in concept lattices. RSKT 2006, Lecture Notes in Artificial Intelligence 4062 (2006), 522–529.
40. Wille R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival I. (ed.): Ordered Sets. Reidel, Dordrecht, Boston, 1982, 445–470.
41. Zhang W.-X., Wei L., Qi J.-J.: Attribute reduction in concept lattices based on discernibility matrix. RSFDGrC 2005, Lecture Notes in Artificial Intelligence 3642 (2005), 157–165.