A CbO-based Algorithm for Mining Class Relevant Patterns

Hirohisa Seki and Taiki Yamada
Dept. of Computer Science, Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan
seki@nitech.ac.jp, 28114135@stn.nitech.ac.jp

Abstract. In this paper, we consider the problem of mining class relevant patterns from a given context, where each object in the context has a label which is either "positive" (i.e., target-class) or "negative" (i.e., non-target-class). Garriga et al. studied the notion of relevancy in FCA (Formal Concept Analysis). Based on their work, we propose a CbO-based algorithm which, while traversing the search space of closed sets on the positives, performs pruning based on a relevance check: it checks whether or not a pattern (concept) is dominated by another pattern. The pruning method performs two kinds of checks, a dominance check between a node and its child nodes, and a dominance check between a node and its sibling nodes in a CbO-based search tree. Although the latter check requires extra cost, our experimental results show that the proposed approach outperforms the conventional methods in the literature.

1 Introduction

Concise representation is essential in a pattern mining task to handle the large numbers of patterns generated during the mining process. Closed patterns and closed itemsets have been extensively studied in the field of FCA (Formal Concept Analysis) [2] and data mining (for example, [14,15]) to reduce the number of patterns without losing their information.

Lavrač et al. proposed the theory of relevance [10,11], where each object in a given context has a label which is either "positive" (i.e., target-class) or "negative" (i.e., non-target-class), and patterns take the form of sets of attributes. A pattern is then considered to be more relevant than (or to dominate) another pattern if it covers at least all positives (i.e., target-class objects) covered by the irrelevant (or dominated) pattern, but covers no additional negatives. Garriga et al. [4] studied the notion of relevancy in terms of closed patterns, and relevant patterns have been shown to be useful for many classification tasks.

Several methods have been proposed to generate relevant patterns using the notion of closed patterns [4,12,5]. In particular, the approach in [4] first generates the set of all closed patterns, and then removes the irrelevant ones by checking a certain pruning condition. In [5], Grosskreutz first studied an algorithm whose running time is polynomial in the size of the input and output, and also proposed a memory-efficient divide-and-conquer algorithm which visits a superset of the relevant patterns. The divide-and-conquer algorithm introduces a pruning criterion based on a dominance check in terms of negative supports (the supp⁻-check for short).

In this paper, we propose a CbO-based algorithm [8] which, while traversing the search space of closed sets on the positives, performs pruning using two types of dominance checks between patterns: the supp⁻-check and another check between sibling nodes in a CbO-based search tree, the siblings-check for short.
Our experimental results on some datasets show the effectiveness of our algorithm; in particular, they show that the newly introduced dominance check works well for reducing irrelevant patterns.

The rest of this paper is organized as follows. We first summarize some basic notations and definitions of relevant pattern mining in Sect. 2. We then explain our approach to mining relevant patterns from a formal context in Sect. 3, and show some experimental results in Sect. 4. Finally, we give a summary of this work in Sect. 5.

2 Mining Relevant Patterns

2.1 Preliminaries

We use the basic notions of FCA in [2,3]. In FCA, we consider a set G of objects, a set M of attributes and a relation I ⊆ G × M such that (g, m) ∈ I if and only if object g has the attribute m. We call such a triple K = (G, M, I) a formal context. By A′ we mean the set of attributes shared by a subset A of objects, and by B′ the set of objects sharing a subset B of attributes. A concept (A, B) is a maximal objects–attributes correspondence, satisfying A′ = B and B′ = A. A is called the extent of the concept, and B its intent. Two formal concepts (A_1, B_1) and (A_2, B_2) are ordered by (A_1, B_1) ≥ (A_2, B_2) ↔ A_1 ⊇ A_2, and the concepts form a complete lattice.

We assume that each object has a class label in {+, −}; these labels are not attributes in M. The set G of all objects is divided into two subsets: the set G⁺ of those objects that are labeled by +, the positives (or the positive examples), and the set G⁻ of those objects that are labeled by −, the negatives (or the negative examples).

A pattern in K = (G, M, I) is a subset of the attribute set M. An object g satisfies a pattern P if g contains all attributes in P. The occurrence set of P, denoted by occ(G, P), is the set of those objects in G which satisfy P; we simply write occ(P) when G is clear from the context. The negative occurrence set of P, denoted by occ⁻(P), is the set of those objects in G⁻ which satisfy P, i.e., occ⁻(P) = occ(G⁻, P). The positive occurrence set of P, denoted by occ⁺(P), is defined dually, i.e., occ⁺(P) = occ(G⁺, P).

The support of P, denoted by supp(G, P), is the size of occ(G, P); we simply write supp(P) when G is clear from the context. The negative support (positive support) of P, denoted by supp⁻(P) (supp⁺(P)), is the size of occ⁻(P) (occ⁺(P)), respectively.

The notion of a closed pattern is defined as usual: a pattern P is closed when there is no other pattern Q such that P ⊊ Q and supp(G, P) = supp(G, Q). Closed patterns can be defined in terms of the following closure operator:

    Γ_G(P) = {i ∈ M | ∀o ∈ occ(G, P): o contains attribute i},

i.e., the closure Γ_G(P) of a pattern P includes all attributes that are present in all objects in G which contain all attributes in P.

2.2 Relevant Patterns

Definition 1 (relevant pattern). A pattern P is more relevant than (or dominates) a pattern Q in G iff
– occ⁺(P) ⊇ occ⁺(Q),
– occ⁻(P) ⊆ occ⁻(Q), and
– Γ(P) ≠ Q.¹
We call a pattern P relevant if there is no other pattern Q more relevant than (or dominating) P. □

¹ This condition is due to [5]. It ensures that the members of an equivalence class do not dominate each other circularly.

The following theorem by Garriga et al. [4] gives a characterization of relevant patterns in terms of the following closure operator:

    Γ⁺(P) = {i ∈ M | ∀o ∈ occ(G⁺, P): o contains attribute i}.

Γ⁺(·) is called the closure on the positives, and a pattern P is closed on the positives if Γ⁺(P) = P.
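For concreteness, the following Python sketch shows one way these operators and the dominance test of Definition 1 could be computed for a small in-memory context; the representation (objects as attribute sets) and all function names are ours and are meant purely as an illustration, not as the authors' implementation.

```python
# Illustrative sketch only (not the authors' implementation): a context is a
# list of objects, each given as a frozenset of attribute indices; positives
# and negatives are kept in two separate lists.

def occ(objects, pattern):
    """Indices of the objects that contain every attribute of `pattern`."""
    return {i for i, o in enumerate(objects) if pattern <= o}

def supp(objects, pattern):
    """Support of `pattern` = size of its occurrence set."""
    return len(occ(objects, pattern))

def closure(objects, M, pattern):
    """Gamma_G(P): attributes shared by all objects satisfying P (all of M if none does)."""
    covered = occ(objects, pattern)
    if not covered:
        return frozenset(M)
    result = set(M)
    for i in covered:
        result &= objects[i]
    return frozenset(result)

def dominates(P, Q, pos, neg, M):
    """Definition 1: P is more relevant than Q (pos, neg: lists of attribute sets)."""
    return (occ(pos, P) >= occ(pos, Q)                       # P keeps all positives of Q
            and occ(neg, P) <= occ(neg, Q)                   # ... and no additional negatives
            and closure(pos + neg, M, P) != frozenset(Q))    # Gamma(P) != Q

# Toy usage: M = {0, 1, 2, 3}, two positives and one negative.
M = range(4)
pos = [frozenset({0, 1}), frozenset({0, 2})]
neg = [frozenset({1, 2, 3})]
gamma_pos = lambda P: closure(pos, M, P)                     # closure on the positives
print(gamma_pos(frozenset({0})))                             # -> frozenset({0})
```

Representing occurrence sets as sets of object indices makes the superset/subset tests of the dominance check direct; supp⁻, supp⁺ and Γ⁺ are obtained simply by restricting the object list to the negatives or the positives.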
Theorem 1 (Garriga et al. [4]). Let R be the set of relevant patterns in a given context. Then, a relevant pattern P satisfies the following conditions:
– P is closed on the positives, and
– there exists no generalization P_0 ⊂ P such that P_0 ∈ R and supp⁻(P_0) = supp⁻(P). □

Example 1. Figure 1 shows a small dataset and the concepts generated from it. Among them, the concepts shown in a rectangle have relevant patterns as their intents; the relevant patterns are ∅, 0, 1 and 2. We often simply write a pattern as the concatenation of its attributes instead of using set notation, e.g., 0 instead of {0}; similarly for the extent of a concept. We note that each concept is also closed on the positives. In particular, supp⁻(0) = supp⁻(01) = 0, and pattern 0 is more relevant than (or dominates) pattern 01. □

Fig. 1: Example of Closed/Relevant Patterns: relevant patterns are shown in bold font.

3 Mining Relevant Patterns

In this section, we describe our algorithm for mining relevant patterns. We also explain its application to association rule mining in Sect. 3.2.

3.1 The Algorithm

Algorithm 1 shows the outline of our algorithm for mining relevant patterns from a formal context. It is based on the Close-by-One (CbO) algorithm by Kuznetsov [8]. We use a version of CbO given in [6,13], which uses a recursive procedure searching for all formal concepts in a depth-first manner.

Let ⟨G, M, I⟩ be a given formal context, where G = {0, 1, . . . , m − 1} and M = {0, 1, . . . , n − 1} for some m, n ≥ 0. Given a concept ⟨A, B⟩, y ∈ M and a set of attributes I_m, function RS-GenerateFrom in Algorithm 1 recursively traverses the search space of closed sets on the positives, a superset of all relevant patterns, which are obtained by adding attributes j ∈ M with j > y to B. I_m is used for bookkeeping, to record those attributes not used during the process of deriving concepts.

The function differs from CbO in three respects: (i) we use the closure function Γ⁺ (line 10), (ii) we employ pruning methods based on dominance checks (line 14), and (iii) we perform a modified version of the canonicity test (or the prefix-preserving test) using I_m (line 13), which we call the I_m-canonicity test for short.

For the pruning, we perform two kinds of dominance checks: one between the node ⟨A, B⟩ and its child node N_j = ⟨C, D⟩, where D = Γ⁺(B ∪ {j}) (line 10), and the other between N_j and its sibling nodes N_i = ⟨(Γ⁺(B ∪ {i}))′, Γ⁺(B ∪ {i})⟩ for some i > j with i ∈ D \ B. We refer to the former check as the supp⁻-check, and to the latter as the siblings-check for short. If D is dominated either by B or by the intent of some N_i, we skip the search tree rooted at N_j, mark attribute j, and add it to I_m. We use the set I_m of marked attributes in the I_m-canonicity test (line 13): since the search tree rooted at a node constructed with an attribute in I_m is pruned, we ignore those attributes in the canonicity test, i.e., we check whether or not B_j \ I_m ≠ ∅, where B_j = (Γ⁺(B ∪ {j}) \ B) ∩ {0, 1, . . . , j − 1}.²

² This notation is due to [7].

Algorithm 1: Generate RelSets
Input: a formal context, minsup: a minimum support threshold
Output: RelSet: the set of relevant patterns
 1  RelSet := ∅;
 2  call RS-GenerateFrom(⟨(G⁺)′′, (G⁺)′⟩, 0, ∅);
 3  Remove irrelevant patterns in RelSet
 4  return RelSet
 5  Function RS-GenerateFrom(⟨A, B⟩, y, I_m) is
      input: a concept ⟨A, B⟩, an attribute y, a set of attributes I_m.
 6    add B to RelSet
 7    if y > |M| then return
 8    for j from y upto |M| do
 9      if j ∉ B then
10        D ← Γ⁺(B ∪ {j})
11        C ← D′
12        if supp(C) < minsup then continue
13        if B_j \ I_m ≠ ∅ then continue
14        if supp⁻(D) = supp⁻(B) or D is dominated by Γ⁺(B ∪ {i}) for some i > j such that i ∈ D \ B then
15          mark j as "pruned"
16          I_m ← I_m ∪ {j}
17          continue
18        else
19          I_m1 ← I_m
20          call RS-GenerateFrom(⟨C, D⟩, j + 1, I_m1)
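To complement the pseudocode, the following Python sketch indicates how the recursion, the I_m-canonicity test and the two dominance checks of Algorithm 1 might fit together; it reuses the occ/closure helpers sketched in Sect. 2.1, keeps only the intent B at each node, and is intended only as an illustration of the control flow, not as the authors' (Java) implementation.

```python
# Illustrative sketch of Algorithm 1 (not the authors' implementation).
# pos, neg: lists of attribute sets (see the sketch in Sect. 2.1); M: attribute universe.

def generate_relsets(pos, neg, M, minsup=1):
    relset = []
    gamma_pos = lambda P: closure(pos, M, P)                  # closure on the positives
    supp_neg  = lambda P: len(occ(neg, P))                    # negative support
    supp_all  = lambda P: len(occ(pos, P)) + len(occ(neg, P))

    def dominated_by(D, E):
        # E dominates D (the first two conditions of Definition 1 suffice here, since
        # E = Gamma+(B ∪ {i}) with i ∈ D \ B already covers all positives covered by D)
        return occ(pos, E) >= occ(pos, D) and occ(neg, E) <= occ(neg, D)

    def rs_generate_from(B, y, Im):                           # lines 5-20
        relset.append(B)                                      # line 6
        for j in range(y, len(M)):                            # line 8
            if j in B:                                        # line 9
                continue
            D = gamma_pos(B | {j})                            # line 10
            if supp_all(D) < minsup:                          # line 12 (supp of C = D')
                continue
            Bj = (D - B) & set(range(j))                      # B_j of the I_m-canonicity test
            if Bj - Im:                                       # line 13
                continue
            siblings = [gamma_pos(B | {i}) for i in (D - B) if i > j]
            if supp_neg(D) == supp_neg(B) or any(dominated_by(D, E) for E in siblings):
                Im = Im | {j}                                 # lines 14-17: prune, mark j
                continue
            rs_generate_from(D, j + 1, set(Im))               # lines 19-20

    rs_generate_from(gamma_pos(frozenset()), 0, set())        # line 2: start from (G+)'
    return relset      # a superset of the relevant patterns; filter as in line 3
```

The sketch recomputes extents and supports from the context at every step, which is enough to convey the control flow; an actual implementation would maintain them incrementally for efficiency.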
Fig. 2: Example of the Top-down Search of Algorithm 1.

Example 2. Figure 2 shows a dataset and the computation of Algorithm 1 on it. The computation of function RS-GenerateFrom in Algorithm 1 can be represented by a tree: each node represents an invocation of RS-GenerateFrom, and each edge is labeled by the current value of j which is used to compute a (new) concept (line 8). Leaf nodes denoted by black squares represent computed concepts for which the I_m-canonicity test in line 13 fails. Leaf nodes denoted by ⊥_1 (⊥_2) represent computed concepts for which the supp⁻-check (the siblings-check) in line 14 fails, respectively. The relevant patterns are ∅, 0 and 03. □

As in Figure 2, since the computation of function RS-GenerateFrom in Algorithm 1 can be represented by a tree, we call such a tree a cPos (search) tree. We write N_0 ⊢_T N_j if N_0 has a descendant node N_j in a cPos tree T. We also write N_0 ⊢_{T,I} N_j if N_0 has a descendant node N_j in a cPos tree T with the I-canonicity test. We simply write B_0 ⊢_T B_j for short, where B_0 (B_j) is the intent of N_0 (N_j), respectively; similarly for N_0 ⊢_{T,I} N_j.

Lemma 1. Let N_0 = ⟨A, B⟩ be a node in a cPos search tree T. Suppose that N_0 ⊢_T N_j = ⟨A_j, B_j⟩, where N_j = ⟨(Γ⁺(B ∪ {j}))′, Γ⁺(B ∪ {j})⟩ (j ∈ M), and N_j ⊢_T N_d = ⟨A_d, B_d⟩ for some node N_d in T. Consider a subset B_d1 ⊆ B_d such that {y ∈ B_d1 | y < j + 1} = {y ∈ B_d | y < j + 1}. Then, there exist nodes N_s and N_d1 in T, the latter with intent Γ⁺(B_d1), such that N_s is a sibling node of N_j and N_s ⊢_{T,I_m} N_d1 for I_m = B_d \ B_d1.

Fig. 3: The CbO-based Search Tree in Lemma 1.

Proof. Let k = min{i ∈ B_d1 | i > j}, and let N_s be the sibling node of N_j with intent B_s = Γ⁺(B ∪ {k}); see Fig. 3. We first show that N_s is not pruned by the I_m-canonicity test. Let N_j1 be the node in the branch from N_j to N_d in T at which attribute k is first introduced. For the intent B_j1 of N_j1, we have B_j1 ⊇ B_s. Since Γ⁺(·) is monotone, it follows that Γ⁺(B ∪ {k}) passes the I_m-canonicity test.

Next, we show that there exists a node N_d1 such that N_s ⊢_{T,I_m} N_d1 for I_m = B_d \ B_d1. We prove this by induction on the size of B_d1 \ B_s. The base case B_d1 = B_s is trivial. For the induction step, let N_s1 be the child node of N_s obtained by adding to B_s the attribute k_1 = min{i ∈ B_d \ B_s | i > k}. We can show that N_s ⊢_{T,I_m} N_s1 similarly to the above. The rest of the proof then follows from the induction hypothesis. □
Theorem 2 (Correctness of Algorithm 1). For a given formal context and a minimum support threshold minsup, (i) the set of patterns produced by function RS-GenerateFrom in Algorithm 1 includes all the relevant patterns, and (ii) it is a subset of the set of patterns closed on the positives whose support is no less than minsup.

Proof. Part (ii) is immediate, since the patterns generated by the function are results of Γ⁺. For part (i), since a relevant pattern is closed on the positives by Theorem 1, we show that the pruning in line 14 of the algorithm does not eliminate any relevant pattern.

Let N_0 = ⟨A, B⟩ be a node in a cPos search tree T. Suppose that N_0 has a descendant node N_d = ⟨A_d, B_d⟩ in T, i.e., N_0 ⊢_T N_d = ⟨A_d, B_d⟩, and let I ⊆ {y + 1, . . . , |M| − 1} (0 ≤ y < |M|) be the set of attributes added along the branch br from N_0 to N_d. Let I_1 = {j ∈ I | supp⁻(B_1) = supp⁻(Γ⁺(B_1 ∪ {j})) for the intent B_1 of some node in br} and I_2 = {j ∈ I | N_j in br is dominated by a younger sibling node in T}, and let I_m = I_1 ∪ I_2. We now consider the subset B_d1 = B_d \ I_m. From Lemma 1, there exists a node N_d1 such that N_0 ⊢_{T,I_m} N_d1 and its intent is Γ⁺(B_d1). We note that this derivation from N_0 to N_d1 is not pruned in Algorithm 1. Moreover, Γ⁺(B_d1) is more relevant than B_d, since supp⁻(B_d) = supp⁻(Γ⁺(B_d1)). □

Grosskreutz proposed an output-polynomial time algorithm (Algorithm 2 in [5]) for enumerating relevant patterns. It, however, requires memory of the size of the output (i.e., the set of relevant patterns). To address this problem, the author also proposed a divide-and-conquer algorithm (Algorithm 3 in [5]); it uses less memory, but it visits irrelevant patterns. Algorithm 1 in the current paper is similar to that divide-and-conquer algorithm in that both visit a superset of the relevant patterns, and every pattern is visited at most once during the traversal of the search space. The difference is that Algorithm 3 in [5] employs only the supp⁻-check, while Algorithm 1 in this paper employs the siblings-check in addition to the supp⁻-check. Algorithm 1 still visits irrelevant patterns; in the worst case, its search space is the set of all patterns closed on the positives. The set of all relevant patterns is obtained by the filtering in line 3; for that, we use the filtering method in [4].

3.2 Relevant Patterns and Correlation Measures

Since the framework using support/confidence alone generates too many rules, another measure is usually used to find "interesting" rules among the generated ones. The lift-value is such a measure for finding correlated rules; the lift-value of a rule A → c⁺, where A is a pattern and c⁺ is a class label, is defined as the ratio of the conditional probability P(c⁺ | A) to P(c⁺):

    lift(A, c⁺) = P(A ∧ c⁺) / (P(A) · P(c⁺)).

Given the contingency table in Table 1, where m and n are both assumed to be constants, we have lift(A, c⁺) = (n/m) · a/(a + b). We use the following property of lift-values: for patterns A_1 and A_2, lift(A_1, c⁺) ≥ lift(A_2, c⁺) ↔ a_1 b_2 ≥ a_2 b_1, where a_i (b_i) (i = 1, 2) is the number of true positives (false positives) of A_i, respectively. We thus have that, if A_1 is more relevant than A_2, then lift(A_1, c⁺) ≥ lift(A_2, c⁺). Algorithm 1 can therefore be utilized to find association rules with high lift-values.
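To see why the above property holds, note that Table 1 gives

    lift(A_i, c⁺) = (n/m) · a_i/(a_i + b_i)   (i = 1, 2),

and n/m is a positive constant, so

    lift(A_1, c⁺) ≥ lift(A_2, c⁺) ↔ a_1(a_2 + b_2) ≥ a_2(a_1 + b_1) ↔ a_1 b_2 ≥ a_2 b_1.

Moreover, if A_1 is more relevant than A_2, then occ⁺(A_1) ⊇ occ⁺(A_2) and occ⁻(A_1) ⊆ occ⁻(A_2), i.e., a_1 ≥ a_2 and b_1 ≤ b_2, and hence a_1 b_2 ≥ a_2 b_1.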
Table 1: Contingency Table for Rule r = A → c⁺.

              | c⁺ is true   | c⁺ is false     | Sum_row
A is true     | sup(r) = a   | b               | sup(A) = a + b
A is false    | m − a        | n − m − b       | sup(¬A) = n − (a + b)
Sum_col       | sup(c⁺) = m  | sup(c⁻) = n − m | n

Table 2: Example Datasets from the UCI Machine Learning Repository.

Dataset        | #obj. | #attr. | target class | #pos.
Lenses         | 24    | 9      | hard         | 4
               |       |        | soft         | 5
               |       |        | none         | 15
Mushroom       | 8124  | 117    | poisonous    | 3916
Lymphography   | 148   | 59     | malign lymph | 61
Soybean-small  | 48    | 72     | D3           | 10

4 Experimental Results

Table 2 shows the datasets used in our experiments; they are from the UCI Machine Learning Repository.³ The table shows some properties of each dataset, including the target classes considered and the numbers of their positive examples. We implemented our proposed method in Java 8 on a PC with an Intel Core i7 processor running at 2.30 GHz and 8 GB of main memory, under Windows 10 (64 bit). All of the following experiments were performed with minsup fixed to 1 (i.e., all concepts are considered).

³ http://archive.ics.uci.edu/ml/datasets/statlog+(heart).

4.1 Experimental Results: the Lenses Dataset

To see the effects of our pruning methods, we present some results for the Lenses dataset in Figure 4. In Algorithm 1, we incorporate into the CbO-based miner for the closed on the positives (cloPos) two types of pruning: the supp⁻-check and the siblings-check. The figure shows the number of patterns generated in computing the closed on the positives (cloPos), the number of patterns generated using the supp⁻-check only, and the number of patterns generated using both the supp⁻-check and the siblings-check. We tested three cases, varying the target class: hard, soft, and no (non-contact) lenses.

Fig. 4: Comparison in the Lenses Dataset.

We have observed that the pruning by the siblings-check generates fewer patterns than both the cloPos method (i.e., without pruning) and the supp⁻-check alone. Table 3 shows more detailed results, including the number of closed patterns (#(clo)) and the number of relevant patterns after removing irrelevant ones (#(RPs)) (line 3 in Algorithm 1).

Table 3: #(Generated Patterns) in the Lenses Dataset.

target class | hard | soft | no-lenses
#(clo)       | 71   | 77   | 107
#(cloPos)    | 7    | 26   | 81
supp⁻-check  | 5    | 26   | 81
sibs-check   | 3    | 25   | 73
#(RPs)       | 3    | 25   | 73

4.2 Experimental Results on the Other Datasets

In Figure 5, we present some results for the other three datasets in Table 2: Mushroom, Lymphography and Soybean-small. The left part of the figure shows the numbers of patterns generated by Algorithm 1, and the right part shows the corresponding execution times in milliseconds; note that both plots use a logarithmic scale. We have observed that the effects of the pruning by the siblings-check on these datasets are more prominent than those on the Lenses dataset.

Fig. 5: #(Generated Patterns) and Execution Time for the Mushroom, Lymphography and Soybean-small Datasets.
Table 4: Results of the Other Datasets in Table 2.

             | Mushroom | Lymphography | Soybean-small
#(clo)       | 145,353  | 35,905       | 2,036
#(cloPos)    | 86,547   | 4,692        | 66
supp⁻-check  | 2,643    | 2,710        | 44
sibs-check   | 337      | 753          | 26
#(RPs)       | 187      | 499          | 22

The numbers of closed patterns (#(clo)) and of patterns closed on the positives (#(cloPos)) in these three datasets are orders of magnitude larger than those of the Lenses dataset. As a result, there are more chances for the pruning by the siblings-check to reduce the number of irrelevant patterns. The execution times on these datasets are also reduced accordingly.

5 Concluding Remarks

In this paper, we have considered the problem of mining relevant patterns for labeled data from a formal context, focusing in particular on pruning methods for reducing irrelevant patterns. Unlike the divide-and-conquer based method [5] in the literature, we have proposed a CbO-based algorithm; while traversing the search space of closed sets on the positives, it performs pruning using two types of dominance checks between patterns: the supp⁻-check and the siblings-check. We have empirically examined the effectiveness of our algorithm on some datasets, and shown that the effect of the siblings-check on reducing irrelevant patterns compensates for its extra computational cost.

For future work, since our algorithm visits a superset of the relevant patterns of a formal context, it will be interesting to introduce further dominance checks to reduce the search space. Another direction is to apply relevant pattern mining to association rule mining with non-monotone measures such as lift (e.g., [9]). The relationship with subgroup discovery, e.g., the recent work by Belfodil et al. [1], is also to be studied.

Acknowledgment

The authors would like to thank the anonymous reviewers for their useful comments on the previous version of this paper. This work is partially supported by JSPS Grant-in-Aid for Scientific Research (C) 18K11432.

References

1. Belfodil, A., Belfodil, A., Bendimerad, A., Lamarre, P., Robardet, C., Kaytoue, M., Plantevit, M.: FSSD – a fast and efficient algorithm for subgroup set discovery. In: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 91–99 (Oct 2019)
2. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer (1999)
3. Ganter, B., Kuznetsov, S.O.: Formalizing hypotheses with concepts. In: International Conference on Conceptual Structures, pp. 342–356. Springer (2000)
4. Garriga, G.C., Kralj, P., Lavrač, N.: Closed sets for labeled data. Journal of Machine Learning Research 9(Apr), 559–580 (2008)
5. Grosskreutz, H.: Class relevant pattern mining in output-polynomial time. In: SDM, pp. 284–294 (2012)
6. Krajca, P., Outrata, J., Vychodil, V.: Parallel algorithm for computing fixpoints of Galois connections. Annals of Mathematics and Artificial Intelligence 59(2), 257–272 (2010)
7. Krajca, P., Outrata, J., Vychodil, V.: Advances in algorithms based on CbO. In: CLA, vol. 672, pp. 325–337 (2010)
8. Kuznetsov, S.O.: Learning of simple conceptual graphs from positive and negative examples. In: Proc. of the Third European Conference on Principles of Data Mining and Knowledge Discovery (PKDD '99), pp. 384–391. Springer-Verlag, London, UK (1999)
9. Kuznetsov, S., Makhalova, T.: On interestingness measures of formal concepts. Inf. Sci. 442(C), 202–219 (May 2018)
10. Lavrač, N., Gamberger, D., Jovanoski, V.: A study of relevance for learning in deductive databases. J. of Logic Programming 40, 215–249 (Jun 1999)
11. Lavrač, N., Gamberger, D.: Relevancy in constraint-based subgroup discovery. In: Proceedings of the 2004 European Conference on Constraint-Based Mining and Inductive Databases, pp. 243–266. Springer-Verlag, Berlin, Heidelberg (2005)
12. Lemmerich, F., Rohlfs, M., Atzmueller, M.: Fast discovery of relevant subgroup patterns. In: Twenty-Third International FLAIRS Conference (2010)
13. Outrata, J., Vychodil, V.: Fast algorithm for computing fixpoints of Galois connections induced by object-attribute relational data. Information Sciences 185(1), 114–127 (2012)
14. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Proc. ICDT '99, pp. 398–416. Springer (1999)
15. Zaki, M.J.: Mining non-redundant association rules. Data Mining and Knowledge Discovery 9(3), 223–248 (2004)