Closure Structure: a Deeper Insight

Tatiana Makhalova1, Sergei O. Kuznetsov2, and Amedeo Napoli1

1 Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
2 National Research University Higher School of Economics, Moscow, Russia
tatiana.makhalova@inria.fr, skuznetsov@hse.ru, amedeo.napoli@loria.fr

Abstract. Discovery of the pattern search space is an essential component of Itemset Mining. The most common approach to reducing the pattern search space is to compute only frequent closed itemsets. Frequent patterns are known to be a poor choice, since they omit useful infrequent itemsets and explode exponentially as frequency decreases. In our previous work we proposed the closure structure, which allows for computing itemsets level by level without any preset parameters. In this work we study experimentally some properties of the closure levels.

1 Introduction

Itemset Mining (IM) encompasses a wide variety of tasks and methods related to computing and selecting itemsets. Its main challenges can be summarized in two questions: What itemsets (and how) to compute? Which of them (and how) to use? IM is usually considered as unsupervised learning, meaning that one selects itemsets based on such characteristics as coverage, diversity, interestingness according to a certain measure [9], etc. However, IM also includes some supervised approaches, e.g., rule-based classifiers, where itemsets are selected based on a standard quality measure of classifiers. Both supervised and unsupervised approaches may use the same methods for computing itemsets. To date, frequency remains the major criterion for computing itemsets. The methods for computing frequent itemsets are brought together under the name Frequent Itemset Mining (FIM). The main drawback of FIM is that it omits interesting and useful infrequent itemsets, while its main advantage is efficiency, in the sense that any FIM approach computes frequent itemsets and only them (because of the anti-monotonicity of frequency w.r.t. the order of pattern inclusion).

Nowadays there are almost no other (anti-)monotone measures that are commonly used in IM for computing itemsets. In [4] the authors propose to generate closed itemsets based on the ∆-measure, which is monotone w.r.t. projections. The authors propose an efficient polynomial algorithm; however, the lack of an experimental study of the quality of the generated itemsets may hamper a wide use of this approach in practice.

In our previous work [11], we proposed the closure structure of the concept lattice (i.e., the whole set of closed itemsets) and an algorithm for its gradual computation. The algorithm computes closed itemsets (formal concepts) by levels with polynomial delay. Each level may contain itemsets of different frequency; however, the number of frequent itemsets decreases with each new level. In [11] we presented some theoretical results and described experimentally the characteristics of closed itemsets by levels. In this work, we further study the topology of the closure structure and the applicability of concepts for classification by closure levels.

The paper has the following structure. In Section 2 we recall the basic notions. In Section 3 we describe the GDPM algorithm for computing closure levels and discuss its flaws. In Section 4 we present a simple model of a rule-based classifier. Section 5 contains the results of the experiments.
In Section 6 we conclude and give directions of future work.

2 Basic notions

2.1 Concepts and the partial order between them

A formal context [7] is a triple (G, M, I), where G is called a set of objects, M is called a set of attributes and I ⊆ G × M is a relation called the incidence relation, i.e., (g, m) ∈ I if object g has attribute m. The derivation operators (·)′ are defined for A ⊆ G and B ⊆ M as follows:

A′ = {m ∈ M | ∀g ∈ A : gIm},   B′ = {g ∈ G | ∀m ∈ B : gIm}.

Sets A ⊆ G, B ⊆ M such that A = A′′ and B = B′′ are said to be closed. For A ⊆ G, B ⊆ M, a pair (A, B) such that A′ = B and B′ = A is called a formal concept; A and B are called its extent and intent, respectively. In Data Mining, an intent is also called a closed itemset (or closed pattern). A partial order ≤ is defined on the set of concepts as follows: (A, B) ≤ (C, D) iff A ⊆ C (equivalently, D ⊆ B); then (A, B) is a subconcept of (C, D), while (C, D) is a superconcept of (A, B). With respect to this partial order, the set of all formal concepts forms a complete lattice L called the concept lattice of the formal context (G, M, I).

2.2 Equivalence classes and key sets

Let B be a closed itemset. Then all subsets D ⊆ B such that D′′ = B are called generators of B, and the set of all generators is called the equivalence class of B, denoted by Equiv(B) = {D | D ⊆ B, D′′ = B}. A subset D ∈ Equiv(B) is a key [2, 13], or minimal generator, of B if for every E ⊂ D one has E′′ ≠ D′′ = B′′, i.e., every proper subset of a key is a member of the equivalence class of a smaller closed set. We denote the set of keys (key set) of B by K(B). The set of keys is an order ideal, i.e., any subset of a key is a key [13]. The minimum key set K^min(B) ⊆ K(B) is the subset of the key set that contains the keys of minimum size, i.e., K^min(B) = {D | D ∈ K(B), |D| = min_{E∈K(B)} |E|}. In an equivalence class there can be several keys, but only one closed itemset, which is maximal in this equivalence class. An equivalence class is called trivial if it consists only of a closed itemset. For the sake of simplicity, we denote attribute sets by strings of characters, e.g., abc instead of {a, b, c}.

Example. Let us consider the formal context given in Table 1. Five concepts have nontrivial equivalence classes, namely ({g1}, acf), ({g3}, ade), ({g5, g6}, bdf), ({g5}, bdef) and (∅, abcdef). Among them, only bdf and abcdef have minimum key sets that differ from their key sets, i.e., K^min(bdf) = {b}, K(bdf) = {b, df} and K^min(abcdef) = {ab}, K(abcdef) = {ab, adf, aef, cef}.

Table 1. Formal context (on the objects g1, ..., g6 and the attributes a, ..., f) and its nontrivial equivalence classes.

C      | K^min(C) | K(C)              | Equiv(C)
ade    | ad       | ad                | ad, ade
acf    | af, cf   | af, cf            | af, cf, acf
bdf    | b        | b, df             | b, df, bd, bf, bdf
bdef   | be, ef   | be, ef            | be, ef, bde, bef, def, bdef
abcdef | ab       | ab, adf, aef, cef | ab, adf, aef, cef, ..., abcdef*

* The equivalence class includes all itemsets that contain a key from K(abcdef).
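The derivation operators and the notions of closure and key translate directly into a few lines of code. The following Python sketch is only an illustration: the toy context, the object names and the helper functions (extent, intent, closure, is_key) are ours and do not reproduce the context of Table 1.

# A small illustrative context (hypothetical, not the one of Table 1):
# each object is mapped to its set of attributes.
context = {
    "g1": {"a", "c", "f"},
    "g2": {"a", "c", "e"},
    "g3": {"a", "d", "e"},
}
attributes = set().union(*context.values())

def extent(itemset):
    """B': the objects having all attributes of the itemset."""
    return {g for g, attrs in context.items() if set(itemset) <= attrs}

def intent(objects):
    """A': the attributes shared by all given objects."""
    common = set(attributes)
    for g in objects:
        common &= context[g]
    return common

def closure(itemset):
    """B'': the closed itemset (intent) generated by the itemset."""
    return intent(extent(itemset))

def is_key(itemset):
    """A key (minimal generator): removing any attribute changes the closure."""
    c = closure(itemset)
    return all(closure(set(itemset) - {m}) != c for m in itemset)

# closure({"a", "c"}) returns the closed itemset generated by ac, and
# is_key({"a", "c"}) tells whether ac is one of its minimal generators.
print(closure({"a", "c"}), is_key({"a", "c"}))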
2.3 Level-wise structure on minimum key sets

In [11] we introduced the minimum closure structure induced by minimum key sets. Here we recall the main notions. Let C be the set of all closed itemsets and K^min(B) be the minimum key set of a closed itemset B ∈ C. We denote by level the function that maps a closed itemset to the size of its minimum keys, i.e., level : C → {0, . . . , |M|} such that level(B) = |D|, where D is an arbitrary itemset chosen from K^min(B).

The minimum structural level k is given by all minimum keys of size k, i.e., K^min_k = ∪_{B∈C, level(B)=k} K^min(B). We say that B belongs to the minimum structural level k if the keys in K^min(B) have size k. We denote the corresponding set of closed itemsets of level k by C^min_k. More formally, C^min_k = {B | B ∈ C, level(B) = k}. We call the minimum structural complexity of C the maximal number of non-empty levels, N_C^min = max{k | k = 1, . . . , |M|, K^min_k ≠ ∅}.

Example. The closure structure of the concept lattice of the running example is given in Fig. 1. It consists of 3 closure levels, i.e., its minimum structural complexity is equal to 3.

Fig. 1. The closure structure of the concept lattice of the context from Table 1 induced by the sizes of the minimum keys. The lexicographically smallest minimum key of each concept is highlighted in boldface.

3 The GDPM algorithm and related issues

Efficiency is a principal requirement for the algorithms that compute closed itemsets (concepts). Apart from polynomial delay, we pay attention to other important characteristics, namely (1) the strategy of computing a concept given the already generated ones, and (2) the test of uniqueness of the generated concepts.

Taking into account that the set of minimum keys is an order ideal, we may generate closure levels using a strategy similar to the one used in the Titanic and Pascal algorithms, i.e., computing a new minimum key by merging two minimum keys from the previous level that differ in one element. This strategy may be non-optimal because (1) for each concept we should keep all its minimum keys, and (2) the time complexity of the key candidate generation is quadratic w.r.t. the size of the last level. An alternative strategy is to add to each key an attribute that is not included in the corresponding intent. For this strategy it is important to use an efficient procedure for verifying whether a concept is generated for the first time. We cannot use the canonicity test as it is done, for example, in CbO, because a key of a concept may be lexicographically smaller than its minimum key. The simplest solution to ensure that only one minimum key is kept for each concept is to use a lexicographic tree that contains all previously generated concepts. Thus, for each generated key we additionally need O(|M|) time to check whether the concept was generated at previous iterations.

We proposed an algorithm called GDPM (Gradual Discovery in Pattern Mining) to compute the closure structure of the concept lattice by levels. Its detailed description and an example are given in [11] (where it is referred to as CbO-Gen). Here we give a brief description. The pseudocode of GDPM is given in Algorithm 1. The algorithm computes concepts based on breadth-first traversal, i.e., at level k it computes all concepts that have a minimum key of size k. Each newly generated key is obtained by computing the union of a minimum key from the previous level and an attribute that is outside the closure of that minimum key (lines 3-13). The key is added to K*_k only if its closure is not in the lexicographic tree that stores all previously generated intents (lines 8-11). For each concept we store only one minimum key in K*_k.

Algorithm 1 GDPM
Require: K*_{k−1}, a subset of the minimum key set K^min_{k−1} of level k − 1; T_{k−1}, the lexicographic tree containing all closed itemsets ∪_{i=1,...,k−1} C^min_i
Ensure: K*_k, a subset of the minimum key set of level k
1: K*_k ← ∅
2: T_k ← T_{k−1}
3: for all X ∈ K*_{k−1} do
4:   Y ← M \ X′′
5:   for all m ∈ Y do
6:     X* ← X ∪ {m}
7:     S ← (X*)′′
8:     if S ∉ T_k then
9:       add(T_k, S)
10:      K*_k ← K*_k ∪ {X*}
11:    end if
12:   end for
13: end for
14: return K*_k
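For illustration purposes only, the following Python sketch mirrors one iteration of Algorithm 1; it is not the implementation used in [11]. It assumes a closure() function like the one sketched after Section 2.2 and replaces the lexicographic tree T_k by a plain set of frozensets, an assumption that changes the membership-test data structure but not the computed levels.

def gdpm_level(prev_keys, seen_intents, attributes, closure):
    """Compute a subset K*_k of the minimum keys of level k from the keys
    K*_{k-1} of level k-1 (one iteration of Algorithm 1).
    seen_intents plays the role of the tree T_{k-1}: it contains, as
    frozensets, all intents generated at levels 1, ..., k-1."""
    new_keys = []
    for X in prev_keys:
        for m in attributes - closure(X):      # attributes outside X'' (line 4)
            candidate = frozenset(X) | {m}     # X* = X ∪ {m} (line 6)
            S = frozenset(closure(candidate))  # S = (X*)'' (line 7)
            if S not in seen_intents:          # keep one minimum key per intent (lines 8-11)
                seen_intents.add(S)
                new_keys.append(candidate)
    return new_keys

# Usage sketch, starting from the first level (single attributes):
# keys = [frozenset({m}) for m in attributes]
# seen = {frozenset(closure(k)) for k in keys}
# (the first level would also need to keep only one key per distinct closure)
# keys = gdpm_level(keys, seen, attributes, closure)   # level 2, then repeat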
4 Concepts as classifiers. Baseline classification model

In order to evaluate intents (closed itemsets) as representatives of the classes, we propose to use the class labels of objects, which were unavailable during the computation of the closure structure. In what follows we describe a simple concept-based classification model. This model is closely related to the JSM-method proposed by Finn, which is widely used in the FCA community [10, 3, 8].

Let (G, M, I) be a training context, where each object g belongs to one class label(g) ∈ Y, and Y is the set of class labels. We use concepts as classifiers. Let c = (A, B) ∈ C be a formal concept; then its class is given by class(c) = argmax_{y∈Y} Σ_{g∈A} [label(g) = y], where [·] is the indicator function, taking value 1 if the condition in the brackets is true and 0 otherwise. An object g is classified by a set of concept classifiers C* based on the weighted majority vote as follows: classify(g, C*, w, θ) = argmax_{y∈Y} Σ_{c∈C*, class(c)=y, w(c)≥θ} w(c), where w(·) is a weight function and θ is a weight threshold. For example, for a concept c = (A, B) the weight can be defined by one of the following functions:

prec(c) = tp(c) / (tp(c) + fp(c)),   recall(c) = tp(c) / (tp(c) + fn(c)),   F1(c) = 2 · prec(c) · recall(c) / (prec(c) + recall(c)),

where tp(c) = |{g | label(g) = class(c), g ∈ A}|, fp(c) = |A| − tp(c), tn(c) = |{g | label(g) ≠ class(c), g ∈ G \ A}|, fn(c) = |G \ A| − tn(c). As a set of classifiers we use either a single level C^min_k or all concepts up to level k, i.e., ∪_{j≤k} C^min_j.

Example. Let us consider the context from Table 1. We take the class labels where objects g1 and g2 belong to class "+", and g3–g6 belong to class "−". The weights of concept classifiers are given by the extent precision, e.g., pr(a) = pr(c) = pr(d) = pr(f) = 2/3. The threshold is θ = 2/3. The intents a and c are from the "+" class, d and f are from the "−" class. Then, to classify an object gtest described by acdf we use the intents a, c, d, f from the 1st level, and ac, acf from the 2nd level. Using the intents of the 1st level we are not able to classify gtest, since pr(a) + pr(c) = pr(d) + pr(f) = 4/3. For the intents of the 2nd level we have pr(ac) + pr(acf) = 2 for the "+" class and 0 for the "−" class, thus we classify gtest as "+".

The proposed model is based on all intents from a given level that meet the weight requirements. However, more sophisticated models, e.g., Classy [12] or Krimp [14], can be adapted to use the intents by closure levels instead of frequent itemsets. A more elaborate combination of the intents may improve the classification quality. For large datasets the closure structure can be computed for each class independently.
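To make the voting scheme concrete, the following Python sketch implements the weighted majority vote described above; it is an illustration only, not the implementation used in the experiments. Concepts are assumed to be given as (extent, intent) pairs of Python sets, the weight is the extent precision, and the applicability condition (the intent of a voting concept must be contained in the object description) follows the worked example above. The names concept_class, precision_weight and classify are ours.

from collections import Counter, defaultdict

def concept_class(extent, labels):
    """class(c): the majority class among the objects of the extent."""
    return Counter(labels[g] for g in extent).most_common(1)[0][0]

def precision_weight(extent, labels):
    """prec(c) = tp(c) / (tp(c) + fp(c)), used here as the weight w(c)."""
    cls = concept_class(extent, labels)
    tp = sum(1 for g in extent if labels[g] == cls)
    return tp / len(extent)

def classify(obj_attrs, concepts, labels, theta):
    """Weighted majority vote over the applicable concepts whose weight
    reaches the threshold theta (ties are not handled here)."""
    votes = defaultdict(float)
    for extent, intent in concepts:
        if intent <= obj_attrs:                  # the concept applies to the object
            w = precision_weight(extent, labels)
            if w >= theta:
                votes[concept_class(extent, labels)] += w
    return max(votes, key=votes.get) if votes else None  # None: object stays unclassified

# Usage sketch on the running example (g1, g2 labeled "+", g3-g6 labeled "-"):
# labels = {"g1": "+", "g2": "+", "g3": "-", "g4": "-", "g5": "-", "g6": "-"}
# concepts = [...]  # (extent, intent) pairs taken from a chosen closure level
# classify({"a", "c", "d", "f"}, concepts, labels, theta=2/3)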
5 Experiments

In this section we report the results of an experimental study of the minimum closure structure, i.e., of the closed itemsets within the levels C^min_k. We use freely available datasets from the LUCS/KDD data set repository [5]; their characteristics are given in Table 2.

Table 2. Description of datasets and their level structure. The columns 1-11 give the level sizes |C^min_k|.

name      | |G|×|M| | dens. | |C|    |  1  |  2   |  3    |  4    |  5    |  6    |  7    |  8    |  9   | 10  | 11
hepatitis | 155×50  | 0.36  | 144870 | 50  | 761  | 4373  | 14696 | 31240 | 41995 | 33048 | 14724 | 3570 | 399 | 14
heart-d.  | 303×45  | 0.29  | 25538  | 45  | 561  | 2696  | 6381  | 7980  | 5389  | 2037  | 417   | 32   |     |
auto      | 205×129 | 0.19  | 57788  | 127 | 2695 | 12539 | 21311 | 15283 | 5042  | 748   | 43    |      |     |
glass     | 214×40  | 0.23  | 3245   | 40  | 450  | 1217  | 1041  | 386   | 96    | 14    | 1     |      |     |
pima      | 768×36  | 0.22  | 1625   | 36  | 284  | 466   | 451   | 271   | 96    | 19    | 2     |      |     |
iris      | 150×32  | 0.50  | 4481   | 30  | 263  | 792   | 1279  | 1335  | 682   | 100   |       |      |     |
led7      | 3200×28 | 0.50  | 1950   | 14  | 84   | 280   | 560   | 630   | 321   | 61    |       |      |     |
ticTacToe | 958×27  | 0.33  | 42711  | 27  | 324  | 2266  | 9664  | 16982 | 10648 | 2800  |       |      |     |
wine      | 178×65  | 0.20  | 13228  | 65  | 1289 | 4779  | 5026  | 1791  | 265   | 13    |       |      |     |
zoo       | 101×35  | 0.46  | 4569   | 35  | 377  | 1194  | 1656  | 1059  | 239   | 9     |       |      |     |
breast    | 699×14  | 0.64  | 361    | 12  | 50   | 112   | 125   | 55    | 7     |       |       |      |     |
car eval. | 1728×21 | 0.29  | 7999   | 21  | 183  | 847   | 2196  | 3024  | 1728  |       |       |      |     |
ecoli     | 327×24  | 0.29  | 425    | 24  | 138  | 185   | 67    | 10    | 1     |       |       |      |     |

5.1 Concept contrast based on the F1 measure

In IM, apart from descriptive qualities of itemsets such as coverage or diversity, one is interested in assessing the quality of itemsets as representatives of the classes. The latter is closely related to emerging patterns [6, 1]. In this study we evaluate the contrast of formal extents by the F1 measure. As was shown in [11], the average F1 measure usually decreases w.r.t. closure levels. However, there are some datasets with atypical behavior, where the average F1 measure increases at the last levels of the closure structure. To address the underlying causes of this behavior, we study the values of the F1 measure within frequency ranges of size 0.1, i.e., (0.0, 0.1], (0.1, 0.2], (0.2, 0.3], etc. Fig. 2 shows the results for some datasets.

Fig. 2. The average F1 measure for 8 datasets within 10 frequency ranges.

Our experiments showed that the value of the F1 measure of the concepts within a fixed frequency range usually remains almost the same at all levels. Thus, the average F1 at a closure level is affected by the proportion of concepts of a certain frequency. Since the ratio of frequent (infrequent) concepts decreases (increases) with the level number, the F1 measure decreases as well. Thus, we may expect an increase of the average F1 measure at the last levels for datasets with a large number of frequent and "coherent" attributes and a subset of infrequent "incoherent" attributes.

In the previous study we showed that the sizes of the closure levels resemble the values of binomial coefficients, i.e., the largest levels are located in the middle of the closure structure, while the first and last levels are the smallest ones. In Fig. 3 we show the size of the levels w.r.t. 10 frequency ranges. As for the whole set of concepts, for the subset of concepts of a given frequency we observe quite similar level-size distributions: the largest level is in the middle, while the first and last levels are the smallest ones. The index of the largest level shifts: the lower the frequency, the larger the index of the largest level.

Fig. 3. The size of the concepts of a given frequency range w.r.t. the level number.

5.2 Classification quality

In this section we report the average accuracy, over 10-fold cross-validation, of the rule-based classifier described in Section 4. We use 8 folds (training set) to compute itemsets, 1 fold (test set) to select the best parameters and 1 fold (validation set) to assess the performance of the classifiers. We report the average values on the validation sets. We use both the concepts from a single closure level (single level, SL) and the concepts from all levels up to a given level (cumulative levels, CL) to build a classifier. As a weight function we use precision with the following threshold values: 0.0, 0.6, 0.7, 0.8 and 0.9.
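A sketch of this evaluation protocol is given below, assuming scikit-learn's KFold for splitting. The helper mine_and_score, which would mine the closure levels on the training objects and return the accuracy of the resulting classifier on a given fold for a given threshold, is a placeholder and not part of the paper; the level and the SL/CL variant would be selected in the same way as the threshold.

import numpy as np
from sklearn.model_selection import KFold

def cross_validate(n_objects, mine_and_score, thresholds=(0.0, 0.6, 0.7, 0.8, 0.9)):
    """10-fold protocol sketch: 8 folds to mine concepts, 1 fold to select
    the precision threshold, 1 fold to report the accuracy."""
    splitter = KFold(n_splits=10, shuffle=True, random_state=0)
    folds = [fold for _, fold in splitter.split(np.arange(n_objects))]
    accuracies = []
    for i in range(10):
        test_idx = folds[i]                # parameter-selection fold
        valid_idx = folds[(i + 1) % 10]    # validation fold
        train_idx = np.setdiff1d(np.arange(n_objects),
                                 np.concatenate([test_idx, valid_idx]))
        best = max(thresholds, key=lambda t: mine_and_score(train_idx, test_idx, t))
        accuracies.append(mine_and_score(train_idx, valid_idx, best))
    return float(np.mean(accuracies))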
The experiments showed that both SL- and CL-classifiers may achieve quite high accuracy. The average accuracy for 8 datasets is given in Fig. 4. The maximal (or close to the maximal) accuracy of CL-classifiers is achieved at the first levels and usually changes only slightly when the classifier is extended with further closure levels. For SL-classifiers the maximal (or close to the maximal) accuracy is usually achieved at one of the first levels.

Fig. 4. Average accuracy of four classifiers: "SL/CL pr. = X" stands for a single-level (SL) or cumulative (CL) classifier where each concept has precision at least X.

We also compared the proposed classifier with state-of-the-art classifiers from the Sklearn library. We consider SVM, Naive Bayes, and 3 tree-based models: Random Forests, CART, C5.0. We use the same three sets to select the best parameters for each classifier, i.e., the number of trees for tree-based classifiers (50 or 100), the maximum tree depth (2, 5, 10, 15) and the kernel type for SVM (polynomial, radial basis function, sigmoid). The average accuracy is reported in Table 3. The experiments show that even with the simplest model of classifiers based on closure levels we can achieve accuracy comparable to that of the state-of-the-art classifiers. A more careful selection and combination of the generated concepts may provide better quality. Based on the obtained results we may conclude that the proposed level-wise strategy allows us to generate concepts that describe meaningful groups of objects, and that the intents from the first closure levels may be used as an alternative to frequent itemsets.

Table 3. The average accuracy of classifiers. For the proposed classifiers (SL/CL) the best accuracy is reported, either over the first 3 levels only or over all levels, with the corresponding parameters (level, precision threshold) given in parentheses.

data      | RF   | CART | C5.0 | SVM  | NB   | CL, first 3 levels | SL, first 3 levels | CL, all levels | SL, all levels
auto      | 0.77 | 0.66 | 0.64 | 0.74 | 0.54 | 0.71 (3, 0.8)      | 0.65 (2, 0.8)      | 0.74 (4, 0.9)  | 0.65 (2, 0.8)
breast    | 0.94 | 0.94 | 0.94 | 0.94 | 0.93 | 0.94 (3, 0.0)      | 0.94 (3, 0.6)      | 0.94 (3, 0.0)  | 0.94 (3, 0.6)
car ev.   | 0.96 | 0.97 | 0.97 | 0.99 | 0.80 | 0.70 (3, 0.8)      | 0.74 (3, 0.0)      | 0.86 (5, 0.9)  | 0.88 (5, 0.0)
ecoli     | 0.87 | 0.87 | 0.86 | 0.86 | 0.66 | 0.80 (3, 0.7)      | 0.84 (3, 0.6)      | 0.81 (4, 0.7)  | 0.84 (3, 0.6)
glass     | 0.76 | 0.69 | 0.66 | 0.72 | 0.45 | 0.65 (3, 0.8)      | 0.63 (3, 0.0)      | 0.65 (4, 0.9)  | 0.63 (3, 0.0)
heart-d.  | 0.56 | 0.54 | 0.53 | 0.55 | 0.20 | 0.54 (1, 0.0)      | 0.54 (3, 0.6)      | 0.55 (7, 0.9)  | 0.57 (7, 0.0)
hepatitis | 0.81 | 0.78 | 0.80 | 0.83 | 0.53 | 0.83 (2, 0.9)      | 0.81 (3, 0.9)      | 0.83 (2, 0.9)  | 0.81 (3, 0.9)
iris      | 0.94 | 0.94 | 0.94 | 0.93 | 0.95 | 0.96 (2, 0.6)      | 0.95 (1, 0.7)      | 0.96 (2, 0.6)  | 0.95 (1, 0.7)
led7      | 0.75 | 0.75 | 0.75 | 0.76 | 0.75 | 0.55 (3, 0.6)      | 0.62 (3, 0.0)      | 0.74 (7, 0.0)  | 0.74 (6, 0.7)
pima      | 0.74 | 0.72 | 0.74 | 0.73 | 0.53 | 0.74 (3, 0.6)      | 0.75 (3, 0.6)      | 0.75 (4, 0.6)  | 0.75 (4, 0.6)
ticTacToe | 0.98 | 0.95 | 0.94 | 0.98 | 0.68 | 0.97 (3, 0.9)      | 0.98 (3, 0.9)      | 0.99 (6, 0.9)  | 0.99 (6, 0.6)
wine      | 0.94 | 0.86 | 0.90 | 0.95 | 0.84 | 0.93 (3, 0.9)      | 0.90 (3, 0.7)      | 0.93 (6, 0.9)  | 0.92 (4, 0.6)
zoo       | 0.86 | 0.91 | 0.94 | 0.96 | 0.95 | 0.84 (3, 0.8)      | 0.83 (3, 0.7)      | 0.82 (4, 0.8)  | 0.83 (3, 0.7)
average   | 0.84 | 0.81 | 0.82 | 0.84 | 0.68 | 0.78               | 0.78               | 0.81           | 0.81

6 Conclusion

In this paper, we further study the closure structure of the concept lattice by focusing on the ability of concepts to describe meaningful subsets of objects. Our experiments show that the levels of the closure structure are a good alternative to frequency. Each closure level is computed with polynomial delay, and the quality of itemsets decreases after the first levels.
One of the main directions of future work is to develop more efficient algorithms for computing the closure levels and to study other practical applications where the proposed closure structure may provide better results than frequency-based concept generation.

References

1. Asses, Y., Buzmakov, A., Bourquard, T., Kuznetsov, S.O., Napoli, A.: A hybrid classification approach based on FCA and emerging patterns - an application for the classification of biological inhibitors. In: Proceedings of the 9th International Conference on Concept Lattices and Their Applications. CEUR Workshop Proceedings, vol. 972, pp. 211–222 (2012)
2. Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., Lakhal, L.: Mining frequent patterns with counting inference. ACM SIGKDD Explorations Newsletter 2(2), 66–75 (2000)
3. Blinova, V., Dobrynin, D., Finn, V., Kuznetsov, S.O., Pankratova, E.: Toxicology analysis by means of the JSM-method. Bioinformatics 19(10), 1201–1207 (2003)
4. Buzmakov, A., Kuznetsov, S., Napoli, A.: Sofia: How to make FCA polynomial? In: Proceedings of the 4th International Conference "What can FCA do for Artificial Intelligence?". CEUR Workshop Proceedings, vol. 1430, pp. 27–34 (2015)
5. Coenen, F.: The LUCS-KDD discretised/normalised ARM and CARM Data Library. http://www.csc.liv.ac.uk/~frans/KDD/Software/LUCS KDD DN
6. Cuissart, B., Poezevara, G., Crémilleux, B., Lepailleur, A., Bureau, R.: Emerging patterns as structural alerts for computational toxicology. In: Dong, G., Bailey, J. (eds.) Contrast Data Mining: Concepts, Algorithms, and Applications, pp. 269–282. CRC Press (2013)
7. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer (1999)
8. Ganter, B., Grigoriev, P.A., Kuznetsov, S.O., Samokhin, M.V.: Concept-based data mining with scaled labeled graphs. In: International Conference on Conceptual Structures, pp. 94–108. Springer (2004)
9. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: A survey. ACM Computing Surveys 38(3), 9–es (2006)
10. Kuznetsov, S.O.: Interpretation on graphs and complexity characteristics of a search for specific patterns. Automatic Documentation and Mathematical Linguistics 24(1), 37–45 (1989)
11. Makhalova, T., Kuznetsov, S.O., Napoli, A.: Gradual discovery with closure structure of a concept lattice. In: The 15th International Conference on Concept Lattices and Their Applications (2020)
12. Proença, H.M., van Leeuwen, M.: Interpretable multiclass classification by MDL-based rule lists. Information Sciences 512, 1372–1393 (2020)
13. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with Titanic. Data & Knowledge Engineering 42(2), 189–222 (2002)
14. Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: Mining itemsets that compress. Data Mining and Knowledge Discovery 23(1), 169–214 (2011)