On Stability of Triadic Concepts Sergei O. Kuznetsov1 and Tatiana P. Makhalova1,2 1 National Research University Higher School of Economics, Kochnovsky pr. 3, Moscow 125319, Russia 2 ISIMA, Complexe scientifique des Cézeaux, 63177 Aubière Cedex, France skuznetsov@hse.ru,t.makhalova@gmail.com Abstract. Triadic formal concept analysis has become a popular re- search direction, since triadic relations give natural models of many data collections. In this paper we address the problem of selecting most inter- esting concepts by proposing triadic stability indices. 1 Introduction Triadic formal concept analysis (3FCA) was introduced by Rudolf Wille and Fritz Lehmann [1] to model hierarchies of classes and dependencies arising from ternary relations. Recently, several algorithms for computing frequent tricon- cepts were proposed [2, 3]. It is noticed that some infrequent concepts are still interesting, since they represent extraordinary or uncommon data. In this paper we propose triadic stability for selecting interesting triadic concepts. Together with exact stability indices we suggest their efficient approximations analogous to ∆-stability introduced in [4]. 2 Main definitions 2.1 Formal Concept Analysis First, we briefly recall some basic definitions of the Formal Concept Analysis (FCA) [5]. A formal context is a triple (G, M, I). G and M are sets of objects and attributes respectively, and I is an incidence relation. It is defined as the Cartesian product G × M , i.e. (g, m) ∈ I if the object g ∈ G has the attribute 0 m ∈ M . The derivation operators (·) are defined for A ⊆ G and B ⊆ M as follows: A0 = {m ∈ M | ∀g ∈ A : gIm} B 0 = {g ∈ G | ∀m ∈ B : gIm} A0 is the set of attributes common to all objects of A, and B 0 is the set of objects 0 sharing all attributes of B. The double application of (·) is a closure operator, 00 i.e. (·) is extensive, idempotent and monotone. Subsets A ⊆ G, B ⊆ M such that A = A00 and B = B 00 are called closed. A (formal) concept is a pair (A, B), where A ⊆ G, B ⊆ M and A0 = B, 0 B = A. A is called the (formal) extent, and B is called the (formal) intent of the concept (A, B). A concept lattice (or Galois lattice) is a partial ordered set of concepts, the order 6 on the set of concepts is defined as follows: (A, B) ≤ (C, D) iff A ⊆ C (D ⊆ B), a pair (A, B) is a subconcept of (C, D), while (C, D) is a superconcept of (A, B). Each finite lattice has the highest element with A = G, called the top element, and the lowest element with B = M , called the bottom element. 2.2 Triadic Concept Analysis In the case of a triadic relation one deals with a quadruple (G, M, B, Y ), called a triadic context. G, M , B are sets and Y is a ternary relation between G, M and B, i.e. Y ⊆ G × M × B; the elements of G, M and B are called objects, attributes and conditions respectively, and (g, m, b) ∈ Y is read: object g has attribute m under condition b. The dyadic derivation operators can be used to construct triadic concepts. A triadic context can be represented as follows: K := (K1 , K2 , K3 , Y ), where K1 is a set of objects G, K2 is a set of attributes and K3 is a set of conditions, and each element of Ki may be seen as an instance of Peirce’s i-th category [1]. For every triadic context one defines  the following dyadic contexts: K1 := K1 , K2 × K3 , Y (1) with gY (1) (m, b) :⇔ (g, m, b) ∈ Y  K2 := K2 , K1 × K3 , Y (2) with mY (2) (g, b) :⇔ (g, m, b) ∈ Y  K3 := K3 , K1 × K2 , Y (3) with bY (3) (g, m) :⇔ (g, m, b) ∈ Y  (i,j) (i,j) For {i, j, k} = {1, 2, 3} and Ak ⊆ Kk , one defines KAk := Ki , Kj , YAk , (i,j) where (ai , aj ) ∈ YAk if and only if (ai , aj , ak ) ∈ Y for all ak ∈ Ak . (i) Put differently, the context K is a flattened representation of the original (i,j) triadic context, while KAk corresponds to the relation between elements of Ki and Kj that belong to Ak . (i)-derivation operator For {i, j, k} = {1, 2, 3} with j < k and for X ⊆ Ki and Z ⊆ Kj × Kk the (i)-derivation operators are defined by X 7−→ X (i) := {(aj , ak ) ∈ Kj × Kk | (ai , aj , ak ) ∈ Y for all ai ∈ X} Z 7−→ Z (i) := {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Z} (i, j, Xk )-derivation operators For {i, j, k} = {1, 2, 3} and for Xi ⊆ Ki , Xj ⊆ Kj and Ak ⊆ Kk the (i, j, Xk )-derivation operators are defined by (i,j,Ak ) Xi 7−→ Xi := {aj ∈ Kj | (ai , aj , ak ) ∈ Y for all (ai , ak ) ∈ Xi × Ak } (i,j,Ak ) Xj 7−→ Xj := {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Xj × Ak } A triadic concept (triconcept) of K := (K1 , K2 , K3 , Y ) is a triple (A1 , A2 , A3 ) (i) with Ai ⊆ Ki for i ∈ {1, 2, 3} and Ai = (Aj × Ak ) for every {i, j, k} = {1, 2, 3} with j < k. The sets A1 ,A2 , and A3 are called extent, intent and modus of the triadic concept respectively. We let T (K) denote the set of all triadic concepts of K. A triadic concept lattice has three maximal elements, namely ((K2 × K3 )(1) , K2 , K3 ), (K1 , (K1 × K3 )(2) , K3 ), and (K1 , K2 , (K1 × K2 )(3) ). For any two ele- ments of a lattice one defines tree types of set inclusion/exclusion relations, which satisfy the following antiordinal dependencies: (A1 , A2 , A3 ) G (B1 , B2 , B3 ) iff A1 ⊆ B1 , A2 ⊇ B2 , A3 ⊇ B3 , (A1 , A2 , A3 ) M (B1 , B2 , B3 ) iff A1 ⊇ B1 , A2 ⊆ B2 , A3 ⊇ B3 or (A1 , A2 , A3 ) C (B1 , B2 , B3 ) iff A1 ⊇ B1 , A2 ⊇ B2 , A3 ⊆ B3 . 3 Stability Indices For Triadic Concepts Stability indices for formal concepts were introduced in [6, 7] and modified in [8]. We define stability indices for the triadic case in a similar way. We describe two types of stability that correspond to the derivation operators defined above. (i)-stability For a triadic concept (A1 , A2 , A3 ) the (i)-stability is defined by:  (i) | X ⊆ Ai |X (i) = (Aj × Ak ) | Stab (A1 , A2 , A3 ) := 2|Ai | This index shows how much the binary relation on sets Xj and Xk is depen- dent on particular elements of a subset Ai . (i, j, Xk )-stability For a triadic concept (A1 , A2 , A3 ) the (i, j, Xk )-stability is defined by:  (i,j,Xk ) | X ⊆ (Ai × Aj ) |X (k) = Ak | Stab (A1 , A2 , A3 ) := 2|Ai |+|Aj | The (i, j, Xk )-stability allows us to estimate the dependence of a subset Ak on elements of the (Xi , Xj )-relation. Example Below we consider a small examples of computing stability indices for a concept. The formal context is given in the table 1. Table 1. Triadic context α β γ δ a b c d a b c d a b c d a b c d 1× × × × 2× ××× ×× ×× 3 ××××× ×× 4 × Let us consider a triconcept C = ({2, 3} , {b, c} , {β, γ}) with (1) - stability and (1, 3, X2 )-stability. (1) Stab(1) (C) = 12 . Since the numerator is comprised by {3} = ({b, c} × {β, γ}) (1) and {2, 3} = ({b, c} × {β, γ}). Table 2. I ⊆ M × C corresponding to all possible subsets of the extent {2, 3} {∅} a b c d {2} a b c d {3} a b c d {2, 3} a b c d α ×××× α × α ×× α β ×××× β ××× β ××× β ××× γ ×××× γ ×× γ ×× γ ×× δ ×××× δ ×× δ δ Stab(1,3,X2 ) (C) = 38 . To compute this value one needs to check 16 subsets of X1 × X3 and corresponding subsets of X2 . The following sets occur in the numerator: {∅, 2, 3, 23} × {∅, β, γ, βγ}. (1,3,A2 ) (1,3,A2 ) {b, c} = ({2, 3} , {β, γ})(1,3,A2 ) = ({2, 3} , {γ}) = ({3} , {γ}) (1,3,A2 ) (1,3,A2 ) (1,3,A2 ) ({3} , {β, γ}) = ({2} , {γ}) = ({2} , {β, γ}) 4 Estimates of stability The problem of computing stability is #P -complete [6, 7], therefore, in practice, when one deals with a big context and with the huge amount of generated concepts, it is very difficult to apply these indices. That’s why, estimates of the stability for dyadic concepts have been proposed [9, 4]. We have expanded the ∆-stability [4] for the case of triadic stability indices. In this regard, it is important to note that the estimates derived from the di- rect descendants of a triconcept can be useless owing to the defined quasiorders, because the number of direct neighbors is usually small. In figure 1 the distri- butions of the descendants number with respect to different inclusion/exclusion relations are represented. Fig. 1. The number of neighbors distributions Instead of considering the set difference between (i)-th components of a tri- concept and each direct descendant, we consider the set difference between (i)-th components of a triconcept c = (A1 , A2 , A3 ) and other, possibly, unclosed con- cepts derived by adding new elements from Kj \ Aj , j 6= i or Kj × Kk \ Aj × Ak , j, k 6= i. Put differently, the lower and upper bounds estimates of stability index look as follows: X −log2 2−∆(c,d) ≤ LStab (c) ≤ ∆min (c) , d∈Enl(c) where ∆min (c) = mind∈Enl(c) ∆ (c, d), n o Enl (c) = X | X = {Ak ∪ x} , x ∈ Kk \ Ak , X (k) ⊆ (Ai × Aj ) and ∆ (c, d) is the difference between |Aj | · |Ak | and the number of elements in X (k) for estimates of (i, j, Xk )-stability. n o Enl (c) = X | X = {Aj × Ak ∪ x} , x ∈ Kj × Kk \ Aj × Ak , X (i) ⊆ Ai and ∆ (c, d) is the difference between |Ai | and the number of elements in X (i) for estimates of (i)-stability. Example. Consider upper and lower bounds of stability estimates for C = ({2, 3} , {b, c} , {β, γ}) from the running example (Table 1). To get an estimate of the (1)-stability we consider elements of the following set {a, b, c, d}×{α, β, γ, δ}\ {b, c}×{β, γ}. Subsets of A1 derived from those elements are {∅} and {2}, which give us −log2 (7 · 2−2 + 2−1 ) and 1 for lower and upper bounds, respectively. To get estimates of (1, 3, X2 )-stability one needs to expand the intent by elements from {a, d}. Adding the first element a reduces the {2, 3}×{β, γ} to {2, 3}×{β}, while expanding the intent by d results in the empty set. Thus, the lower and upper bounds take values 1.678 and 2, respectively. 5 Experimental Results In this section we explore some empirical properties of the introduced indices using synthetic data. We generated four groups of 100 random 10 × 10 × 10 contexts with densities 0.1, 0.2, 0.4, 0.6. The features of the data are given in Figure 2. Fig. 2. Parameters of lattices constructed on 10 × 10 × 10 contexts. The choice of a subset of indices for data exploration can be motivated by the following properties: the indices should be pairwise uncorrelated (to avoid biased results when combining indices) and efficiently computable (if possible). The density function of an index may be a multimodal mixture of two or more distributions. In this case one needs a special justification for the choice of a threshold value separating two distributions. We consider Pearson’s correlation between all pairs of stability indices and cardinalities of sets that comprise a triadic concept (extent, intent, modus). In Figure 3 the values of the coefficient are represented. The sizes of the extent, intent and modus are denoted by |A1 |, |A2 |, |A3 |, respectively. The sizes of dyadic subcontexts are denoted in a similar way. The corresponding stability estimates are referred to by the log prefix. Fig. 3. The Pearson’s correlation coefficient among 100 contexts with the density 0.4 As can be seen from Figure 3, there is a correlation between (i)-stability and |Aj | · |Ak |. The index of (i)-stability correlates less strongly with the estimates of (j, k, Xi )-stability and the size of the set Ai . These types of correlation be- come stronger as the density of a context increases. In fact, these indices can be replaced by the size of a particular set in the case of a dense context. A cor- relation is observed between (i, j, Xk )-stability and its estimates, a less strong correlation is observed between (i, j, Xk )-stability and |Ai | · |Aj |. It is important to note that the strong correlation between (i, j, Xk )-stability and its estimates, as well as very small correlation between (i, j, Xk )-stability and estimates of (k)-stability remains the same with different context densities. The pairwise cor- relation between stability indices is weak, hence it is preferable to use these indices together. For selecting interesting concepts based on values of an index it is important to choose a correct threshold. This choice can be based on the distribution of index values. Figure 4 shows that the distribution of values (2)-stability and (1, 3, Xk )-stability (other (i)-stabilities and (i, j, Xk )-stabilities have similar dis- tributions). The distribution of (i)-stability values allows us to identify a thresh- old easily, since some picks exist in the distribution, while for (i, j, Xk )-stability the distribution varies from density to density, in case of a dense context it mo- tivates further study of the index and the way one selects thresholds for it. For values of the lower bound of stability estimates the modes of the distribution become less distinct or the distribution becomes unimodal (Figures 5,6). The upper bound for (i)-stability estimate (or (i, j, Xk )) in most cases corresponds to |Ai | (or |Ai | · |Aj |), since the closure of a superset of Aj × Ak (or Ak ) results in the empty set. Fig. 4. The distribution of values of (2)-stability and (1, 3, X2 )-stability Fig. 5. The values distribution of (2)-stability estimates The lower bound of (i)-stability is also strongly correlated with the size of set i. This is due to the fact that a big size of the set i leads to larger difference between the sizes of Ai and (Xj ×Xk )(i) , where Xj ×Xk is a superset of Aj ×Ak , and the sum under logarithm. The estimate of (i, j, Xk )-stability are correlated Fig. 6. The values distribution of (i, j, Xk )-stability estimates with the corresponding indices. There is roughly the same correlation between the estimate and the value |Ai | · |Aj |, which results from the bigger difference between |Ai | · |Aj | and |Xi | · |Xj |, which correspond to a superset of Ak . The upper and lower estimates of (i, j, Xk )-stability also correlate, in this case, the correlation could be related to set-difference between the set Ai × Aj and the volume of the rectangular subarea of Xi × Xk for the corresponding superset of Ak . It is noteworthy that the calculation of stability estimates in practice could take more time then the stability calculation itself. It is typical for (i)-stability, where the number 2|Ai | is lower then the number of all possible subsets obtained by adding elements from Kj × Kk \ Aj × Ak . 6 Conclusion In this paper we have introduced two stability indices for triadic concepts, based on two derivation operators, and studied their empirical behavior. We have pro- posed to compute stability using two derivation operators. We have studied correlation of stability indices and their distributions, which is important in practical data analysis. As it was shown, the introduced stability indices are not pairwise correlated and therefore can be used in some combinations for selecting interesting concepts. Moreover, (i)-stability correlates with |Ai |(for dense con- texts) and |Aj ||Ak |, and hence these indices should not be combined together. The values of (i)-stability for all concepts are characterized by the presence of groups of values with high frequency, which facilitates selection of interesting concepts based on threshold values, while the distribution of (i, j, Xk )-stability does not give clearly defined groups of interesting concepts. We have also introduced the estimates of stability indices, which correlate both with the corresponding stability indices and some of stability estimates. This is due to the fact that the estimates of (i)-stability (or (i, j, Xk )-stability) are based on the elements from Kj × Kk \ Aj × Ak (or Kk \ Ak ). Hence, the choice between stability and its estimates must be guided by the comparison of the sizes of sets involved in calculation, e.g. in the case of (i)-stability the number of subsets 2|Ai | , most probably, will be less then the number of elements in Kj × Kk \ Aj × Ak . The proposed indices characterize triconcepts differently, in general they do not agree in the top-n selected concepts, which allow us to use either their combination to set up the strictest selection criteria, or to take some of them depending on the meaning behind a stability index. References 1. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: Con- ceptual Structures: Applications, Implementation and Theory: Third International Conference on Conceptual Structures, ICCS ’95 Santa Cruz, CA, USA, August 14– 18, 1995 Proceedings. Springer Berlin Heidelberg, Berlin, Heidelberg (1995) 32–43 2. Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Trias–an algorithm for mining iceberg tri-lattices. In: Proceedings of the Sixth International Conference on Data Mining. ICDM ’06, Washington, DC, USA, IEEE Computer Society (2006) 907–911 3. Ji, L., Tan, K.L., Tung, A.K.H.: Mining frequent closed cubes in 3d datasets. In: Proceedings of the 32Nd International Conference on Very Large Data Bases. VLDB ’06, VLDB Endowment (2006) 811–822 4. Buzmakov, A., Kuznetsov, S.O., Napoli, A.: Scalable estimates of concept stability. In Glodeanu, C., Kaytoue, M., Sacarea, C., eds.: Formal Concept Analysis. Lecture Notes in Computer Science, Springer International Publishing (ICFCA, 2014) 157– 172 5. Ganter, B., Wille, R.: Contextual attribute logic. In Tepfenhart, W., Cyre, W., eds.: Conceptual Structures: Standards and Practices. Volume 1640 of Lecture Notes in Computer Science. Springer Berlin Heidelberg (1999) 377–388 6. Kuznetsov, S.O.: Stability as an estimate of degree of substantiation of hypotheses derived on the basis of operational similarity. Nauchn. Tekh. Inf., Ser. 2 (12) (1990) 21–29 7. Kuznetsov, S.O.: On stability of a formal concept. Annals of Mathematics and Artificial Intelligence 49(1-4) (2007) 101–115 8. Kuznetsov, S.O., Obiedkov, S., Roth, C.: Reducing the representation complexity of lattice-based taxonomies. In: Conceptual Structures: Knowledge Architectures for Smart Applications. Springer Berlin Heidelberg (2007) 241–254 9. Babin, M.A., Kuznetsov, S.O.: Approximating concept stability. In Domenach, F., Ignatov, D., Poelmans, J., eds.: Formal Concept Analysis. Volume 7278 of Lecture Notes in Computer Science., Springer Berlin Heidelberg (2012) 7–15