Triadic Factor Analysis Cynthia Vera Glodeanu Technische Universität Dresden, 01062 Dresden, Germany Cynthia_Vera.Glodeanu@mailbox.tu-dresden.de Abstract. This article is an extension of work which suggests using formal concepts as optimal factors of Factor Analysis. They discussed a method for decomposing a p × q binary matrix W into the Boolean matrix product P ◦ Q of a p × n binary matrix P and a n × q binary matrix Q with n as small as possible. We have generalised this factorization problem to the triadic case, looking for a decomposition of a p × q × r Boolean 3d-matrix B into the Boolean 3d-matrix product P ◦ Q ◦ R for p × n, q × n and r × n binary matrices P, Q and R with n as small as possible. The motivation is given by the increasing interest in Triadic Concept Analysis due to Web 2.0 applications. Keywords: Triadic Concept Analysis, Factor Analysis, Web 2.0 1 Introduction and Previous works The interest in decomposition of triadic data in psychology goes back at least to the beginning of the 90’s [2]. However at that time the methodologies were not as developed as nowadays. The interest in triadic data is also increasing due to the structure of Web 2.0 data [4]. This paper contains a new method for factorization of triadic data which could be developed as a generalisation of the dyadic case presented in [1].1 2 Triadic Concept Analysis For an introduction to Formal Concept Analysis we refer the reader to [5]. The triadic approach to Formal Concept Analysis was introduced by Rudolf Wille and Fritz Lehmann in [6]. It is based upon C.S. Pierce’s pragmatic philosophy with his three universal categories. We just give a compressed introduction to this field and refer the interested reader to [6, 7]. Definition 1. A triadic context is defined as a quadruple (K1 , K2 , K3 , Y ) where K1 , K2 and K3 are sets and Y is a ternary relation between K1 , K2 and 1 During the revision period it turned out there is yet unpublished work of R. Be- lohlavek and V. Vychodil dealing with the same subject [3] 128 Cynthia Vera Glodeanu K3 , i.e. Y ⊆ K1 ×K2 ×K3 . The elements of K1 , K2 and K3 are called (formal) objects, attributes and conditions, respectively, and (g, m, b) ∈ Y is read: the object g has the attribute m under the condition b. Small triadic contexts can be represented through three-dimensional cross tables. Examples are given in Table 1 and 2 on page 6 and 11. Definition 2. A triadic concept (shortly triconcept) of a triadic context (K1 , K2 , K3 , Y ) is defined as a triple (A1 , A2 , A3 ) with A1 ⊆ K1 , A2 ⊆ K2 and A3 ⊆ K3 that is maximal with respect to component-wise set inclusion in satisfying A1 × A2 × A3 ⊆ Y , i.e. for X1 ⊆ K1 , X2 ⊆ K2 and X3 ⊆ K3 with X1 × X2 × X3 ⊆ Y , the containments A1 ⊆ X1 , A2 ⊆ X2 , and A3 ⊆ X3 always imply (A1 , A2 , A3 ) = (X1 , X2 , X3 ). For a triconcept (A1 , A2 , A3 ), the components A1 , A2 and A3 are called the extent, the intent, and the modus of (A1 , A2 , A3 ) respectively. Pictorially, a triconcept (A1 , A2 , A3 ) is a rectangular box full of crosses in the three-dimensional cross table representation of (K1 , K2 , K3 , Y ), where ‘box’ is maximal under proper permutation of rows, columns and layers of the cross table. Definition 3. For {i, j, k} = {1, 2, 3} with j < k and for X ⊆ Ki and Z ⊆ Kj × Kk , the (i)-derivation operators are defined by: X 7→ X (i) := {(aj , ak ) ∈ Kj × Kk | (ai , aj , ak ) ∈ Y for all ai ∈ X}, Z 7→ Z (i) := {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Z}. These derivation operators correspond to the derivation operators of the dyadic contexts defined by K(1) := (K1 , K2 × K3 , Y (1) ), K(2) := (K2 , K1 × K3 , Y (2) ) and K(3) := (K3 , K1 × K2 , Y (3) ) where gY (1) (m, b) :⇔ mY (2) (g, b) :⇔ bY (3) (g, m) :⇔ (g, m, b) ∈ Y . Due to the structures of triadic contexts fur- ther derivation operators are necessary for the computation of triconcepts. For {i, j, k} = {1, 2, 3} and for Xi ⊆ Ki , Xj ⊆ Kj and Xk ⊆ Kk the (i, j, Xk )- derivation operators are defined by (i,j,Xk ) Xi 7→ Xi := {aj ∈ Kj | (ai , aj , ak ) ∈ Y for all (ai , ak ) ∈ Xi × Xk }, (i,j,Xk ) Xj 7→ Xj := {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Xj × Xk }. These derivation operators correspond to the derivation operators of the dyadic contexts defined by Kij ij ij Xk := (Ki , Kj , YXk ) where (ai , aj ) ∈ YXk if and only if (ai , aj , ak ) ∈ Y for all ak ∈ Xk . The structure on the set of all triconcepts T(K) is the set inclusion in each component of the triconcept. There is for each i ∈ {1, 2, 3} a quasiorder .i and its corresponding equivalence relation ∼i defined by (A1 , A2 , A3 ) .i (B1 , B2 , B3 ) :⇐⇒ Ai ⊆ Bi and (A1 , A2 , A3 ) ∼i (B1 , B2 , B3 ) :⇐⇒ Ai = Bi (i = 1, 2, 3). Triadic Factor Analysis 129 These quasiorders satisfy the antiordinal dependences: for {i, j, k} = {1, 2, 3}, (A1 , A2 , A3 ) .i (B1 , B2 , B3 ) and (A1 , A2 , A3 ) .j (B1 , B2 , B3 ) imply (A1 , A2 , A3 ) &k (B1 , B2 , B3 ) for all triconcepts (A1 , A2 , A3 ) and (B1 , B2 , B3 ) of K. For i 6= j the relation ∼i ∩ ∼j is the identity on T(K), i.e. if we have two components of the triconcept the third one is uniquely determined by them. [(A1 , A2 , A3 )]i denotes the equivalence class of ∼i containing the triconcept (A1 , A2 , A3 ). The quasiorder .i induces an order ≤i on the factor set T(K)/ ∼i of all equivalence classes of ∼i which is given by [(A1 , A2 , A3 )]i ≤i [(B1 , B2 , B3 )]i ⇔ Ai ⊆ Bi . (T(K)/ ∼1 , ≤1 ), (T(K)/ ∼2 , ≤2 ) and (T(K)/ ∼3 , ≤3 ) can be identified with the ordered set of all extents, intents and modi of K respectively. Generally every ordered set with smallest and greatest element is isomorphic to the ordered set of all extents, intents respectively modi of some triadic context. This means that the extents, intents respectively modi do not form a closure system in general as in the dyadic case. An analogous structure to the concept lattice in the dyadic case is given by T(K) := (T(K), .1 , .2 , .3S ). Let {i, j, k} = {1, 2, 3}, Xi and Xk beSsets of triconcepts of K and Xi := {Ai | (A1 , A2 , A3 ) ∈ Xi } and Xk := {Ak | (A1 , A2 , A3 ) ∈ Xk }. The ik-join of (Xi , Xk ) is defined to be the triconcept ∇(Xi , Xk ) := (B1 , B2 , B3 ) with (i,j,Xk )(i,j,Xk ) Bi := Xi , (i,j,Xk ) Bj := Xi , (i,j,Xk )(i,j,Xk ) (i,j,Xk ) (k) Bk := (Xi × Xi ) . The Basic Theorem of triconcept Analysis [7] proves that T(K) of a triadic context K is a complete trilattice with its ik-joins being exactly the ik-joins obtained by using the derivation operators. Further, every complete trilattice L := (L, .1 , .2 , .3 ) is isomorphic to a concept trilattice of a suitable triadic context for which one can choose (L, L, L, YL ) with YL := {(x1 , x2 , x3 ) ∈ L3 | there exists an u ∈ L with u .i xi for i = 1, 2, 3}. 3 Factor Analysis through Formal Concept Analysis This section contains the main results from [1] and the translation of the problem to Formal Concept Analysis. In [1] an approach to Factor Analysis is presented: a p×q binary matrix W is decomposed into the Boolean matrix product P ◦Q of a p × n binary matrix P and a n × q binary matrix Q with n W as small as possible. n The W Boolean matrix product P ◦ Q is defined as (P ◦ Q)ij = l=1 Pil · Qlj , where is the maximum and · is the product. Through the Boolean matrix product a non-linear relationship between objects, factors and attributes is given. The matrices W , P and Q represent an object-attribute, object-factor and factor-attribute relationship respectively. W = P ◦ Q means that object i is incident with attribute j if and only if there is a factor l which applies to i and contains j. 130 Cynthia Vera Glodeanu The method developed in [1] uses formal concepts as factors which produce decompositions with smallest number of factors possible. Contexts can be seen as Boolean matrices by replacing in the cross table the crosses by 1’s and the blanks by 0’s. In the language of Formal Concept Analysis we give the following definition for factors: Definition 4. A subset of formal concepts F ⊆ B(G, M, I) such that I = S (A,B)∈F A × B is called factorization. If F is minimal with respect to its car- dinality then it is called optimal factorization. The elements of F are called (optimal) factors. For subset F = {(A1 , B1 ), . . . , (An , Bn )} ⊆ B(G, M, I) of formal concepts we construct binary matrices AF and BF as follows: ½ ½ 1 if i ∈ Al 1 if j ∈ Bl (AF )il = , (BF )lj = 0 if i ∈ / Al 0 if j ∈ / Bl We consider a decomposition of the Boolean matrix W associated to (G, M, I) into the Boolean matrix product AF ◦ BF . Theorem 1. Universality of concepts as factors [1] For every W there is F ⊆ B(G, M, I) such that W = AF ◦ BF . Theorem 2. Optimality of concepts as factors [1] Let W = P ◦ Q for p × n and n × q binary matrices P and Q. Then there exists a set F ⊆ B(G, M, I) of formal concepts of W with |F| ≤ n such that for the p × |F| and |F| × q binary matrices AF and BF we have W = AF ◦ BF . W The proofs are based on the fact that a binary matrix can be seen as a - superposition of rectangles full of crosses. Each such rectangle in contained in a maximal rectangle. Every maximal rectangle full of crosses corresponds to a formal concept. Theorem 3. Mandatory factors [1] If W = AF ◦ BF for some subset F ⊆ B(G, M, I) of formal concepts then O(G, M, I) ∩ A(G, M, I) ⊆ F, where O(G, M, I) and A(G, M, I) are the sets of object and attribute concepts respectively. The proof is based on the fact that if one considers a formal concept (A, B) ∈ O(G, M, I) ∩ A(G, M, I), then (A, B) = ({g}pp , {g}p ) = ({m}p , {m}pp ) for some g ∈ G and m ∈ M . The formal concept (A, B) is the only one which contains the tuple (g, m). Theorem 4. [1, 8] The problem to find a decomposition W = P ◦ Q of an p × q binary matrix W into an p × n binary matrix P and a n × q binary matrix Q with n as small as possible is NP-hard and the corresponding decision problem is NP-complete. Triadic Factor Analysis 131 4 Triadic Factor Analysis In this section K is a triadic context. Definition 5. A subset F ⊆ T(K) such that [ Y = (A × B × C) (A,B,C)∈F is called a factorization of the triadic context K. If F is minimal with respect to its cardinality then it is called optimal factorization. The elements of F are called (optimal) factors. In the dyadic case we have worked with Boolean matrices. For the triadic case we need the notion of a Boolean 3d-matrix (shortly 3d-matrix) which is a rectangular box Bp×q×r such that bijk ∈ {0, 1} for all i ∈ {1, . . . , p}, j ∈ {1, . . . , q}, k ∈ {1, . . . , r}. For a 3d-matrix B we write B = B1 | . . . | Br where the Bk with k ∈ {1, . . . , r} are p × q binary matrices called layers. For a triadic context (K1 , K2 , K3 , Y ) with |K1 | = p, |K2 | = q, |K3 | = r we obtain the corresponding 3d-matrix by replacing in the three-dimensional cross table the crosses by 1’s and the blanks by 0’s. In order to define the 3d Boolean matrix product for a factorization F = {(A1 , B1 , C1 ), . . . , (An , Bn , Cn )} of a triadic context K we first have to build matrices AF , BF , CF : AF is defined as in the dyadic case and ½ ½ 1 if j ∈ Bl 1 if k ∈ Cl (BF )jl = , (CF )kl = 0 if j ∈ / Bl 0 if k ∈ / Cl for all l ∈ {1, . . . , n}. AF , BF , CF represent relationships between objects and factors, attributes and factors, conditions and factors respectively, i.e (AF )il = 1 means that ob- ject i can be described through factor l, (BF )jl = 1 means that attribute j is contained in factor l, and (CF )kl = 1 means that the factor l exists under the condition k. Definition 6. For p×n, q×n and r×n binary matrices P, Q and R the Boolean 3d-matrix product (shortly 3d-product) is defined as a ternary operation: n _ (P ◦ Q ◦ R)ijk := Pil · Qjl · Rkl l=1 with i ∈ {1, . . . , p}, j ∈ {1, . . . , q} and k ∈ {1, . . . , r}. For the 3d-product defined above we consider two equivalent expressions. The first one is given by: n _ (P ∗ Q ◦ R)ijk := (P ∗ Q)(ij)l · Rkl with l=1 (P ∗ Q)(ij)l := Pil · Qjl 132 Cynthia Vera Glodeanu where P ∗ Q is a p × q × n 3d-matrix and its n layers result from the Boolean multiplication of the l-th column P l of P and the l-th column Q l of Q. If we replace P and Q by AF and BF respectively then a layer l with l ∈ {1, . . . , n} of the 3d-matrix describes the relationship between the objects and attributes of the l-th factor. Furthermore we have to check which factors exist under which conditions. Therefore we take R = CF which describes the relationship between conditions and factors. The value of (P ∗ Q ◦ R)ijk is the maximum over the product between (P ∗ Q)(ij)l and Rkl for each l ∈ {1, . . . , n}. Finally P ∗ Q ◦ R corresponds to the 3d-matrix we wanted to decompose. This 3d-product mimics the best the situation of the dyadic case, since P ∗ Q describes the relation between objects and attributes. Because in the triadic case these tuples belong to different layers, we have to check under which conditions the tuples exist. Consider the triadic context K = ({1, 2, 3}, {a, b, c}, {b1 , b2 }, Y ) given by the three-dimensional cross table in Table 1. and the corresponding optimal factor- b1 b2 a b c a b c 1 ×× × 2 ×× × 3 × × × × Table 1. Example of triadic context ization F = {({1, 2}, {a, b}, b1 ), (3, {a, c}, {b1 , b2 }), ({1, 2}, b, {b1 , b2 })}. Then the 3d-product is given by:     101 110 µ ¶ 111 AF ∗ BF ◦ CF =  1 0 1  ∗  1 0 1  ◦ = 011 010 010   110|000|010 µ ¶ 111 = 1 1 0 | 0 0 0 | 0 1 0 · = 011 000|101|000       110|000 _ 000|000 _ 010|010 = 1 1 0 | 0 0 0 0 0 0 | 0 0 0 0 1 0 | 0 1 0 000|000 101|101 000|000 Another Boolean 3d-matrix product is given by: n _ (P ◦ Q ∗ R)ijk := Pil · (Q ∗ R)(jk)l with l=1 (Q ∗ R)(jk)l := Qjl · Rkl Triadic Factor Analysis 133 This 3d matrix product is very similar to the above one. However the differ- ence is in the interpretation. The operations ◦, ∗ and · are defined as above but carried out on different matrices. If we replace P, Q and R by AF , BF and CF respectively then the lth layer of BF ∗ CF describes the relationship between attributes and conditions for the l-th factors with l ∈ {1, . . . , n}. P l · (Q ∗ R)( )l describes the relation between the objects and pairs of attributes and conditions for the l-th factor. For the triadic context from above we obtain the following 3d-product:     101 10|11|00 AF ◦ BF ∗ CF =  1 0 1  ·  1 0 | 0 0 | 1 1  = 010 00|11|00       110|000 _ 000|000 _ 010|010 = 1 1 0 | 0 0 0 0 0 0 | 0 0 0 0 1 0 | 0 1 0 000|000 101|101 000|000 We give the triadic versions of the theorems presented in [1] showing that an optimal factorization of a triadic context can be obtained by triconcepts. The proofs are also very similar to the dyadic case, the difference consists in considering a different matrix product. Theorem 5. For every 3d-matrix B there is F ⊆ T(K) such that: B = AF ◦ BF ◦ CF . Proof. Bijk = 1 iff there is a triconcept (A, B, C) ∈ T(K) such that i ∈ A, j ∈ B and k ∈ C. If we choose for F the set of all triconcepts T(K) then (AF ◦ BF ◦ CF )ijk = 1 iff there is l such that (AF )il = 1, (BF )jl = 1 and (CF )kl = 1 iff there is (Al , Bl , Cl ) ∈ T(K) with i ∈ Al , j ∈ Bl and k ∈ Cl iff Bijk = 1. ⊓ ⊔ Using triconcepts as factors we obtain a factorization with smallest number of factors possible as stated by the next theorem: Theorem 6. Let K = (K1 , K2 , K3 , Y ) be a triadic context and B the corre- sponding 3d-matrix such that B = P ◦ Q ◦ R for p × n, q × n and r × n binary matrices P, Q and R. Then there exists a set F ⊆ T(K) of triconcepts with |F| ≤ n such that for the p × |F|, q × |F| and r × |F| binary matrices AF , BF and CF we have B = AF ◦ BF ◦ CF . Proof. Recall that triconcepts are maximal rectangular boxes full of 1’s which are maximal w.r.t. the component wise set inclusion. Through B = P ◦ W Q ◦ R for p × n, q × n and r × n binary matrices P, Q and R, B is written as a -superposition, as a union of n rectangular boxes full of 1’s, but these rectangular boxes do not have to be maximal. B = Pi1 · Qj1 · Rk1 ∨ · · · ∨ Pin · Qjn · Rkn with i ∈ {1, . . . , p}, j ∈ {1, . . . , q} and k ∈ {1, . . . , r}. B is the union of rectangular boxes Bl = P l ◦ Q l ◦ R l . B = P ◦ Q ◦ R for p × n, q × n 134 Cynthia Vera Glodeanu and r × n binary matrices P, Q and R with corresponding rectangular boxes B1 , . . . , Bn . Each such rectangular box Bl with l ∈ {1, . . . , n} is contained in a maximal rectangular box B̃l of B, i.e. (Bl )ijk ≤ (B̃l )ijk for all i, j, k. Every maximal rectangular box B̃l corresponds to a triconcept (Al , Bl , Cl ) in that (B̃l )ijk = 1 iff i ∈ Al , j ∈ Bl and k ∈ Cl . Since n _ n _ Bijk = (Bl )ijk ≤ (B̃l )ijk ≤ Bijk l=1 l=1 W a -superposition of maximal rectangular boxes B̃l yields B. Putting therefore F = {(A1 , B1 , C1 ), . . . , (An , Bn , Cn )} we get B = AF ◦ BF ◦ CF . Further |F| © n may happen because two distinct rectangular boxes can be contained in the same maximal rectangular box. ⊓ ⊔ The next definition contains the notion of d-cut which is actually a special case of Kij ij Xk = (Ki , Kj , YXk ) for Xk ⊆ Kk and |Xk | = 1. Each d-cut is itself a dyadic context. Definition 7. For a triadic context K = (K1 , K2 , K3 , Y ) a dyadic-cut (d-cut) is defined as: ciα := (Kj , Kk , Yαjk ) where {i, j, k} = {1, 2, 3} and α ∈ Ki . For every triadic context there are three families of d-cuts: c1 := {c1g := (K2 , K3 , Yg23 )}g∈K1 (1) c2 := {c2m := (K1 , K3 , Ym13 )}m∈K2 (2) c3 := {c3b := (K1 , K2 , Yb12 )}b∈K3 (3) Hence, (3) represents horizontal cuts in K for each condition b ∈ K3 . The family {c3b }b∈K3 of d-cuts contains (at most) |K3 | d-cuts. Such a d-cut is itself a dyadic context, namely (K1 , K2 , Yb12 ), and represents the relations between the object and attribute set of K under the condition b ∈ K3 . (1) represents vertical cuts in K for each object g ∈ K1 . Such a d-cut contains the relationship between all the attributes and conditions for the object which generated the d-cut. (2) represents also vertical cuts in K but this time one cuts for every attribute m ∈ K2 . Such a d-cut contains the relationship between all the objects and conditions for the attribute which generated the d-cut. Obviously one can reconstruct the triadic context K from the d-cuts by ’glu- ing’ them together. For the d-cut families of a triadic context the following equations hold: [ [ [ Y = {g} × Yg23 = {m} × Ym13 = {b} × Yb12 . g∈K1 m∈K2 b∈K3 Since each d-cut is a dyadic context, we can compute its concepts. For a d-cut ciα = (Kj , Kk , Yαjk ) its concepts are the pairs (Aj , Ak ) such that for all {j, k} ⊆ Triadic Factor Analysis 135 (j,k,α) {1, 2, 3} with j < k and Aj ⊆ Kj , Ak ⊆ Kk , α ∈ Ai we have Aj 7→ Aj (j,k,α) and Ak 7→ Ak with the derivation operators given in Section 3. From this concept of the d-cut we can compute the corresponding triconcept (A1 , A2 , A3 ) (i) with (Aj , Ak ) given as above and Ai 7→ Ai for all {i, j, k} = {1, 2, 3}. For different d-cuts of a d-cut family we may have identical dyadic contexts. This happens whenever the relationship between Ki and Kj are the same for some elements from Kk with {i, j, k} = {1, 2, 3}. Definition 8. A factor (A1 , A2 , A3 ) is called mandatory if and only if there exists a tuple (g, m, b) ∈ Y such that (A1 , A2 , A3 ) is the only triconcept satisfying: (g, m, b) ∈ A1 × A2 × A3 . Theorem 7. A factor (A1 , A2 , A3 ) of a factorization B = AF ◦ BF ◦ CF for a subset F ⊆ T(K) of triconcepts is mandatory if and only if it is of the form (1,2,x3 )(1,2,x3 ) (1,2,x3 ) (1,2,x3 )(1,2,x3 ) (1,2,x3 ) (3) (A1 , A2 , A3 ) = (x1 , x1 , (x1 × x1 ) ) (4) (1,2,x3 ) (1,2,x3 )(1,2,x3 ) (1,2,x3 ) (1,2,x3 )(1,2,x3 ) (3) = (x2 , x2 , (x2 × x2 ) ) (5) (1,3,x2 )(1,3,x2 ) 1,3,x2 )(1,3,x2 ) (1,2,x3 ) (2) (1,3,x2 ) = (x1 , (x1 × x1 ) , x1 ) (6) (1,3,x2 ) (1,3,x2 ) (1,2,x3 )(1,3,x2 ) (2) (1,3,x2 )(1,3,x2 ) = (x3 , (x3 × x3 ) , x3 ) (7) (2,3,x1 )(2,3,x1 ) (2,3,x1 ) (1) (2,3,x1 )(2,3,x1 ) (2,3,x1 ) = ((x2 × x2 ) , x2 , x2 ) (8) (2,3,x1 ) (2,3,x1 )(2,3,x1 ) (1) (2,3,x1 ) (2,3,x1 )(2,3,x1 ) = ((x3 × x3 ) , x3 , x3 ) (9) for fixed xι ∈ Kι with ι ∈ {1, 2, 3}. Remark 1. We know from the dyadic case that the mandatory factors are those which are object and attribute concepts simultaneously. In the triadic case we have for the d-cut family c1 attribute and condition triconcepts ((4), (5)), for c2 object and condition triconcepts ((6), (7)) and for c3 object and attribute triconcepts ((8), (9)). A factor is mandatory in the factorization of a triadic context if and only if it is a mandatory factor for at least one d-cut factorization of each d-cut family. Proof. (A1 , A2 , A3 ) as written in (4) and (5) is the only triconcept such that xι ∈ Aι with ι ∈ {1, 2, 3} in the d-cut c3x3 . Suppose there is another triconcept (B1 , B2 , B3 ) 6= (A1 , A2 , A3 ) from c3x3 . If x1 ∈ B1 then due to the properties of the triadic derivation operators we obtain: (1,2,x3 )(1,2,x3 ) (1,2,x3 )(1,2,x3 ) A1 = x1 ⊆ B1 = B1 . Since (A1 , A2 , A3 ) and (B1 , B2 , B3 ) (1,2,x3 ) (1,2,x3 ) are triconcepts we have A2 = A1 ⊇ B1 = B2 . On the other hand (1,2,x3 )(1,2,x3 ) (1,2,x3 )(1,2,x3 ) from x2 ∈ B2 follows A2 = x2 ⊆ B2 = B2 . Altogether (1,2,x3 ) (1,2,x3 ) we obtain B2 = A2 and A1 = A2 = B2 = B1 . Since two components of the triconcept uniquely determine the third one, we also have B3 = A3 . This means that (A1 , A2 , A3 ) = (B1 , B2 , B3 ). The triconcept (A1 , A2 , A3 ) is the only one which contains the tuple (x1 , x2 , x3 ) in the d-cut c3x3 . 136 Cynthia Vera Glodeanu Analogously we consider the cases (6), (7) and (8), (9). If a triconcept (A1 , A2 , A3 ) can be written in the form of (4) − (9) then is the only triconcept from T(K) which contains the tuple (x1 , x2 , x3 ). ⊓ ⊔ A possibility to compute the factorization of a triadic context is through the d-cuts. One determines the optimal factorization of every d-cut of a d-cut family and takes their disjoint union as the factorization of the triadic context. However this method is naive and laborious since it may happen that we compute the same triconcept several times. Finally we also have to check the disjointnes of the dyadic factorizations. However Algorithm 1 eliminates this unnecessary verification. The number of triconcepts even for a small triadic context can be quite big and searching for factors in T(K) can get cumbersome. On the other hand the ik- joins obtained by triconcepts from different d-cuts usually increase the number of factors since they cover just partially the incidence of the ik-join components. It follows trivially from the dyadic case that finding a factorization of a 3d- matrix B into the 3d product P ◦ Q ◦ R for p × n, q × n and r × n binary matrices P, Q and R is NP-hard as well. Algorithm 1 is the adoptions to the triadic case of the greedy approximation algorithm presented in [1]:2 Algorithm 1 Input: B (Boolean 3d-matrix) Output: F (set of factor concepts) U := {(i, j, k) | Bi,j,k = 1} for k := 1 to |K3 | Uk := {(i, j, k) | Bi,j,k = 1} F := ∅ for k := 1 to |K3 | while U 6= ∅ or Uk 6= ∅: B := ∅ V := 0 while there is j ∈ / B such that |B ⊕ j| ª V : do select j ∈ / B that maximizes B ⊕ j: B := (B ∪ j)(1,2,k)(1,2,k) V := |(B (1,2,k) × B) ∩ U | A := B (1,2,k) C := (A × B)(3) add (A, B, C) to F for each (i, j, k) ∈ A × B × C: remove (i, j, k) from U and from Uk return F 2 see also [3] for another algorithm and test data Triadic Factor Analysis 137 Thereby B ⊕ j = ((B ∪ j)(1,2,k) × (B ∪ j)(1,2,k)(1,2,k) ) ∩ U. In this algorithm we consider the d-cut family of the conditions. Alternatively one can choose also another d-cut family. As presented in [1] for the dyadic S case if (A, B) is a concept of a d-cut then the intent can be written as B = j∈B j (1,2,k)(1,2,k) . Further if j ∈ / B then ((B ∪ {j})(1,2,k) , (B ∪ {j})(1,2,k)(1,2,k) ) is a concept with B ⊂ (B ∪ {j})(1,2,k)(1,2,k) . This algorithm selects the triconcepts which cover as much of the incidence relation as possible. Finally we compute the third component of the triconcept. The sets U and Uk with k ∈ {1, . . . , |K3 |} assure that we do not compute unnecessary triconcepts. It may happen that for different d-cut families we obtain different factoriza- tion from which some are optimal and others not. It may happen that there exists aStriconcept (A1 , A2 , A3 ) in the factorization of a d-cut such that A1 ×A2 ×A3 ⊆ (B1 ,B2 ,B3 )∈F \(A1 ,A2 ,A3 ) B1 × B2 × B3 . An improvement of Algorithm 1 can be done by searching and removing such factors from the factorization. The example we give is a small one and serves as an illustration of the meth- ods developed in this paper. It is however a paradigmatic one since it reflects the usual setting in Web 2.0 applications. The object set contains the hostels {Nuevo Suizo, Samay, Oasis Backpacker, One, Ole Backpacker, Garden Backpacker}, the attribute set is given by the services {character, safety, location, staff, fun, cleanliness}. Since we have chosen the hostels with best ratings, the attribute values are - (good) or , (excellent) and are considered as tags. In the triadic context we make a cross in the corresponding line of object, attribute and condi- tion if the service was rated as excellent. The conditions are given by the overall rating of users from { [9],[10],[11] }. We number consecutively the elements from K1 , K2 and K3 , i.e Ki := {0, . . . , |Ki |} with i ∈ {1, 2, 3}. 0 1 2 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 × ×× ×× 1 ××× × ××× × ×××× × 2 ×××× × ×××××× ××××× 3 ×× × × ×××××× ×××××× 4 ×× × × ×××××× ×××××× 5 ×× × ×××××× ×××× × Table 2. Triadic context of hostels This triadic context has 18 triconcepts. To obtain the optimal factorization of the triadic context we consider the optimal factorizations of every d-cut from {c3b }b∈K3 . The factorization of the d-cuts are given by F(c11 ) = {({0, 1, 2, 5}, 2, K3 ), ({2, 3, 4}, {0, 1, 3, 5}, {0, 1}), ({1, 2, 5}, {2, 3, 5}, K3 ), ({1, 2, 3, 4}, {1, 3, 5}, K3 )}, F(c12 ) = {(K1 , {2, 3}, {1, 2}), ({2, 3, 4, 5}, K2 , 1), ({1, 2, 3, 4, 5}, {1, 2, 3, 5}, {1, 2})} and F(c13 ) = {(K1 , {2, 3}, {1, 2}), ({2, 3, 4}, {1, 2, 3, 4, 5}, {1, 2}), ({1, 3, 4, 138 Cynthia Vera Glodeanu 5}, {0, 1, 2, 3, 5}, 2)}. Obviously F(c12 ) and F(c13 ) have a common triconcept. We can reduce the number of factors even more, since the incidence covered by the factor ({1, 2, 3, 4, 5}, {1, 2, 3, 5}, {1, 2}) ∈ F(c12 ) is already covered by the factors (K1 , {2, 3}, {1, 2}) ∈ F(c12 ) and ({1, 2, 3, 4}, {1, 3, 5}, K3 ) ∈ F(c11 ). Algo- rithm 1 avoids including these ’unnecessary’ triconcepts into the factorization. The factorization of the triadic context is given by F = F(c11 )∪F(c ˙ 1 ˙ 1 2 )∪F(c3 ) \ {({1, 2, 3, 4, 5}, {1, 2, 3, 5}, {1, 2})}. The factors have also a verbal description, as for example the first one stand for very safe since the users from all platforms have this point of view, the hostels from the extent of ({2, 3, 4, 5}, K2 , 1) are the best deals according to the users from the second platform. On the other hand all users agreed that the hostels 1, 2, 5 are excellent concerning safety, location and fun and that the hostels 1, 2, 3, 4 are excellent concerning character, location and fun. The factorization offers the possibility to describe the hostels through 8 fac- tors while in the triadic context they are described through 6 attributes under 3 conditions. The 3d-product can be written in a similar way like in the above examples. We omit the details due to lack of space. 5 Conclusion We have presented a factorization method of triadic data such that the number of factors is as small as possible. In this paper we have also illustrated the developed mathematical and algo- rithmic techniques through a paradigmatic example. References 1. Belohlávek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1) (2010) 3–20 2. Krolak-Schwerdt, S., Orlik, P., Ganter, B.: TRIPAT: a model for analyzing three- mode binary data. In Bock, H.H., Lenski, W., Richter, M.M., eds.: Studies in Classification, Data Analysis, and Knowledge Organization. Volume 4 of Informa- tion systems and data analysis. Springer, Berlin (1994) 298–307 3. Belohlávek, R., Vychodil, V.: Optimal factorization of three-way binary data. IEEE Symposium on Foundations and Practice of Data Mining in GrC 2010 (2010) 61–66 4. Bibsonomy: http://www.bibsonomy.org/ 5. Ganter, B., Wille, R.: Formale Begriffsanalyse: Mathematische Grundlagen. (1996) 6. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In Ellis, G., Levinson, R., Rich, W., Sowa, J.F., eds.: ICCS. Volume 954 of Lecture Notes in Computer Science., Springer (1995) 32–43 7. Wille, R.: The basic theorem of triadic concept analysis. Order 12 (1995) 149–158 8. Markowsky, G., Nau, D., Woodbury, M.A., Amos, D.: A mathematical analysis of human leukocyte antigen serology. Math. Biosciences 40 (1978) 9. Hostelworld: http://www1.hostelworld.com 10. Hostels: http://www.hostels.com 11. HostelBookers.com Ltd.: http://www.hostelbookers.com