On Suboptimality of GreConD for Boolean Matrix Factorisation of Contranominal Scales Dmitry I. Ignatov1,2 and Alexandra Yakovleva1 1 National Research University Higher School of Economics, Moscow 2 St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, Russia dignatov@hse.ru,yakovlevalexandra@yandex.ru Abstract. In this paper we study certain properties of the GreConD algorithm for Boolean matrix factorisation, a popular technique in Data Mining with binary relational data. This greedy algorithm was inspired by the fact that the optimal number of factors for the Boolean matrix factorisation can be chosen among the formal concepts of the correspond- ing formal context. In particular, we consider one of the hardest cases (in terms of the numerous of possible factors), the so-called contranominal scales, and show that the output of GreConD is not optimal in this case. Moreover, we formally analyse its output by means of recurrences and generating functions and provide the reader with the closed form for the returned number of factors. An algorithm generating the optimal num- ber of factors and the corresponding product matrices P and Q is also provided by us for the case of contranominal scales. Keywords: Boolean Matrix Factorisation, Formal Concept Analysis, Schein rank, generating functions, greedy algorithms 1 Introduction Boolean data analysis and Formal Concept Analysis are closely related [1]. For example, Boolean matrices describing binary relations can be considered as for- mal contexts and vice versa, and decomposition of Boolean matrices into the product of two Boolean matrices of possibly smaller sizes is one of such cross- roads where two disciplines meet each other. Thus, it was shown that the optimal number of factors, that is the minimal size of common dimension of these two product matrices, can be found based on the family of corresponding formal con- cepts considered as factors for the original Boolean matrix [2]. Decomposition of object-attribute matrices into products of object-factor and factor-attribute matrices plays important role in Machine Learning and Data Mining [3]. One of the desired properties is the dimensional reduction that normally preserves with high accuracy similarly between objects or attributes in terms of dot product and makes it possible to recover the input matrix [4]. For example, in collab- orative filtering domain Boolean Matrix Factorisation (BMF) was on par with the (truncated) Singular Value Decomposition approach in terms of obtained Copyright c 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). quality metrics [5,6]. It speeds up the computation on the decomposed matrices and allows finding homogeneous taste communities as those latent factors. Another fruitful property of Boolean matrices is their cheap bit represen- tation and related bit operations. The only obstacle for Boolean Matrix Fac- torisation to be widely adopted technique so far is that of determination of the optimal number factors k for Boolean matrices or Schein rank is NP-hard prob- lem [7,2]. So, every good approximate algorithm geared towards minimisation of the number of factors can be taken into account [8]. One of the earlier proposed algorithm for BMF is GreConD. It follows a greedy strategy adding attributes one-by-one with subsequent computation of their closures and is not optimal in general. In this paper we address one very important for practice case of the input for this algorithm, the contranominal scale of arbitrary size n, i.e. square Boolean matrix with all ones except the main diagonal [9]. It is well-known that the number of patterns (formal concepts) for this case is 2n . It is easy to show experimentally that GreConD is not optimal for this particular case by comparing its output with the theoretically deduced values of Schein rank for contranominal scales. However, the output solution follows an interesting pattern deserving a special treatment in terms of recurrences and generating functions. It allows us to formally analyse the discrepancy between this suboptimal solution and theoretically optimal one. Moreover, to know the theoretically optimal solution as the number of factors does not mean to provide a concrete factorisation. To fill the gap, we sketch a correct algorithm to this end. The paper is organised as follows. In Section 2, we recall the reader the basic definitions of FCA and BMF and describe GreConD algorithm. In Section 3, we shortly describe GreConD with its pseudocode. In Section 4, we provide the reader with our experimental and theoretical analyses of the algorithm’s subop- timality. The penultimate section, Section 5, presents the optimal algorithm to find BMF for formal contexts of contranominal scales. Finally, Section 6 briefly discusses future prospects and concludes the paper. 2 Boolean Matrix Factorisation and GreConD 2.1 BMF based on Formal Concept Analysis Basic FCA definitions. Formal Concept Analysis (FCA) is a branch of ap- plied algebra and it studies (formal) concepts and their hierarchies [10]. The adjective “formal” indicates a strict mathematical definition of a pair of sets, called, the extent and intent. This formalisation is possible because of the use of the algebraic lattice theory. Definition 1. Formal context K is a triple (G, M, I), where G is a set of ob- jects, M is a set of attributes, and I ⊆ G × M is an incidence binary relation. The binary relation I is interpreted as follows: for g ∈ G, m ∈ M we write gIm if the object g has the attribute m. For a formal context K = (G, M, I) and any A ⊆ G and B ⊆ M a pair of mappings is defined: A↑ = {m ∈ M | gIm for all g ∈ A}, B ↓ = {g ∈ G | gIm for all m ∈ B}, these mappings define Galois connection between partially ordered sets (2G , ⊆) and (2M , ⊆) on disjunctive union of G and M . The set A is called closed set, if A↑↓ = A [11]. Definition 2. A formal concept of the formal context K = (G, M, I) is a pair (A, B), where A ⊆ G, B ⊆ M , A↑ = B and B ↓ = A. The set A is called the extent, and B is the intent of the formal concept (A, B). It is evident that the extent and intent of any formal concept are closed sets. The set of all formal concepts of a context K is denoted by B(G, M, I). The state-of-the-art surveys on advances in FCA theory and its applications can be found in [12,13]. Description of FCA-based BMF. Boolean Matrix Factorisation is a decom- position of the original matrix I ∈ {0, 1}n×m , where Iij ∈ {0, 1}, into a Boolean matrix product P ◦ Q of binary matrices P ∈ {0, 1}n×k and Q ∈ {0, 1}k×m for the smallest possible number of k. We define Boolean matrix product as follows: k _ (P ◦ Q)ij = Pil · Qlj , l=1 W where denotes disjunction, and · conjunction. For example, in collaborative filtering, matrix I can be considered as a matrix of binary relation between set X of objects (users), and a set Y of attributes (items that users have evaluated). In this case, we assume that xIy iff the user x evaluated object y. The triple (X, Y, I) naturally forms a formal context. Consider a set F ⊆ B(X, Y, I), a subset of all formal concepts of context (X, Y, I), and introduce matrices PF and QF :   1, i ∈ Al , 1, j ∈ Bl , (PF )il = (QF )lj = , 0, i ∈ / Al , 0, j ∈ / Bl . where (Al , Bl ) is a formal concept from F. We can consider decomposition of the matrix I into binary matrix product PF and QF as described above. The following theorems are proved in [2]: Theorem 1. (Universality of formal concepts as factors). For every I there is F ⊆ B(X, Y, I), such that I = PF ◦ QF . Theorem 2. (Optimality of formal concepts as factors). Let I = P ◦ Q for n×k and k×m binary matrices P and Q. Then there exists a F ⊆ B(X, Y, I) of formal concepts of I such that |F| ≤ k and for the n × |F| and |F| × m binary matrices PF and QF we have I = PF ◦ QF . There are several algorithms for finding PF and QF by calculating formal con- cepts based on these theorems [2]. 3 GreConD There are several algorithms for finding PF and QF by calculating formal con- cepts based on aforementioned theorems [2]. This paper studies the work of GreConD (Algoritm 2 from [2]), one of the existing algorithms for BMF. Gre- ConD avoids computation of all possible formal concepts and therefore works much faster [2]. Time estimation of the calculations in the worst case yields O(k|G||M |3 ) [5], where k is the number of found factors (and can be omitted as a constant term), |G| is the number of objects, |M | is the number of attributes. Define U = {hi, ji|Ii,j = 1} for a Boolean matrix I. The main idea of the algorithm is to maximize the set D ⊕ y := ((D ∪ {y})↓ × (D ∪ {y})↓↑ ) ∩ U successively adding columns to intent D of formal concept (C, D). Below we provide pseudocode for GreConD. Algorithm 3.1 GreConD 1: INPUT: I (Boolean matrix) 2: OUTPUT: F (set of factor concepts) 3: 4: U ← {hi, ji|Ii,j = 1} 5: F ← ∅ 6: while U 6= ∅ do 7: D←∅ 8: V ←0 9: while there is j ∈ / D such that |D ⊕ j| > V do 10: select j ∈ / D that maximizes |D ⊕ j| 11: D ← (D ∪ {j})↓↑ 12: V ← |(D↓ × D) ∩ U| 13: end while 14: C ← D↓ 15: add (C, D) to F 16: for each hi, ji ∈ C × D do 17: remove hi, ji from U 18: end for 19: end while 20: return F The set U contains not yet covered object-attribute pairs by any of the pre- viously found factors. When the newly found factor (C, D) is added to F, all the pairs from C × D should be deleted from U (lines 15-18). When U is empty, the GreConD terminates (line 6, the main loop). The inner loop (lines 9–13) maximizes the cardinality D ⊕ j while it is still possible by examining attributes not in D. 4 GreConD on contranominal scale In this section we show that GreConD is optimal for ordinal and nominal scales, but not optimal on contranominal scale. We also construct an optimal algorithm for contranominal scale. 4.1 Optimality on ordinal and nominal scales In FCA, scales are used to represent the so-called multi-valued contexts (cf. relational tables in databases) as one-valued contexts; the latter we also consider here as Boolean matrices. First, let us consider two elementary scales. The nominal scale is defined as a formal context Nn = ({1, . . . , n}, {1, . . . , n}, =) and is used to scale mutually exclusive attributes like traffic light signals (red, green, yellow). The ordinal scale is defined as On = ({1, . . . , n}, {1, . . . , n}, ≤) and is applied in cases where the values are ordered like university grades (poor, normal, good, excellent). It follows from our experiment that the number of factors obtained by Gre- ConD on ordinal and nominal scales are equal to the size of scales. We can prove that these numbers are optimal. Proposition 1. The number of factors n obtained by GreConD for a nominal scale Nn is optimal. Proof. Note that for a nominal scale of size n any concept with nonempty extent and intent has the form ({i}, {i}) (i ∈ {1, . . . , n}). Furthermore, the number of formal concepts is equal to the number of factors by definition. t u Proposition 2. The number of factors n obtained by GreConD for an ordinal On is optimal. Proof. Note that for the ordinal scale of size n and for any nonempty A ⊆ {max(A), . . . , n} it holds that A↑ = {1, . . . , n}. Besides, {max(A), . . . , n}↓ = {1, . . . , max(A)}. Therefore, concepts for the ordinal scale are ({1, . . . , k}, {k, . . . , n}) for k ∈ {1, . . . , n}. Since GreConD needs to cover every object-attribute pair, each pair ({i}, {i}) for i ∈ {1, . . . , n} should be covered as well, which requires exactly n concepts ({1, . . . , i}, {i, . . . , n}). t u 4.2 Suboptimality on contranominal scale For every set S the contranominal scale is defined as NcS = (S, S, 6=). In what follows, we consider Ncn with S = {1, . . . , n} without loss of generality. Factorizing contranominal scales of sizes from 1 to 128 by GreConD3 we ob- tain a sequence of the number of factors an (n is the size of a scale): a1 = 0, a2 = 2, a3 = 3, a4 = 4, 3 Our Python implementation of GreConD for these experiments: https://bit.ly/ GreConDsub Fig. 1. The number of factors for GreConD and its theoretical optimum for contra- nominal scales of increasing size. a5 = a6 = a7 = 5, a8 = 6, a9 = · · · = a15 = 7, a16 = 8, a17 = · · · = a31 = 9, a32 = 10, a33 = · · · = a63 = 11, a64 = 12, a65 = · · · = a127 = 13, a128 = 14 . . . Note that the number of obtained factors increases by one when the size of a scale is a power of two or a power of two plus one. The sequence can be defined as follows: a1 = 0, a2 = 2, an = 2 log2 n if ∃k : n = 2k , an = a2blog2 nc + 1 for other n. Thus analytic form for the sequence is the following. Conjecture. The number of factors obtained by GreConD on contranominal scale is described by the sequence an = 2 · blog2 nc + 1 − [n = 2blog2 nc ], n being the size of a scale. One way to obtain a simpler closed form of the considered sequence is to analyse its generating P function [14]. Let G(z) = an z n is the associated generating function for the sequence n an . The sequence an can be rewritten in the following way: a1 = 0, a2 = 2, while an = an−1 +blog2 nc−blog2 (n−1)c+dlog2 ne−dlog2 (n−1)e. One can check that one of the respective differences of rounded logarithms takes on 1 when n = 2k or n − 1 = 2k for some k > 0. Let us sum an z n as follows: X X X an z n = an−1 z n + (blog2 nc−blog2 (n−1)c+dlog2 ne−dlog2 (n−1)e)z n n≥2 n≥2 n≥2 Let Un = dlog2 ne and Ln = blog2 nc, then X X X X G(z) = zG(z) + Ln z n − Ln−1 z n + Un z n − Un−1 z n . n≥2 n≥2 n≥2 n≥2 P P Now, let L(z) = Ln z n and U (z) Un z n , then n≥2 n≥2 G(z) = zG(z) + L(z) − zL(z) + U (z) − zU (z) or G(z)(1 − z) = (L(z) + U (z))(1 − z) . For z 6= 1 we have an = [z n ]G(z) = blog2 nc + dlog2 ne . Next, we show that the number of factors obtained by GreConD on contra- nominal scale is not optimal. First, we provide the definition of Schein rank. Definition 3. [15] For vectors v, w the matrix (vi wj ) is called cross-vector4 . Definition 4. [15] Schein rank of a Boolean matrix A is the least number of Boolean cross-vectors summing up to A. Theorem 3. [16] Schein rank of contranominal scale of size n equals N(n), l where l = N (k) (k ∈ N) is defined as the least number, such that k ≤ bl/2c . We also provide several first values of N (n)5 : N (1) = 1, N (2) = 2, N (3) = 3, N (4) = N (5) = N (6) = 4, N (7) = · · · = N (10) = 4 Note that we deal with column vectors according to data analysis conventions; so, (vi wj ) is the outer product of v and w. 5 See also OEIS sequence A305233: https://oeis.org/A305233 5, N (11) = · · · = N (20) = 6, N (21) = · · · = N (35) = 7, N (36) = · · · = N (70) = 8, N (71) = · · · = N (126) = 9, N (127) = · · · = N (252) = 10 . . . Note that for contranominal scales of sizes 2, 3, 4, 7 GreConD does find Schein rank, i.e. optimal number of factors. However, for the remaining sizes (n > 1) GreConD finds suboptimal number of factors. 5 Optimal algorithm for contranominal scale Let us construct an algorithm that would factorize contranominal scale with optimal number of factors. We use Sperner’s theorem. Definition 5. A family of incomparable (with respect to set inclusion) sets is called a Sperner family, or an antichain of sets. Theorem 4. [17] (Sperner)  For an n-element set the size of a largest antichain n does not exceed bn/2c . Equality holds iff an antichain consists of all subsets of size dn/2e or all subsets of size bn/2c. Theorem 3 [16] states that the optimal number of factors for contranominal scale of size n is equal to N (n). Therefore, BMF with the optimal number of factors (we call it optimal BMF) has object-factor matrix of size n × N (n). From the proof [16] it follows that the minimal set of factors for contranominal scale is an antichain. Next, we show how to find an antichain of a given length n. Let us find all combinations of elements from the set {1, . . . , N (n)} by bN (n)/2c elements (for example, by Algorithm T from [18][p. 359]). Note that by Sperner’s theorem a set of those combinations is the largest antichain for the N (n)-element  set of factors. Also, bNN(n)/2c (n) ≥ n by definition of N (n). Next, for every com- bination we make a binary vector r of length N (n) with ri = 1 ⇐⇒ the corresponding combination contains the element i. Finally, we obtain object- factor matrix by choosing an n-element subset of binary vectors and placing it in object-factor matrix. Based on the constructed object-factor matrix, we find the factor-attribute matrix. We apply the derivation operator ↓ to every factor f in the object- factor matrix, then we apply the derivation operator operator ↑ to the set of the obtained objects in object-attribute matrix. Finally, we make binary row for the obtained set of attributes and place it in f -row in factor-attribute matrix. Thus, we get optimal BMF for contranominal scale. Note that we can simplify the procedure of construction of the factor-attribute matrix using the following property. Property 1. For contranominal scale of size n and any subsets A and B of sets of objects and attributes respectively it holds that A↑ = {1, . . . , n}\A; B ↓ = {1, . . . , n}\B. Proof. Using the definition of contranominal scale and the derivation operator(s) we get: A↑ = ∩a∈A ({1, . . . , n}\a) = {1, . . . , n}\A. The proof for B is similar. t u Now we can remake the recovering of factor-attribute B matrix from object- factor matrix A. If Ak is the k-th column in the matrix A, then ∼ Ak (here ∼ is a logical negation) is a k-th row in the matrix B. Example. Let us demonstrate our algorithm on contranominal scale of size 5. N (5) = 4, hence BMF has 4 factors. Generate 5 different combinations from the set {1, 2, 3, 4}: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}. Therefore, we obtain the object-factor  matrix  A: 1100 1 0 1 0   T A=  1 0 0 1. The first column of matrix A consists of vector 1, 1, 1, 0, 0 , 0 1 1 0 0101  so vector 0, 0, 0, 1, 1 is the first row of factor-attribute matrix. Similarly, we fill the rest of the rows in matrix B and obtain the optimal BMF:     01111 1100   1 0 1 1 1 1 0 1 0 00011       1 1 0 1 1 = 1 0 0 1 ◦ 0 1 1 0 0     1 0 1 0 1 1 1 1 0 1 0 1 1 0 11010 11110 0101 Proposition 3. The number of optimal BMFs (found by the proposed algo-   rithm) of the contranominal scale of size n > 0 is n! nq , where q = bNN(n)/2c (n) . Proof. Recall that we choose an n-element set from all the combinations of numbers from the set {1, . . . , N (n)} by bN (n)/2c elements in order to get rows of an object-factor matrix. Further, there are n! ways to arrange every obtained n-element set of combinations as rows of object-factor matrix. We conclude the proof noting that there is a unique way to build the factor- attribute matrix having the object-factor matrix. t u Note that the case n = 1, i.e. when I = (0), has two more solutions in addition to P ◦ Q = (1) ◦ (0); namely, (0) ◦ (1) = (0) ◦ (0). 6 Conclusion In the paper we considered important case for Boolean matrix factorisation based on our experimental and theoretical analyses of the behaviour of the GreConD algorithm. We hypothesise that the number of output factors for the contranom- inal scales in case of GreConD is blog2 nc + dlog2 ne based on the substantial observed fragment of its output for different values of the scale size n. Wehave also proposed an optimal algorithm w.r.t. Schein rank to find one out of n! nq optimal Boolean matrix factorisation for this case, where q = bNN(n)/2c (n) . As a future research direction we would like to continue our previous investi- gations of Boolean matrix factorisation for collaborative filtering problems [5,6] with an updated knowledge on suboptimality in case of contranominal scales presence as well to extend this approach to Boolean tensors. Acknowledgments. The paper was prepared within the framework of the HSE University Basic Research Program and was also supported in part through com- putational resources of HPC facilities at HSE University. The first author was also supported by Russian Science Foundation under grant 17-11-01276 at St. Petersburg Department of Steklov Mathematical Institute of Russian Academy of Sciences, Russia and by RFBR (Russian Foundation for Basic Research) ac- cording to the research project No 19-29-01151. The foundations had no role in study design, data collection and analysis, writing the manuscript, and decision to publish. We would like to thank Profs. Radim Belohlavek, Jan Outrata, Martin Tr- necka, and Vilem Vychodil for lasting collaboration and Prof. Donald Knuth for personal written explanation of a nontrivial piece from Concrete Mathemat- ics [14] regarding summation properties. References 1. Janostik, R., Konecny, J., Krajca, P.: Interface between Logical Analysis of Data and Formal Concept Analysis. Eur. J. Oper. Res. 284(2) (2020) 792–800 2. Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1) (2010) 3 – 20 Special Issue on Intelligent Data Analysis. 3. Miettinen, P., Neumann, S.: Recent Developments in Boolean Matrix Factoriza- tion. In Bessiere, C., ed.: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, ijcai.org (2020) 4922–4928 4. Belohlávek, R., Outrata, J., Trnecka, M.: Impact of Boolean factorization as pre- processing methods for classification of Boolean data. Ann. Math. Artif. Intell. 72(1-2) (2014) 3–22 5. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean Matrix Factorisation for Collaborative Filtering: An FCA-Based Approach. In Agre, G., Hitzler, P., Krisnadhi, A.A., Kuznetsov, S.O., eds.: Artificial Intelligence: Method- ology, Systems, and Applications - 16th International Conference, AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings. Volume 8722 of Lecture Notes in Computer Science., Springer (2014) 47–58 6. Akhmatnurov, M., Ignatov, D.I.: Context-Aware Recommender System Based on Boolean Matrix Factorisation. In Yahia, S.B., Konecny, J., eds.: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, Clermont-Ferrand, France, October 13-16, 2015. Volume 1466 of CEUR Workshop Proceedings., CEUR-WS.org (2015) 99–110 7. Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis problem. IEEE Trans. Knowl. Data Eng. 20(10) (2008) 1348–1362 8. Miettinen, P., Vreeken, J.: MDL4BMF: Minimum Description Length for Boolean Matrix Factorization. ACM Trans. Knowl. Discov. Data 8(4) (2014) 18:1–18:31 9. Albano, A., Chornomaz, B.: Why concept lattices are large: extremal theory for generators, concepts, and VC-dimension. Int. J. Gen. Syst. 46(5) (2017) 440–457 10. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg (1999) 11. Birkhoff, G.: Lattice Theory. 11th printing edn. Harvard University, Cambridge, MA (2011) 12. Poelmans, J., Ignatov, D.I., Kuznetsov, S.O., Dedene, G.: Formal concept analysis in knowledge processing: A survey on applications. Expert Syst. Appl. 40(16) (2013) 6538–6560 13. Poelmans, J., Kuznetsov, S.O., Ignatov, D.I., Dedene, G.: Formal concept analysis in knowledge processing: A survey on models and techniques. Expert Syst. Appl. 40(16) (2013) 6601–6623 14. Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics: A Foundation for Computer Science, 2nd Ed. Addison-Wesley (1994) 15. Kim, K.H.: Boolean matrix theory and applications. Marcel Dekker, New York and Basel (1982) 16. Marenich, E.: Determining the Schein Rank of Boolean Matrices. Matrix Methods: Theory, Algorithms and Applications (2010) 85–103 17. Sperner, E.: Ein Satz über Untermengen einer endlichen Menge. Math Z 27 (1928) 544–548 18. Knuth, D.E.: Combinatorial Algorithms. Volume 4A of The Art of Computer Programming. Addison-Wesley Professional (January 2011)