=Paper=
{{Paper
|id=Vol-2668/paper4
|storemode=property
|title=Revisiting the GreCon Algorithm for Boolean Matrix Factorization
|pdfUrl=https://ceur-ws.org/Vol-2668/paper4.pdf
|volume=Vol-2668
|authors=Martin Trnecka,Roman Vyjidacek
|dblpUrl=https://dblp.org/rec/conf/cla/TrneckaV20
}}
==Revisiting the GreCon Algorithm for Boolean Matrix Factorization==
Revisiting the GreCon Algorithm for Boolean Matrix Factorization Martin Trnecka[0000−0001−7770−2033] and Roman Vyjidacek[0000−0002−5442−0444] Dept. Computer Science, Palacký University Olomouc, Olomouc, Czech Republic martin.trnecka@gmail.com, roman.vyjidacek@upol.cz Abstract. Over the past decade, the two most fundamental Boolean matrix factorization (BMF) algorithms, GreCon and GreConD, were proposed. Whereas GreConD has become one of the most popular al- gorithms, GreCon—an algorithm on which the GreConD was build—is somewhat forgotten. Although GreCon may produce better results than GreConD, computing BMF via this algorithm is time consuming. We show that the reasons for not using GreCon algorithm are no longer truth. We revise the algorithm and on various experiments we demon- strate that the revised version is competitive to current BMF algorithms in term of running time. Moreover, in some cases GreCon outperforms GreConD—the fastest BMF algorithm. Additionally, we argue that a search strategy of GreConD, notwithstanding it provides a good result, is limited. Furthermore, we show that our novel approach to GreCon opens a new door to further BMF research. Keywords: Boolean matrix factorization · Boolean matrix factorization Algorithms · Formal concept analysis 1 Introduction Boolean Matrix Factorization (BMF), also known as Boolean matrix decom- position, is a well-established and widely used tool in data-mining and data processing of Boolean (1/0) data. In the last decade there was a huge effort dedicated to a BMF research. In many works (e.g. [2, 3]) a strong connection—which we consider in the paper— between BMF and formal concept analysis (FCA) was established. FCA provides a general framework for the BMF problem description. Namely, formal context represents the input data and formal concepts represent factors in such data. From this perspective, BMF can be seen as a covering of a formal context by formal concepts [2, 3, 10]. In the pioneer work [3] two main algorithms, GreCon and GreConD, for BMF as well as a fundamental theory based on FCA were established.1 These two algorithms are basic ones. Both utilize formal concepts as candidates for 1 The algorithms were originally called Algorithm 1 and 2. The names GreCon and GreConD come from later works. Copyright c 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in Francisco J. Valverde-Albacete, Martin Trnecka (Eds.): Proceedings of the 15th International Conference on Concept Lattices and Their Applications, CLA 2020, pp. 59–70, 2020. 60 Martin Trnecka and Roman Vyjidacek factors. From these candidates the final factors are selected via a greedy choice. A detailed description of these two algorithms is provided in Section 2. While the GreConD has become extremely popular, mainly due to its simple architecture and performance—in a fact GreConD is one of the fastest BMF algorithms and according to various experimental evaluations (see e.g. results in [1, 2]) it provides a very good results from the quality viewpoint—the GreCon is forgotten in the contemporary BMF research. There are two main reasons for that: (i) GreCon is a very slow BMF algorithm. The algorithm requires a several iterations over the set of all formal concepts, which may be large. (ii) Results provided by GreConD are comparable to results provided by GreCon. These two reasons stand on results presented in [3] and are widely adopted in BMF research. In the paper, we argue that these reasons need to be revisited, mainly according to a new development in BMF. The main aim of this paper is to show that the reasons for not using GreCon algorithm are no longer valid—thanks to the development of algorithms for FCA and, paradoxically, thanks to the development of the GreConD algorithm. The main aim can be summarized as follows: (i) We reimplemented the GreCon algorithm and on various experiments we demonstrate that the revised version is competitive in term of running time with existing BMF algorithms. Moreover, the new implementation in some cases outperforms GreConD. (ii) We show, that the search strategy used by GreConD, despite it provides a good result, is limited. In the paper, we compare the search spaces of GreCon and GreConD and we briefly address the comparison of the results provided by both of them. (iii) Last but not least, the massive speed up of GreCon presented in the paper opens a new door to further BMF research. Namely, we utilize the GreCon search strategy on a restricted set of formal concepts. Although this is not a completely new idea, its application was limited, because of the speed of existing algorithms. Moreover, in an experimental evaluation we show that this approach may provides better results. The rest of the paper is organized as follows. In Section 2, we provide a brief introduction to BMF, overview of related works and a description of the basic algorithms, GreCon and GreConD. Then, in Section 3, a comparison of GreCon and GreConD is discussed. Section 4 provides a description and a pseudocode of the redesign GreCon algorithm. In Section 5, we present re- sults from various experimental evaluations. Section 6 summarizes the paper and outlines a further research directions. 2 A Brief Introduction to BMF A good overview of BMF and related topics can be found e.g. in [1, 2, 8]. In general, BMF and BMF algorithms are addressed in various papers involving formal concept analysis [3, 6], role mining [4], binary databases [5] or bipartite graphs [9]. In this paper, we focus instead of a general BMF to a certain class of factorization, so-called from-below matrix factorization [2], i.e. no entries in the input data with a zero value are covered by some factor. Revisiting the GreCon Algorithm for Boolean Matrix Factorization 61 A general aim of BMF is for a given Boolean matrix I ∈ {0, 1}m×n to find matrices A ∈ {0, 1}m×k and B ∈ {0, 1}k×n for which I≈A◦B (1) where ◦ is Boolean matrix multiplication, i.e. (A ◦ B)ij = maxkl=1 min(Ail , Blj ), and ≈ represents an approximate equality. This approximate equality is assessed by || · || (i.e. by number of 1s) and with the corresponding metric E which is defined for matrices I ∈ {0, 1}m×n , A ∈ {0, 1}m×k , and B ∈ {0, 1}k×n by E(I, A ◦ B) = ||I (A ◦ B)||, (2) where is Boolean subtraction which is the normal matrix subtraction with an alternative definition 0 − 1 = 0. In other words, function E is a number of 1s in I that are not in (A ◦ B). The metric (2), or its variant, is generally used to assess the quality of factorization [1, 2]. A decomposition of I into A◦B may be interpreted as a discovery of k factors that exactly or approximately describe the data: interpreting I, A, and B as the object-attribute, object-factor, and factor-attribute matrices. The model (1) can be interpreted as follows: the object i has the attribute j, i.e. Iij = 1, if and only if there exists factor l such that l applies to i and j is one of the particular manifestations of l. 2.1 BMF with Help of Formal Concept Analysis We already mentioned that the BMF is closely connected to FCA. The formal context hX, Y, Ii with m objects and n attributes can be seen as a Boolean matrix I ∈ {0, 1}m×n where Iij = 1 if hx, yi ∈ I, and vice versa. To every I ∈ {0, 1}n×m one may associate a pair h↑ , ↓ i of arrow operators assigning to sets C ⊆ X = {1, . . . , m} and D ⊆ Y = {1, . . . , n} the sets C ↑ ⊆ Y and D↓ ⊆ X defined by C ↑ = {j ∈ Y | ∀i ∈ C : Iij = 1}, D↓ = {i ∈ X | ∀j ∈ D : Iij = 1}. A pair hC, Di for which C ↑ = D and D↓ = C is called a formal concept. The set of all formal concepts for formal context hX, Y, Ii is defined as follows B(X, Y, I) = {hC, Di | C ⊆ X, D ⊆ Y, C ↑ = D, D↓ = C}. For the sake of simplicity, we denote the set of all formal concepts for Boolean matrix I by B(I). The set of all concepts can be equipped with a partial order ≤ such that hA, Bi ≤ hC, Di iff A ⊆ C (or D ⊆ B). The whole set of partially ordered formal concepts is called the concept lattice of I. The set of formal concepts that is generated by single object (denoted by O(I)) is called the set of object concepts, i.e. O(I) = {hi↑↓ , i↑ i | ∀i ∈ X}, and the set that is generated by single attribute (denoted A(I)) is called the set of attributes concepts, i.e. A(I) = {hj ↓ , j ↓↑ i | ∀j ∈ Y }. 62 Martin Trnecka and Roman Vyjidacek Now, we explain the connection between a set of formal concepts and the BMF. Every set F = {hC1 , D1 i, . . . , hCk , Dk i} ⊆ B(I), with a fixed indexing of the formal concepts hCl , Dl i induces the m × k and k × n Boolean matrices AF and BF by 1, if i ∈ Cl , (AF )il = 0, if i 6∈ Cl , and 1, if j ∈ Dl , (BF )lj = 0, if j 6∈ Dl , for l = 1, . . . , k. That is, the lth column of AF and lth row BF are the char- acteristic vectors of Cl and Dl , respectively. The set F is called a set of factor concepts. The entry Iij = 1 is covered by formal concept hA, Bi if i ∈ A and j ∈ B. 2.2 Algorithm GreCon The GreCon2 algorithm [3] is one of the straightforward algorithms for BMF based on FCA. To produce matrices AF and BF it uses a greedy search for factor concepts driven by metric (2). More precisely, the algorithm first computes the set B(I). Then it iteratively goes through this set and in each iteration a factor concept that covers the largest not yet covered part of input data is selected, i.e. it is a set cover based algorithm. The algorithm is designed to compute an exact (or approximate if it is stopped before the end) from-below matrix factorization. 2.3 Algorithm GreConD One of the most successful algorithms for BMF is the GreConD3 algorithm [3] which was originally designed to improve a running time of GreCon. To pro- duce the matrices AF and BF it uses a particular greedy search for factor concepts which allows us to compute factor concepts “on demand”, i.e. without the need to compute the set of all formal concepts for the input matrix first. GreConD constructs the factor concepts by adding sequentially “promising columns” to candidate hC, Di for factor concept. More formally, a new column j that minimizes the error E(I, AF ∪h(D∪j)↓ ,(D∪j)↓↑ i ◦BF ∪h(D∪j)↓ ,(D∪j)↓↑ i ) is added to hC, Di. This is repeated until no such columns exist. If there is no such col- umn, hC, Di is added to the set F. This strategy leads to a huge time saving while the quality of the decomposition is comparable with the decomposition obtained via GreCon. Similar to GreCon, the algorithm is also designed to compute an exact and approximate from-below factorization. 2 GreCon is the abbreviation for Greedy Concepts. 3 GreConD is the abbreviation for Greedy Concepts on Demand. Revisiting the GreCon Algorithm for Boolean Matrix Factorization 63 3 Comparison of GreCon and GreConD In what follows, we demonstrate differences between the search strategies of GreConD and GreCon. Let us consider matrix I, depicted below, with rows {1, 2, 3, 4, 5} and columns {a, b, c, d, e, f } abcdef 001111 1 0 1 0 1 0 1 2 I = 1 1 0 0 1 1 3 1 0 0 1 1 0 4 000110 5 and two sets of factor concepts F1 and F2 that are computed via GreConD and GreCon respectively. The corresponding matrices are: 00111 100010 10010 000110 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 0 1 1 AF1 ◦ BF1 = 1 1 0 0 1 ◦ 0 0 1 1 1 1 , AF2 ◦ BF2 = 0 1 0 0 1 ◦ 0 1 0 1 0 1 . 1 0 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 00011 000010 10000 100010 The search space, in each iteration of GreCon, is the set of all formal con- cepts. Namely, in each the iteration all formal concepts in the concept lattice of I (depicted in Figure 1) are considered as candidates for factors, in order in they were generated. 1, 2, 3, 4, 5 f d e 1, 2, 3 1, 2, 4, 5 1, 3, 4, 5 d, f e, f d, e b, f 1, 2 1, 3 1, 4, 5 a, e 2, 3 3, 4 b, d, f a, b, e, f c, d, e, f a, d, e 2 3 1 4 a, b, c, d, e, f Fig. 1: Concept lattice of I. The search space of GreConD is different. GreConD always starts with attributes concepts. Then it extends the selected concept via some promising attributes, i.e. GreConD search space is limited to chains in the corresponding 64 Martin Trnecka and Roman Vyjidacek concept lattice and the selection of a particular chain is driven by the greedy choice. Note, the columns of the input matrix are considered in a fixed order. As a consequence of this GreConD may stop in some local maximum. In a fact, this is true for the first iteration on our example data matrix I. GreConD starts with attribute a which generates h{3, 4}, {a, e}i. All remaining candidates (attribute concepts) have smaller or equal size as the first concept. Then, h{3, 4}, {a, e}i is extended by some attributes to obtain a potentially better candidate (candidate which covers more not yet covered entries of I), namely attributes b, d, f are considered, meanwhile c is skipped because a, c, e do not generate any formal concept. The concepts considered as candidates for factors in the first iteration of GreConD are depicted by black nodes in Figure 1. As a consequence of this GreConD is not able to discover the best choice h{1, 4, 5}, {d, e}i. Despite the differences between the algorithms, the results produced by them are comparable—GreConD produces slightly worse results than GreCon—in terms of the overall coverage which is evaluated by the metric (2). The com- parison of the coverage is shown in Figure 2. Note, the coverage is computed as 1 − E(I,A◦B) ||I|| . 1.0 1.0 0.8 0.8 Coverage Coverage 0.6 0.6 0.4 0.4 0.2 0.2 GreCon GreCon GreConD GreConD 0.0 0.0 0 25 50 75 100 125 150 175 200 0 50 100 150 200 250 300 Number of Factors Number of Factors (a) Americas small (b) Customers 1.0 1.0 0.8 0.8 Coverage Coverage 0.6 0.6 0.4 0.4 0.2 0.2 GreCon GreCon GreConD GreConD 0.0 0.0 0 100 200 300 400 500 0 20 40 60 80 100 120 Number of Factors Number of Factors (c) DNA (d) Mushroom Fig. 2: Comparison of coverage produced by factorization obtained via GreCon and GreConD on selected real-word data. Revisiting the GreCon Algorithm for Boolean Matrix Factorization 65 This observation—in fact results presented in Figure 2 are standard in BMF— was confirmed by previous works e.g. [1, 2]. The interpretation of this observation is affected by a small misunderstanding. Namely, graphs in Figure 2 capture the cumulative coverage, i.e. the sum of all covered entries for a given number of factors. In this sum, the differences between the factorizations be may not easy to see. In [3] a different kind of experiments were performed. Instead of plotting the cumulative coverage, the coverage of i-th factor in one factorization in contrast with the coverage of i-th factor in the second factorization is plotted, i.e. plotted points have coordinates based on the number of covered entries. Results of this experiments—[3] includes only the results for Mushroom data (see Section 5)— for selected data are depicted in Figure 3. 104 105 104 103 3 GreConD 10 GreConD 102 102 101 101 100 100 0 1 2 3 4 5 10 10 10 10 10 10 100 101 102 103 104 GreCon GreCon (a) Americas small (b) Customers 104 104 103 103 GreConD GreConD 102 102 1 10 101 0 10 100 100 101 102 103 104 100 101 102 103 104 GreCon GreCon (c) DNA (d) Mushroom Fig. 3: Comparison of GreCon and GreConD algorithms on selected real- word data. The plot uses axes with a logarithmic scale. The plotted points have coordinates based on the number of covered entries. One may easily see, that the difference between factorizations may be very large despite small differences in Figure 2. In case of Customers dataset (Fig- ure 3b) the datapoints are gathered close to the diagonal which means that there is a small difference between the corresponding factors. In contrast with this a significant difference between GreCon and GreConD on DNA dataset (Figure 3c) may be observed. 66 Martin Trnecka and Roman Vyjidacek 4 Redesign of GreCon Now we describe a basic idea of redesigning the GreCon algorithm. We call the redesigned version of the algorithm GreCon2 because it is mainly an efficient implementation of the original GreCon algorithm. Differently from GreCon, GreCon2 does not need repeatedly compute the number of not yet covered entries for each formal concept—which is the most time consuming task in GreCon. Instead of this the actual cover for all formal concepts is maintained for the entire running time. For this purpose, a conve- nient data structure which enables an additional speed up is used. Additionally, GreCon2 does not iterate over all the formal concepts (i.e. data structure) but only over the numbers (integers). A pseudocode of GreCon2, which accepts as an input Boolean matrix I ∈ {0, 1}m×n , is depicted in Algorithm 1. Algorithm 1: GreCon2 Input: Boolean matrix I ∈ {0, 1}m×n Output: Set F of factor concepts. 1 concepts ← B(I); 2 foreach hAl , Bl i ∈ concepts do 3 covers[l] ← ||Al || · ||Bl || 4 foreach (i, j) ∈ hAl , Bl i do 5 append l to list cell[i · n + j] 6 end 7 end 8 while AF ◦ BF 6= I do 9 add hAl , Bl i which maximizes covers[l] to F 10 foreach (i, j) ∈ hAl , Bl i do 11 foreach k ∈ cell[i · n + j] do 12 covers[k] ← covers[k] − 1 13 end 14 delete list cell[i · n + j] 15 end 16 end In the first phase, GreCon2 computes the set of all formal concepts B(I) (line 1). Then the algorithm computes a coverage for each concept—stores in array covers (line 3)—and for each Iij constructs a list of concepts that cover Iij (lines 4–6). These lists are stored in array cell on a corresponding indexes (line 5). Note that accessing the entries of arrays covers and cell requires a constant time. In the second phase, GreCon2 performs linear search in array covers and adds to F concept hAl , Bl i for which the value of cover[l] is maximal (line 9). For each Iij covered by hAl , Bl i GreCon2 decrease values in array covers for each concept that covers Iij (lines 10–13) and deletes the list for Iij (line 14). The second phase is repeated until all the entries of the input data matrix are covered or, equivalently, there is no covers[l] > 0. Revisiting the GreCon Algorithm for Boolean Matrix Factorization 67 5 Experimental Evaluation In this section, we present results of an experimental comparison of GreCon, GreConD and GreCon2. We implemented4 all three algorithms in the Swift programming language with the same level of optimization to make the compar- ison fair. 5.1 Datasets We used eight real-world datasets, namely Advertisement, Mushroom, Tic Tac Toe from [7]; Americas small, Apj, Customer, Firewall 1 from [4]; and DNA [11]. The characteristics of the datasets are shown in Table 1. Specifically the num- ber of objects, the number of attributes, the density of nonzero entries in data in percents, the number of concepts, and the number of object and attribute concepts. All of them are well known and widely used as benchmark datasets in BMF. Table 1: Datasets and their characteristics. dataset # objects # attributes dens. |B(I)| |O(I) ∪ A(I)| Advertisement 3279 1557 0.88 9192 2648 Americas small 3477 1587 1.91 2764 524 Apj 2044 1164 0.29 798 723 Customer 10961 277 1.50 47848 5806 DNA 4590 392 1.47 4483 1450 Firewall 1 365 709 12.35 317 152 Mushroom 8124 119 17.65 221525 8231 Tic Tac Toe 958 30 33.33 59505 988 5.2 Running Times Comparison We compared the running times of GreCon, GreConD and GreCon2. All experiments were performed on an ordinal computer with 2.7GHz Quad-Core Intel Core i7 processor and 16GB of RAM. Each algorithm was run 5 times on a particular data and the average time as well the standard deviation (the numbers after ±) are reported in Table 2. Note, the running times do not involve the time required to load input data. In case of GreCon and GreCon2, the time required for the computation of all formal concepts, which is done via our implementation of FCbO algorithm [12], is included in each iteration. Table 2 shows that in all cases GreCon2 significantly outperforms origi- nal GreCon algorithm. Moreover, in most cases GreCon2 outperforms Gre- ConD. GreConD performs well on datasets where the number of attributes 4 All implementations together with scripts that performs all presented experiments are available on the GitHub https://github.com/rvyjidacek/experiments-cla2020 68 Martin Trnecka and Roman Vyjidacek Table 2: Comparison of running times of GreCon, GreCon2 and GreConD in seconds. The average time over five iterations as well as standard deviations (the numbers after ±) are presented. dataset GreCon GreCon2 GreConD Advertisement 916.40 ± 9.80 1.75 ± 0.06 295.72 ± 1.38 Americas small 94.68 ± 0.17 1.37 ± 0.02 96.46 ± 0.61 Apj 16.70 ± 0.14 0.21 ± 0.00 59.52 ± 0.23 Customer 1179.33 ± 4.85 3.23 ± 0.02 11.95 ± 0.15 DNA 133.71 ± 2.21 0.48 ± 0.01 21.9 ± 0.50 Firewall 1 0.76 ± 0.02 0.12 ± 0.00 4.51 ± 0.08 Mushroom 1139.91 ± 15.37 31.14 ± 0.22 3.99 ± 0.03 Tic Tac Toe 5.69 ± 0.17 0.78 ± 0.01 0.04 ± 0.00 Table 3: Comparison of running times and the final number of factors of Gre- Con2 restricted to O(I) ∪ A(I) and unrestricted GreConD. dataset GreCon2 # factors GreConD # factors Advertisement 0.38 ± 0.00 756 295.72 ± 1.38 766 Americas small 0.22 ± 0.01 208 96.46 ± 0.61 197 Apj 0.07 ± 0.00 463 59.52 ± 0.23 464 Customer 0.41 ± 0.00 281 11.95 ± 0.15 286 DNA 0.15 ± 0.00 515 21.90 ± 0.50 511 Firewall 1 0.05 ± 0.00 66 4.51 ± 0.08 66 Mushrooms 0.42 ± 0.00 103 3.99 ± 0.03 118 Tic Tac Toe 0.01 ± 0.00 29 0.04 ± 0.00 32 is rather small. This is the reason why GreConD outperforms GreCon2 on Mushroom dataset which has a larger number of concepts and only a small number of attributes. Note that our implementation of GreCon2 may be improved. Namely the construction phase of the algorithm (lines 2–6 in Algorithm 1) can be a part of FCbO algorithm (line 1). We decide to keep these two parts separated to make the comparison between GreCon and GreCon2 fair. 5.3 A New Approach to BMF A significant speed up provided by GreCon2 allows us to consider the set of all formal concepts as a set of candidates for factors. Despite considerable speedup, the set may be still too large (like in the case of Mushroom dataset). As a very promising research direction seems to be a restriction of the set of formal concepts, i.e. skip formal concepts that are potentially not a good candidates for factors. To demonstrate this idea, we consider in GreCon2 the set O(I) ∪ Revisiting the GreCon Algorithm for Boolean Matrix Factorization 69 A(I) instead of B(I). We perform the same experiments as above and compare GreCon2 with the restricted search space and unchanged GreConD. From the results shown in Table 3, we observe that GreCon2 significantly outperforms GreConD in therm of running times. Moreover, in this case Gre- Con2 produces slightly better results in terms of overall coverage (see Figure 4) and in some cases a smaller number of factors than GreConD (see Table 3). 1.0 1.0 0.8 0.8 Coverage Coverage 0.6 0.6 0.4 0.4 0.2 0.2 GreCon2 GreCon2 GreConD GreConD 0.0 0.0 0 100 200 300 400 500 600 700 0 25 50 75 100 125 150 175 200 Number of Factors Number of Factors (a) Advertisement (b) Americas small 1.0 1.0 0.8 0.8 Coverage Coverage 0.6 0.6 0.4 0.4 0.2 0.2 GreCon2 GreCon2 GreConD GreConD 0.0 0.0 0 20 40 60 80 100 0 5 10 15 20 25 30 Number of Factors Number of Factors (c) Mushroom (d) Tic Tac Toe Fig. 4: Comparison of coverage produced by factorization obtained via GreCon2 where B(I) was restricted to O(I) ∪ A(I) and GreConD on selected datasets. 6 Conclusion and Further Research We proposed a revised version of GreCon algorithm which significantly out- performs the original algorithm and which is competitive to the fastest BMF algorithms. Additionally, we show that our approach enables us to consider a larger search space than one of the most popular BMF algorithms, which in the past has caused that GreCon to be forgotten in BMF research. Further research shall include an efficient implementation of the approach; a deep investigation of how the set of all formal concepts may be restricted without affecting the outcome quality; and last but not least, a parallelization of the approach. 70 Martin Trnecka and Roman Vyjidacek Acknowledgments Supported by Junior research Grant No. JG 2020 003 of the Palacký University Olomouc. Support by Grant No. IGA PrF 2020 030 of IGA of Palacký University is also acknowledged. References 1. Belohlavek, R., Outrata, J., Trnecka, M.: Toward quality assess- ment of Boolean matrix factorizations. Inf. Sci. 459, 71–85 (2018). https://doi.org/10.1016/j.ins.2018.05.016 2. Belohlavek, R., Trnecka, M.: From-below approximations in Boolean matrix fac- torization: Geometry and new algorithm. J. Comput. Syst. Sci. 81(8), 1678–1697 (2015). https://doi.org/10.1016/j.jcss.2015.06.002 3. Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20 (2010). https://doi.org/10.1016/j.jcss.2009.05.002 4. Ene, A., Horne, W.G., Milosavljevic, N., Rao, P., Schreiber, R., Tarjan, R.E.: Fast exact and heuristic methods for role minimization problems. In: Ray, I., Li, N. (eds.) 13th ACM Symposium on Access Control Models and Technologies, SACMAT 2008, Estes Park, CO, USA, June 11-13, 2008, Proceedings. pp. 1–10. ACM (2008). https://doi.org/10.1145/1377836.1377838 5. Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Suzuki, E., Arikawa, S. (eds.) Discovery Science, 7th International Conference, DS 2004, Padova, Italy, October 2-5, 2004, Proceedings. Lecture Notes in Computer Science, vol. 3245, pp. 278–289. Springer (2004). https://doi.org/10.1007/978-3-540-30214-8 22 6. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean ma- trix factorisation for collaborative filtering: An fca-based approach. In: Agre, G., Hitzler, P., Krisnadhi, A.A., Kuznetsov, S.O. (eds.) Artificial Intelligence: Method- ology, Systems, and Applications - 16th International Conference, AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8722, pp. 47–58. Springer (2014). https://doi.org/10.1007/978-3-319- 10554-3 5, https://doi.org/10.1007/978-3-319-10554-3 5 7. Lichman, M.: UCI machine learning repository (2013), http://archive.ics.uci.edu/ml 8. Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The dis- crete basis problem. IEEE Trans. Knowl. Data Eng. 20(10), 1348–1362 (2008). https://doi.org/10.1109/TKDE.2008.53 9. Monson, S.D., Pullman, S., Rees, R.: A survey of clique and biclique coverings and factorizations of (0,1)-matrices. No. 14 (1995) 10. Mouakher, A., Yahia, S.B.: QualityCover: Efficient binary relation cover- age guided by induced knowledge quality. Inf. Sci. 355-356, 58–73 (2016). https://doi.org/10.1016/j.ins.2016.03.009 11. Myllykangas, S., Himberg, J., Böhling, T., Nagy, B., Hollmén, J., Knuutila, S.: Dna copy number amplification profiling of human neoplasms. Oncogene 25(55), 7324–7332 (2006) 12. Outrata, J., Vychodil, V.: Fast algorithm for computing fixpoints of galois connec- tions induced by object-attribute relational data. Inf. Sci. 185(1), 114–127 (2012). https://doi.org/10.1016/j.ins.2011.09.023