=Paper= {{Paper |id=Vol-2668/paper4 |storemode=property |title=Revisiting the GreCon Algorithm for Boolean Matrix Factorization |pdfUrl=https://ceur-ws.org/Vol-2668/paper4.pdf |volume=Vol-2668 |authors=Martin Trnecka,Roman Vyjidacek |dblpUrl=https://dblp.org/rec/conf/cla/TrneckaV20 }} ==Revisiting the GreCon Algorithm for Boolean Matrix Factorization== https://ceur-ws.org/Vol-2668/paper4.pdf
              Revisiting the GreCon Algorithm
              for Boolean Matrix Factorization

Martin Trnecka[0000−0001−7770−2033] and Roman Vyjidacek[0000−0002−5442−0444]

    Dept. Computer Science, Palacký University Olomouc, Olomouc, Czech Republic
              martin.trnecka@gmail.com, roman.vyjidacek@upol.cz



        Abstract. Over the past decade, the two most fundamental Boolean
        matrix factorization (BMF) algorithms, GreCon and GreConD, were
        proposed. Whereas GreConD has become one of the most popular al-
        gorithms, GreCon—an algorithm on which the GreConD was build—is
        somewhat forgotten. Although GreCon may produce better results than
        GreConD, computing BMF via this algorithm is time consuming. We
        show that the reasons for not using GreCon algorithm are no longer
        truth. We revise the algorithm and on various experiments we demon-
        strate that the revised version is competitive to current BMF algorithms
        in term of running time. Moreover, in some cases GreCon outperforms
        GreConD—the fastest BMF algorithm. Additionally, we argue that a
        search strategy of GreConD, notwithstanding it provides a good result,
        is limited. Furthermore, we show that our novel approach to GreCon
        opens a new door to further BMF research.

        Keywords: Boolean matrix factorization · Boolean matrix factorization
        Algorithms · Formal concept analysis


1     Introduction

Boolean Matrix Factorization (BMF), also known as Boolean matrix decom-
position, is a well-established and widely used tool in data-mining and data
processing of Boolean (1/0) data.
   In the last decade there was a huge effort dedicated to a BMF research. In
many works (e.g. [2, 3]) a strong connection—which we consider in the paper—
between BMF and formal concept analysis (FCA) was established. FCA provides
a general framework for the BMF problem description. Namely, formal context
represents the input data and formal concepts represent factors in such data.
From this perspective, BMF can be seen as a covering of a formal context by
formal concepts [2, 3, 10].
   In the pioneer work [3] two main algorithms, GreCon and GreConD, for
BMF as well as a fundamental theory based on FCA were established.1 These
two algorithms are basic ones. Both utilize formal concepts as candidates for
1
    The algorithms were originally called Algorithm 1 and 2. The names GreCon and
    GreConD come from later works.



Copyright c 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). Published in Francisco
J. Valverde-Albacete, Martin Trnecka (Eds.): Proceedings of the 15th International
Conference on Concept Lattices and Their Applications, CLA 2020, pp. 59–70, 2020.
60     Martin Trnecka and Roman Vyjidacek

factors. From these candidates the final factors are selected via a greedy choice.
A detailed description of these two algorithms is provided in Section 2.
     While the GreConD has become extremely popular, mainly due to its simple
architecture and performance—in a fact GreConD is one of the fastest BMF
algorithms and according to various experimental evaluations (see e.g. results in
[1, 2]) it provides a very good results from the quality viewpoint—the GreCon
is forgotten in the contemporary BMF research. There are two main reasons
for that: (i) GreCon is a very slow BMF algorithm. The algorithm requires
a several iterations over the set of all formal concepts, which may be large. (ii)
Results provided by GreConD are comparable to results provided by GreCon.
These two reasons stand on results presented in [3] and are widely adopted in
BMF research. In the paper, we argue that these reasons need to be revisited,
mainly according to a new development in BMF.
     The main aim of this paper is to show that the reasons for not using GreCon
algorithm are no longer valid—thanks to the development of algorithms for FCA
and, paradoxically, thanks to the development of the GreConD algorithm. The
main aim can be summarized as follows: (i) We reimplemented the GreCon
algorithm and on various experiments we demonstrate that the revised version
is competitive in term of running time with existing BMF algorithms. Moreover,
the new implementation in some cases outperforms GreConD. (ii) We show,
that the search strategy used by GreConD, despite it provides a good result, is
limited. In the paper, we compare the search spaces of GreCon and GreConD
and we briefly address the comparison of the results provided by both of them.
(iii) Last but not least, the massive speed up of GreCon presented in the paper
opens a new door to further BMF research. Namely, we utilize the GreCon
search strategy on a restricted set of formal concepts. Although this is not a
completely new idea, its application was limited, because of the speed of existing
algorithms. Moreover, in an experimental evaluation we show that this approach
may provides better results.
     The rest of the paper is organized as follows. In Section 2, we provide a
brief introduction to BMF, overview of related works and a description of the
basic algorithms, GreCon and GreConD. Then, in Section 3, a comparison
of GreCon and GreConD is discussed. Section 4 provides a description and
a pseudocode of the redesign GreCon algorithm. In Section 5, we present re-
sults from various experimental evaluations. Section 6 summarizes the paper and
outlines a further research directions.


2    A Brief Introduction to BMF

A good overview of BMF and related topics can be found e.g. in [1, 2, 8]. In
general, BMF and BMF algorithms are addressed in various papers involving
formal concept analysis [3, 6], role mining [4], binary databases [5] or bipartite
graphs [9]. In this paper, we focus instead of a general BMF to a certain class of
factorization, so-called from-below matrix factorization [2], i.e. no entries in the
input data with a zero value are covered by some factor.
          Revisiting the GreCon Algorithm for Boolean Matrix Factorization        61

  A general aim of BMF is for a given Boolean matrix I ∈ {0, 1}m×n to find
matrices A ∈ {0, 1}m×k and B ∈ {0, 1}k×n for which

                                     I≈A◦B                                       (1)

where ◦ is Boolean matrix multiplication, i.e. (A ◦ B)ij = maxkl=1 min(Ail , Blj ),
and ≈ represents an approximate equality. This approximate equality is assessed
by || · || (i.e. by number of 1s) and with the corresponding metric E which is
defined for matrices I ∈ {0, 1}m×n , A ∈ {0, 1}m×k , and B ∈ {0, 1}k×n by

                          E(I, A ◦ B) = ||I    (A ◦ B)||,                        (2)

where is Boolean subtraction which is the normal matrix subtraction with an
alternative definition 0 − 1 = 0. In other words, function E is a number of 1s
in I that are not in (A ◦ B). The metric (2), or its variant, is generally used to
assess the quality of factorization [1, 2].
    A decomposition of I into A◦B may be interpreted as a discovery of k factors
that exactly or approximately describe the data: interpreting I, A, and B as
the object-attribute, object-factor, and factor-attribute matrices. The model (1)
can be interpreted as follows: the object i has the attribute j, i.e. Iij = 1, if and
only if there exists factor l such that l applies to i and j is one of the particular
manifestations of l.

2.1   BMF with Help of Formal Concept Analysis
We already mentioned that the BMF is closely connected to FCA. The formal
context hX, Y, Ii with m objects and n attributes can be seen as a Boolean
matrix I ∈ {0, 1}m×n where Iij = 1 if hx, yi ∈ I, and vice versa. To every
I ∈ {0, 1}n×m one may associate a pair h↑ , ↓ i of arrow operators assigning to
sets C ⊆ X = {1, . . . , m} and D ⊆ Y = {1, . . . , n} the sets C ↑ ⊆ Y and D↓ ⊆ X
defined by

                        C ↑ = {j ∈ Y | ∀i ∈ C : Iij = 1},
                        D↓ = {i ∈ X | ∀j ∈ D : Iij = 1}.

    A pair hC, Di for which C ↑ = D and D↓ = C is called a formal concept. The
set of all formal concepts for formal context hX, Y, Ii is defined as follows

            B(X, Y, I) = {hC, Di | C ⊆ X, D ⊆ Y, C ↑ = D, D↓ = C}.

For the sake of simplicity, we denote the set of all formal concepts for Boolean
matrix I by B(I). The set of all concepts can be equipped with a partial order
≤ such that hA, Bi ≤ hC, Di iff A ⊆ C (or D ⊆ B). The whole set of partially
ordered formal concepts is called the concept lattice of I.
    The set of formal concepts that is generated by single object (denoted by
O(I)) is called the set of object concepts, i.e. O(I) = {hi↑↓ , i↑ i | ∀i ∈ X}, and
the set that is generated by single attribute (denoted A(I)) is called the set of
attributes concepts, i.e. A(I) = {hj ↓ , j ↓↑ i | ∀j ∈ Y }.
62       Martin Trnecka and Roman Vyjidacek

   Now, we explain the connection between a set of formal concepts and the
BMF. Every set F = {hC1 , D1 i, . . . , hCk , Dk i} ⊆ B(I), with a fixed indexing of
the formal concepts hCl , Dl i induces the m × k and k × n Boolean matrices AF
and BF by
                                         
                                           1, if i ∈ Cl ,
                             (AF )il =
                                           0, if i 6∈ Cl ,

and
                                         
                                             1, if j ∈ Dl ,
                             (BF )lj =
                                             0, if j 6∈ Dl ,

for l = 1, . . . , k. That is, the lth column of AF and lth row BF are the char-
acteristic vectors of Cl and Dl , respectively. The set F is called a set of factor
concepts. The entry Iij = 1 is covered by formal concept hA, Bi if i ∈ A and
j ∈ B.


2.2     Algorithm GreCon

The GreCon2 algorithm [3] is one of the straightforward algorithms for BMF
based on FCA. To produce matrices AF and BF it uses a greedy search for factor
concepts driven by metric (2). More precisely, the algorithm first computes the
set B(I). Then it iteratively goes through this set and in each iteration a factor
concept that covers the largest not yet covered part of input data is selected, i.e.
it is a set cover based algorithm. The algorithm is designed to compute an exact
(or approximate if it is stopped before the end) from-below matrix factorization.


2.3     Algorithm GreConD

One of the most successful algorithms for BMF is the GreConD3 algorithm [3]
which was originally designed to improve a running time of GreCon. To pro-
duce the matrices AF and BF it uses a particular greedy search for factor
concepts which allows us to compute factor concepts “on demand”, i.e. without
the need to compute the set of all formal concepts for the input matrix first.
GreConD constructs the factor concepts by adding sequentially “promising
columns” to candidate hC, Di for factor concept. More formally, a new column j
that minimizes the error E(I, AF ∪h(D∪j)↓ ,(D∪j)↓↑ i ◦BF ∪h(D∪j)↓ ,(D∪j)↓↑ i ) is added
to hC, Di. This is repeated until no such columns exist. If there is no such col-
umn, hC, Di is added to the set F. This strategy leads to a huge time saving
while the quality of the decomposition is comparable with the decomposition
obtained via GreCon. Similar to GreCon, the algorithm is also designed to
compute an exact and approximate from-below factorization.
2
     GreCon is the abbreviation for Greedy Concepts.
3
     GreConD is the abbreviation for Greedy Concepts on Demand.
           Revisiting the GreCon Algorithm for Boolean Matrix Factorization                                          63

3    Comparison of GreCon and GreConD

In what follows, we demonstrate differences between the search strategies of
GreConD and GreCon. Let us consider matrix I, depicted below, with rows
{1, 2, 3, 4, 5} and columns {a, b, c, d, e, f }
                                              abcdef
                                                       
                                              001111 1
                                            0 1 0 1 0 1 2
                                                       
                                        I = 1 1 0 0 1 1 3
                                            1 0 0 1 1 0 4
                                              000110 5

and two sets of factor concepts F1 and F2 that are computed via GreConD
and GreCon respectively. The corresponding matrices are:
                                                              
                 00111             100010                                              10010                000110
            0 1 0 1 0 0 1 0 0 0 1                     0 0 1 0 0 1 1 0 0 1 1
                                                                                 
AF1 ◦ BF1 =  1 1 0 0 1  ◦  0 0 1 1 1 1  , AF2 ◦ BF2 =  0 1 0 0 1  ◦  0 1 0 1 0 1  .
            1 0 0 1 1 0 0 0 1 0 0                     1 0 0 0 1 0 0 1 1 1 1
                 00011             000010                                              10000                100010

    The search space, in each iteration of GreCon, is the set of all formal con-
cepts. Namely, in each the iteration all formal concepts in the concept lattice of
I (depicted in Figure 1) are considered as candidates for factors, in order in they
were generated.


                                                  1, 2, 3, 4, 5



                                    f                           d                e
                              1, 2, 3     1, 2, 4, 5                             1, 3, 4, 5


                                d, f                     e, f                    d, e
                  b, f          1, 2              1, 3                           1, 4, 5          a, e
                  2, 3                                                                            3, 4


                b, d, f             a, b, e, f                          c, d, e, f                a, d, e
                          2                      3                  1                         4



                                                 a, b, c, d, e, f


                                Fig. 1: Concept lattice of I.


    The search space of GreConD is different. GreConD always starts with
attributes concepts. Then it extends the selected concept via some promising
attributes, i.e. GreConD search space is limited to chains in the corresponding
64                        Martin Trnecka and Roman Vyjidacek

concept lattice and the selection of a particular chain is driven by the greedy
choice. Note, the columns of the input matrix are considered in a fixed order.
As a consequence of this GreConD may stop in some local maximum.
    In a fact, this is true for the first iteration on our example data matrix I.
GreConD starts with attribute a which generates h{3, 4}, {a, e}i. All remaining
candidates (attribute concepts) have smaller or equal size as the first concept.
Then, h{3, 4}, {a, e}i is extended by some attributes to obtain a potentially better
candidate (candidate which covers more not yet covered entries of I), namely
attributes b, d, f are considered, meanwhile c is skipped because a, c, e do not
generate any formal concept. The concepts considered as candidates for factors
in the first iteration of GreConD are depicted by black nodes in Figure 1.
As a consequence of this GreConD is not able to discover the best choice
h{1, 4, 5}, {d, e}i.
    Despite the differences between the algorithms, the results produced by them
are comparable—GreConD produces slightly worse results than GreCon—in
terms of the overall coverage which is evaluated by the metric (2). The com-
parison of the coverage is shown in Figure 2. Note, the coverage is computed as
1 − E(I,A◦B)
        ||I||  .




                1.0                                                                              1.0


                0.8                                                                              0.8
     Coverage




                                                                                      Coverage




                0.6                                                                              0.6


                0.4                                                                              0.4


                0.2                                                                              0.2
                                                                           GreCon                                                                    GreCon
                                                                           GreConD                                                                   GreConD
                0.0                                                                              0.0
                      0     25         50   75    100      125    150    175    200                    0    50    100       150          200   250        300
                                              Number of Factors                                                         Number of Factors


                                  (a) Americas small                                                             (b) Customers


             1.0                                                                                  1.0


             0.8                                                                                  0.8
  Coverage




                                                                                       Coverage




             0.6                                                                                  0.6


             0.4                                                                                  0.4


             0.2                                                                                  0.2
                                                                          GreCon                                                                     GreCon
                                                                          GreConD                                                                    GreConD
             0.0                                                                                  0.0
                   0             100        200          300       400         500                      0   20     40       60           80    100       120
                                              Number of Factors                                                         Number of Factors


                                            (c) DNA                                                              (d) Mushroom

Fig. 2: Comparison of coverage produced by factorization obtained via GreCon
and GreConD on selected real-word data.
                              Revisiting the GreCon Algorithm for Boolean Matrix Factorization                                                                                65

    This observation—in fact results presented in Figure 2 are standard in BMF—
was confirmed by previous works e.g. [1, 2]. The interpretation of this observation
is affected by a small misunderstanding. Namely, graphs in Figure 2 capture the
cumulative coverage, i.e. the sum of all covered entries for a given number of
factors. In this sum, the differences between the factorizations be may not easy
to see.
    In [3] a different kind of experiments were performed. Instead of plotting the
cumulative coverage, the coverage of i-th factor in one factorization in contrast
with the coverage of i-th factor in the second factorization is plotted, i.e. plotted
points have coordinates based on the number of covered entries. Results of this
experiments—[3] includes only the results for Mushroom data (see Section 5)—
for selected data are depicted in Figure 3.

                                                                                                            104
            105

            104                                                                                             103
                 3
  GreConD




            10
                                                                                                  GreConD
                                                                                                            102
            102
                                                                                                            101
            101

            100                                                                                             100
                          0          1              2                 3        4        5
                     10         10             10                10       10       10                                   100         101         102         103         104
                                                        GreCon                                                                             GreCon

                               (a) Americas small                                                                               (b) Customers

            104
                                                                                                            104
            103
                                                                                                            103
                                                                                                  GreConD
  GreConD




            102
                                                                                                            102
                 1
            10
                                                                                                            101
                 0
            10
                                                                                                            100
                     100                 101               102            103               104                   100         101         102         103         104
                                                        GreCon                                                                              GreCon

                                           (c) DNA                                                                              (d) Mushroom

Fig. 3: Comparison of GreCon and GreConD algorithms on selected real-
word data. The plot uses axes with a logarithmic scale. The plotted points have
coordinates based on the number of covered entries.

    One may easily see, that the difference between factorizations may be very
large despite small differences in Figure 2. In case of Customers dataset (Fig-
ure 3b) the datapoints are gathered close to the diagonal which means that
there is a small difference between the corresponding factors. In contrast with
this a significant difference between GreCon and GreConD on DNA dataset
(Figure 3c) may be observed.
66      Martin Trnecka and Roman Vyjidacek

4     Redesign of GreCon
Now we describe a basic idea of redesigning the GreCon algorithm. We call the
redesigned version of the algorithm GreCon2 because it is mainly an efficient
implementation of the original GreCon algorithm.
   Differently from GreCon, GreCon2 does not need repeatedly compute the
number of not yet covered entries for each formal concept—which is the most
time consuming task in GreCon. Instead of this the actual cover for all formal
concepts is maintained for the entire running time. For this purpose, a conve-
nient data structure which enables an additional speed up is used. Additionally,
GreCon2 does not iterate over all the formal concepts (i.e. data structure) but
only over the numbers (integers). A pseudocode of GreCon2, which accepts as
an input Boolean matrix I ∈ {0, 1}m×n , is depicted in Algorithm 1.

    Algorithm 1: GreCon2
     Input: Boolean matrix I ∈ {0, 1}m×n
     Output: Set F of factor concepts.
 1 concepts ← B(I);
 2 foreach hAl , Bl i ∈ concepts do
 3    covers[l] ← ||Al || · ||Bl ||
 4    foreach (i, j) ∈ hAl , Bl i do
 5        append l to list cell[i · n + j]
 6    end
 7 end
 8 while AF ◦ BF 6= I do
 9    add hAl , Bl i which maximizes covers[l] to F
10    foreach (i, j) ∈ hAl , Bl i do
11        foreach k ∈ cell[i · n + j] do
12            covers[k] ← covers[k] − 1
13        end
14        delete list cell[i · n + j]
15    end
16 end



    In the first phase, GreCon2 computes the set of all formal concepts B(I)
(line 1). Then the algorithm computes a coverage for each concept—stores in
array covers (line 3)—and for each Iij constructs a list of concepts that cover
Iij (lines 4–6). These lists are stored in array cell on a corresponding indexes
(line 5). Note that accessing the entries of arrays covers and cell requires a
constant time.
    In the second phase, GreCon2 performs linear search in array covers and
adds to F concept hAl , Bl i for which the value of cover[l] is maximal (line 9).
For each Iij covered by hAl , Bl i GreCon2 decrease values in array covers for
each concept that covers Iij (lines 10–13) and deletes the list for Iij (line 14).
The second phase is repeated until all the entries of the input data matrix are
covered or, equivalently, there is no covers[l] > 0.
            Revisiting the GreCon Algorithm for Boolean Matrix Factorization         67

5     Experimental Evaluation
In this section, we present results of an experimental comparison of GreCon,
GreConD and GreCon2. We implemented4 all three algorithms in the Swift
programming language with the same level of optimization to make the compar-
ison fair.

5.1    Datasets
We used eight real-world datasets, namely Advertisement, Mushroom, Tic Tac
Toe from [7]; Americas small, Apj, Customer, Firewall 1 from [4]; and DNA [11].
The characteristics of the datasets are shown in Table 1. Specifically the num-
ber of objects, the number of attributes, the density of nonzero entries in data
in percents, the number of concepts, and the number of object and attribute
concepts. All of them are well known and widely used as benchmark datasets in
BMF.


                    Table 1: Datasets and their characteristics.

       dataset          # objects   # attributes   dens.    |B(I)|   |O(I) ∪ A(I)|
       Advertisement        3279            1557    0.88     9192            2648
       Americas small       3477            1587    1.91     2764             524
       Apj                  2044            1164    0.29      798             723
       Customer            10961             277    1.50    47848            5806
       DNA                  4590             392    1.47     4483            1450
       Firewall 1            365             709   12.35      317             152
       Mushroom             8124             119   17.65   221525            8231
       Tic Tac Toe           958              30   33.33    59505             988




5.2    Running Times Comparison
We compared the running times of GreCon, GreConD and GreCon2. All
experiments were performed on an ordinal computer with 2.7GHz Quad-Core
Intel Core i7 processor and 16GB of RAM. Each algorithm was run 5 times
on a particular data and the average time as well the standard deviation (the
numbers after ±) are reported in Table 2. Note, the running times do not involve
the time required to load input data. In case of GreCon and GreCon2, the
time required for the computation of all formal concepts, which is done via our
implementation of FCbO algorithm [12], is included in each iteration.
    Table 2 shows that in all cases GreCon2 significantly outperforms origi-
nal GreCon algorithm. Moreover, in most cases GreCon2 outperforms Gre-
ConD. GreConD performs well on datasets where the number of attributes
4
    All implementations together with scripts that performs all presented experiments
    are available on the GitHub https://github.com/rvyjidacek/experiments-cla2020
68    Martin Trnecka and Roman Vyjidacek

Table 2: Comparison of running times of GreCon, GreCon2 and GreConD
in seconds. The average time over five iterations as well as standard deviations
(the numbers after ±) are presented.

          dataset                GreCon        GreCon2        GreConD
          Advertisement      916.40 ± 9.80   1.75 ± 0.06    295.72 ± 1.38
          Americas small      94.68 ± 0.17   1.37 ± 0.02     96.46 ± 0.61
          Apj                 16.70 ± 0.14   0.21 ± 0.00     59.52 ± 0.23
          Customer          1179.33 ± 4.85   3.23 ± 0.02     11.95 ± 0.15
          DNA                133.71 ± 2.21   0.48 ± 0.01      21.9 ± 0.50
          Firewall 1           0.76 ± 0.02   0.12 ± 0.00      4.51 ± 0.08
          Mushroom         1139.91 ± 15.37   31.14 ± 0.22    3.99 ± 0.03
          Tic Tac Toe          5.69 ± 0.17    0.78 ± 0.01    0.04 ± 0.00



Table 3: Comparison of running times and the final number of factors of Gre-
Con2 restricted to O(I) ∪ A(I) and unrestricted GreConD.

       dataset            GreCon2     # factors     GreConD       # factors
       Advertisement    0.38 ± 0.00        756    295.72 ± 1.38        766
       Americas small   0.22 ± 0.01        208     96.46 ± 0.61        197
       Apj              0.07 ± 0.00        463     59.52 ± 0.23        464
       Customer         0.41 ± 0.00        281     11.95 ± 0.15        286
       DNA              0.15 ± 0.00        515     21.90 ± 0.50        511
       Firewall 1       0.05 ± 0.00         66      4.51 ± 0.08         66
       Mushrooms        0.42 ± 0.00        103      3.99 ± 0.03        118
       Tic Tac Toe      0.01 ± 0.00         29      0.04 ± 0.00          32




is rather small. This is the reason why GreConD outperforms GreCon2 on
Mushroom dataset which has a larger number of concepts and only a small
number of attributes.
    Note that our implementation of GreCon2 may be improved. Namely the
construction phase of the algorithm (lines 2–6 in Algorithm 1) can be a part of
FCbO algorithm (line 1). We decide to keep these two parts separated to make
the comparison between GreCon and GreCon2 fair.


5.3   A New Approach to BMF

A significant speed up provided by GreCon2 allows us to consider the set of all
formal concepts as a set of candidates for factors. Despite considerable speedup,
the set may be still too large (like in the case of Mushroom dataset). As a
very promising research direction seems to be a restriction of the set of formal
concepts, i.e. skip formal concepts that are potentially not a good candidates
for factors. To demonstrate this idea, we consider in GreCon2 the set O(I) ∪
                          Revisiting the GreCon Algorithm for Boolean Matrix Factorization                                                                             69

A(I) instead of B(I). We perform the same experiments as above and compare
GreCon2 with the restricted search space and unchanged GreConD.
   From the results shown in Table 3, we observe that GreCon2 significantly
outperforms GreConD in therm of running times. Moreover, in this case Gre-
Con2 produces slightly better results in terms of overall coverage (see Figure 4)
and in some cases a smaller number of factors than GreConD (see Table 3).



               1.0                                                                          1.0

               0.8                                                                          0.8
    Coverage




                                                                                 Coverage
               0.6                                                                          0.6

               0.4                                                                          0.4

               0.2                                                                          0.2
                                                                       GreCon2                                                                               GreCon2
                                                                       GreConD                                                                               GreConD
               0.0                                                                          0.0
                     0   100        200   300     400     500    600   700                        0   25        50    75       100    125        150   175     200
                                           Number of Factors                                                               Number of Factors

                                (a) Advertisement                                                              (b) Americas small


               1.0                                                                          1.0

               0.8                                                                          0.8
    Coverage




                                                                                 Coverage




               0.6                                                                          0.6

               0.4                                                                          0.4

               0.2                                                                          0.2
                                                                       GreCon2                                                                               GreCon2
                                                                       GreConD                                                                               GreConD
               0.0                                                                          0.0
                     0         20         40         60         80       100                      0        5         10         15          20         25        30
                                           Number of Factors                                                               Number of Factors

                                    (c) Mushroom                                                                (d) Tic Tac Toe

Fig. 4: Comparison of coverage produced by factorization obtained via GreCon2
where B(I) was restricted to O(I) ∪ A(I) and GreConD on selected datasets.


6               Conclusion and Further Research

We proposed a revised version of GreCon algorithm which significantly out-
performs the original algorithm and which is competitive to the fastest BMF
algorithms. Additionally, we show that our approach enables us to consider a
larger search space than one of the most popular BMF algorithms, which in the
past has caused that GreCon to be forgotten in BMF research.
    Further research shall include an efficient implementation of the approach;
a deep investigation of how the set of all formal concepts may be restricted
without affecting the outcome quality; and last but not least, a parallelization
of the approach.
70     Martin Trnecka and Roman Vyjidacek

Acknowledgments
Supported by Junior research Grant No. JG 2020 003 of the Palacký University
Olomouc. Support by Grant No. IGA PrF 2020 030 of IGA of Palacký University
is also acknowledged.


References
 1. Belohlavek, R., Outrata, J., Trnecka, M.: Toward quality assess-
    ment of Boolean matrix factorizations. Inf. Sci. 459, 71–85 (2018).
    https://doi.org/10.1016/j.ins.2018.05.016
 2. Belohlavek, R., Trnecka, M.: From-below approximations in Boolean matrix fac-
    torization: Geometry and new algorithm. J. Comput. Syst. Sci. 81(8), 1678–1697
    (2015). https://doi.org/10.1016/j.jcss.2015.06.002
 3. Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a
    novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20 (2010).
    https://doi.org/10.1016/j.jcss.2009.05.002
 4. Ene, A., Horne, W.G., Milosavljevic, N., Rao, P., Schreiber, R., Tarjan, R.E.: Fast
    exact and heuristic methods for role minimization problems. In: Ray, I., Li, N. (eds.)
    13th ACM Symposium on Access Control Models and Technologies, SACMAT
    2008, Estes Park, CO, USA, June 11-13, 2008, Proceedings. pp. 1–10. ACM (2008).
    https://doi.org/10.1145/1377836.1377838
 5. Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Suzuki, E., Arikawa,
    S. (eds.) Discovery Science, 7th International Conference, DS 2004, Padova, Italy,
    October 2-5, 2004, Proceedings. Lecture Notes in Computer Science, vol. 3245, pp.
    278–289. Springer (2004). https://doi.org/10.1007/978-3-540-30214-8 22
 6. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean ma-
    trix factorisation for collaborative filtering: An fca-based approach. In: Agre, G.,
    Hitzler, P., Krisnadhi, A.A., Kuznetsov, S.O. (eds.) Artificial Intelligence: Method-
    ology, Systems, and Applications - 16th International Conference, AIMSA 2014,
    Varna, Bulgaria, September 11-13, 2014. Proceedings. Lecture Notes in Computer
    Science, vol. 8722, pp. 47–58. Springer (2014). https://doi.org/10.1007/978-3-319-
    10554-3 5, https://doi.org/10.1007/978-3-319-10554-3 5
 7. Lichman,        M.:      UCI       machine       learning      repository      (2013),
    http://archive.ics.uci.edu/ml
 8. Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The dis-
    crete basis problem. IEEE Trans. Knowl. Data Eng. 20(10), 1348–1362 (2008).
    https://doi.org/10.1109/TKDE.2008.53
 9. Monson, S.D., Pullman, S., Rees, R.: A survey of clique and biclique coverings and
    factorizations of (0,1)-matrices. No. 14 (1995)
10. Mouakher, A., Yahia, S.B.: QualityCover: Efficient binary relation cover-
    age guided by induced knowledge quality. Inf. Sci. 355-356, 58–73 (2016).
    https://doi.org/10.1016/j.ins.2016.03.009
11. Myllykangas, S., Himberg, J., Böhling, T., Nagy, B., Hollmén, J., Knuutila, S.:
    Dna copy number amplification profiling of human neoplasms. Oncogene 25(55),
    7324–7332 (2006)
12. Outrata, J., Vychodil, V.: Fast algorithm for computing fixpoints of galois connec-
    tions induced by object-attribute relational data. Inf. Sci. 185(1), 114–127 (2012).
    https://doi.org/10.1016/j.ins.2011.09.023