Impact of Boolean factorization as preprocessing methods for classification of Boolean data⋆

Radim Belohlavek, Jan Outrata, Martin Trnecka
Data Analysis and Modeling Lab (DAMOL)
Dept. Computer Science, Palacky University, Olomouc
radim.belohlavek@acm.org, jan.outrata@upol.cz, martin.trnecka@gmail.com

Abstract. The paper explores the use of Boolean factorization as a method for data preprocessing in classification of Boolean data. In previous papers, we demonstrated that preprocessing which replaces the original Boolean attributes by factors, i.e. new Boolean attributes obtained from the original ones by Boolean factorization, improves the quality of classification. The aim of this paper is to explore how the various Boolean factorization methods proposed in the literature impact the quality of classification. In particular, we compare three factorization methods, present experimental results, and outline issues for future research.

⋆ We acknowledge support by the ESF project No. CZ.1.07/2.3.00/20.0059, the project is co-financed by the European Social Fund and the state budget of the Czech Republic (R. Belohlavek); Grant No. 202/10/P360 of the Czech Science Foundation (J. Outrata); and by the IGA of Palacky University, No. PrF 20124029 (M. Trnecka).

1 Problem Setting

In classification of Boolean data, the objects to classify are described by Boolean (binary, yes-no) attributes. As with other classification problems, one may be interested in preprocessing the input attributes to improve the quality of classification. With Boolean input attributes, we might want to limit ourselves to preprocessing with a clear semantics. Namely, as is known, see e.g. [2, 8], applying to Boolean data methods designed originally for real-valued data distorts the meaning of the data and generally leads to results that are difficult to interpret.

In [9, 10], we proposed a method for preprocessing Boolean data based on the Boolean matrix factorization (BMF) method, i.e. a decomposition method for Boolean matrices, developed in [2]. The method consists in using new Boolean attributes for the classification of the objects. The new attributes are actually the factors computed from the original attributes. Since the factors are essentially (some of the) formal concepts [4] associated to the input data, they have a clear meaning and are easy to interpret [2]. Moreover, there exists a natural transformation of the objects between the space of the original attributes and the space of the factors [2] which is conveniently utilized by the method. It has been demonstrated in [9, 10] that such preprocessing makes it possible to classify using a smaller number of input variables (factors instead of original attributes) and yet improve the quality of classification.

In addition to the method from [2], there exist several other BMF methods described in the literature. In the present paper, we therefore look at the question of how these methods influence the quality of classification. In particular, we focus on three such methods and provide an experimental evaluation using basically the same scenario as in [9, 10]. Doing so, we emphasize the need to consider not only coverage and the number of extracted factors, but also additional criteria regarding the quality of the proposed BMF methods.
We use the following notation. We denote by X = {1, . . . , n} a set of objects which are given along with their input Boolean attributes, which form the set Y = {1, . . . , m}, and a class attribute c. The input attributes are described by an n × m Boolean matrix I with entries Iij (entry at row i and column j), i.e. Iij ∈ {0, 1} for every i, j. Alternatively, I may be considered as a representation of a binary relation between X and Y and, hence, we may speak of a formal context ⟨X, Y, I⟩, etc. Since there is no danger of confusion, we conveniently switch between the matrix and relational ways of looking at things. The class attribute c may be conceived as a mapping c : X → C assigning to every object i ∈ X its class label c(i) in the set C of all class labels (note that C may contain more than two labels).

The preprocessing method, along with the three particular methods of Boolean matrix factorization, is described in Section 2. Section 3 describes the experiments and provides their results. In Section 4 we conclude the paper and provide some directions for future research.

2 Boolean Matrix Factorization and Its Utilization

2.1 General BMF Problem

We denote by {0, 1}^(n×m) the set of all n × m Boolean matrices and by I_{i·} and I_{·j} the ith row and jth column, respectively, of matrix I. In BMF, the general aim is to find for a given I ∈ {0, 1}^(n×m) (and possibly other parameters, see Problem 1 and Problem 2) matrices A ∈ {0, 1}^(n×k) and B ∈ {0, 1}^(k×m) for which

    I is (approximately) equal to A ◦ B,    (1)

with ◦ being the Boolean matrix product given by

    (A ◦ B)ij = ⋁_{l=1}^{k} Ail · Blj,    (2)

where ⋁ denotes the maximum and · the ordinary product. Such an exact or approximate decomposition of I into A ◦ B corresponds to a discovery of k factors (new Boolean variables) that exactly or approximately explain the data. Namely, factor l = 1, . . . , k, may be represented by the lth column of A and the lth row of B: Ail = 1 indicates that factor l applies to object i while Blj = 1 indicates that attribute j is a particular manifestation of factor l (think of person A as object, "being fluent in English" as attribute, and "having good education" as factor). The least k for which an exact decomposition I = A ◦ B exists is called the Boolean (or Schein) rank of I [2, 5, 8]. Then, according to (2), the factor model reads "object i has attribute j if and only if there exists factor l such that l applies to i and j is a particular manifestation of l". The matrices I, A, and B are usually called the object-attribute matrix, the object-factor (or usage) matrix, and the factor-attribute (or basis vector) matrix [2, 8].

The methods described in the literature are usually designed for two particular problems. Consider the matrix metric [5, 8] (arising from the L1-norm || · || of matrices, or Hamming weight in the case of Boolean matrices) given by

    E(C, D) = ||C − D|| = Σ_{i=1}^{n} Σ_{j=1}^{m} |Cij − Dij|.    (3)

E(I, A ◦ B) may be used to assess how well the product A ◦ B approximates the input matrix I.

Problem 1
input: I ∈ {0, 1}^(n×m), positive integer k
output: A ∈ {0, 1}^(n×k) and B ∈ {0, 1}^(k×m) minimizing ||I − A ◦ B||.

This problem is called the discrete basis problem (DBP) in [8].
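For illustration, the Boolean product (2) and the metric (3) are straightforward to compute. The following minimal Python sketch (our own illustration; the function names bool_product and error are ours and do not come from any of the cited implementations) shows both on a tiny matrix that decomposes exactly into two factors.

```python
import numpy as np

def bool_product(A, B):
    """Boolean matrix product (2): (A o B)_ij = max_l A_il * B_lj."""
    # For 0/1 matrices, the Boolean product equals the ordinary matrix
    # product with all positive entries truncated to 1 (an OR over the k terms).
    return (A.astype(int) @ B.astype(int) > 0).astype(int)

def error(C, D):
    """Metric (3): number of entries in which C and D differ (L1 / Hamming)."""
    return int(np.abs(C - D).sum())

# Tiny example: two factors explaining a 3 x 4 matrix exactly.
A = np.array([[1, 0],
              [1, 1],
              [0, 1]])
B = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
I = bool_product(A, B)
print(I)                            # the object-attribute matrix I = A o B
print(error(I, bool_product(A, B))) # 0, i.e. an exact decomposition
```

The truncation trick used in bool_product is valid only for 0/1 matrices, which is exactly the setting of (2).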
In [2], the following problem is considered:

Problem 2
input: I ∈ {0, 1}^(n×m), non-negative integer ε
output: A ∈ {0, 1}^(n×k) and B ∈ {0, 1}^(k×m) with k as small as possible such that ||I − A ◦ B|| ≤ ε.

The two problems reflect two important views on BMF, the first one emphasizing the importance of the first k (presumably most important) factors, the second one emphasizing the need to account for (and thus to explain) a prescribed portion of the data. Note that the problem of finding an exact decomposition of I with the least possible number k of factors is a particular instance of Problem 2 (put ε = 0). Note also that it follows from known results that both Problem 1 and Problem 2 are NP-hard optimization problems, see e.g. [2, 8], and hence approximation algorithms are needed to obtain (suboptimal) solutions.

2.2 Use of BMF in Preprocessing of Boolean Data

The idea may be described as follows. For a given set X of objects, set Y of attributes, Boolean matrix I, and class attribute c, we compute n × k and k × m Boolean matrices A and B, respectively, for which A ◦ B approximates I reasonably well (either according to the scenario given by Problem 1 or Problem 2). Then, instead of the original instance ⟨X, Y, I, c⟩ of the classification problem, we consider a new instance given by ⟨X, F, A, c⟩, with F = {1, . . . , k} denoting the factors, i.e. the new Boolean attributes. Any classification model developed for ⟨X, F, A, c⟩ may then be used to classify the objects described by the original Boolean attributes from Y. Namely, one may utilize natural transformations g : {0, 1}^m → {0, 1}^k and h : {0, 1}^k → {0, 1}^m between the space of the original attributes and the space of factors, which are given by

    (g(P))_l = ⋀_{j=1}^{m} (Blj → Pj)   and   (h(Q))_j = ⋁_{l=1}^{k} (Ql · Blj)

for P ∈ {0, 1}^m and Q ∈ {0, 1}^k (⋀ and → denote minimum and implication). These transformations are described in [2], to which we refer for more information. In particular, given an object represented by P ∈ {0, 1}^m (the vector of values of the m input attributes), we apply the classification method developed for ⟨X, F, A, c⟩ to g(P), i.e. to the object representation in the space of factors. Any classification model MF : {0, 1}^k → C for ⟨X, F, A, c⟩ therefore induces a classification model MY : {0, 1}^m → C by MY(P) = MF(g(P)) for any P ∈ {0, 1}^m.

Note that since the number k of factors of I is usually smaller than the number m of attributes (see [2]), which means a reduction of the dimensionality of the data, and since the transformation of objects from the attribute space to the factor space is not an injective mapping, we need to solve the problem of assigning a class label to objects in ⟨X, F, A, c⟩ with equal g(P) representations transformed from objects in ⟨X, Y, I, c⟩ with different P representations and different assigned class labels. We adopt the common solution of assigning to such objects in ⟨X, F, A, c⟩ the majority class label of the class labels assigned to the objects in ⟨X, Y, I, c⟩.
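For concreteness, the following Python sketch (again our own illustration, not code from [2]; the names g, h and induced_classifier are ours) shows how the transformation g and the induced classifier MY(P) = MF(g(P)) can be computed from the factor-attribute matrix B.

```python
import numpy as np

def g(P, B):
    """Attribute vector P (length m) -> factor vector (length k):
    factor l is set iff every attribute manifesting factor l is present in P,
    i.e. (g(P))_l = min_j (B_lj -> P_j)."""
    k, m = B.shape
    return np.array([int(all(P[j] >= B[l, j] for j in range(m))) for l in range(k)])

def h(Q, B):
    """Factor vector Q (length k) -> attribute vector: (h(Q))_j = max_l Q_l * B_lj."""
    return (Q @ B > 0).astype(int)

def induced_classifier(classify_factors, B):
    """Given a model M_F trained on the factor representation, return the
    induced model M_Y that classifies objects described by the original
    attributes: M_Y(P) = M_F(g(P))."""
    return lambda P: classify_factors(g(P, B))

# Example with the same B as in the earlier sketch, and an object having
# the first two attributes only.
B = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
P = np.array([1, 1, 0, 0])
print(g(P, B))          # [1 0]: only factor 0 applies to the object
print(h(g(P, B), B))    # [1 1 0 0]: attributes recovered from the factors
```

The sketch only illustrates the transformations themselves; handling of objects with equal g(P) but different class labels (the majority-label rule above) is left to the caller.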
2.3 Three Methods for Boolean Matrix Factorization Used in Our Experiments

Asso [8] works as follows. From the input n × m matrix I, the required number k of factors, and parameters τ, w+, and w−, the algorithm computes an m × m matrix C in which Cij = 1 if the confidence of the association rule {i} ⇒ {j} is at least τ. The rows of C are then the candidate rows for matrix B. The actual k rows of B are selected from the rows of C in a greedy manner using parameters w+ and w−. During the greedy selection, the k columns of A are selected along with the k rows of B. This way, one obtains from I two matrices A and B such that A ◦ B approximates I. Asso is designed for Problem 1. There is no guarantee that Asso computes an exact factorization of I even for k = m, see [3]. In our experiments, we used τ = 1 and w+ = w− = 1 because such a choice guarantees that for k = m all the 1s in I will be covered by the computed factors.

GreConD This algorithm, described in [2] where it is called Algorithm 2, utilizes formal concepts of I as factors. Namely, the algorithm selects formal concepts of I, one by one, until a decomposition of I into A ◦ B is obtained. The selected formal concepts are utilized in a simple way: the (characteristic vectors of the) extents and intents of the concepts form the columns and rows of A and B. The algorithm may be stopped after computing the first k concepts or whenever ||I − A ◦ B|| ≤ ε, i.e. the algorithm may be used for solving Problem 1 as well as Problem 2. The formal concepts are selected in a greedy manner to maximize the drop of the error function and, in particular, are computed in an on-demand way, whence the name Gre(edy)Con(cepts on)D(emand).

GreEssQ This algorithm [3] utilizes formal concepts of I in the same way as GreConD. The concepts are selected in a greedy manner but, contrary to GreConD, using a particular heuristic that is based on the information provided by certain intervals in the concept lattice of I. As with GreConD, GreEssQ may be used to solve both Problem 1 and Problem 2.
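To make the concept-based greedy selection concrete, here is a much simplified Python sketch in the spirit of GreConD (our own naive illustration, not the implementation from [2] or [3]; the names extent, intent and greedy_concept_factors are ours). It factorizes a Boolean matrix from below by formal concepts: it repeatedly grows a closed intent so that the resulting concept covers as many yet-uncovered 1s of I as possible, and stops once at most eps 1s remain uncovered.

```python
import numpy as np

def extent(I, D):
    """Objects having all attributes in D."""
    n, m = I.shape
    return {i for i in range(n) if all(I[i, j] for j in D)}

def intent(I, C):
    """Attributes shared by all objects in C."""
    n, m = I.shape
    return {j for j in range(m) if all(I[i, j] for i in C)}

def greedy_concept_factors(I, eps=0):
    """Greedy from-below factorization by formal concepts (GreConD-like sketch,
    not the actual algorithm of [2]). Returns 0/1 matrices A (n x k) and
    B (k x m) with ||I - A o B|| <= eps."""
    n, m = I.shape
    uncovered = {(i, j) for i in range(n) for j in range(m) if I[i, j]}
    factors = []
    while len(uncovered) > eps:
        D = set()            # intent of the concept under construction
        best_gain = 0
        while True:          # grow D by the attribute giving the largest cover
            candidate = None
            for j in set(range(m)) - D:
                C_j = extent(I, D | {j})
                D_j = intent(I, C_j)          # closure of D u {j}
                gain = len({(i, jj) for i in C_j for jj in D_j} & uncovered)
                if gain > best_gain:
                    best_gain, candidate = gain, D_j
            if candidate is None:             # no attribute improves the cover
                break
            D = candidate
        C = extent(I, D)
        factors.append((C, D))                # the selected formal concept
        uncovered -= {(i, j) for i in C for j in D}
    k = len(factors)
    A = np.zeros((n, k), dtype=int)
    B = np.zeros((k, m), dtype=int)
    for l, (C, D) in enumerate(factors):      # extents -> columns of A,
        A[list(C), l] = 1                     # intents -> rows of B
        B[l, list(D)] = 1
    return A, B
```

On the small matrix I constructed in the first sketch, greedy_concept_factors(I, 0) finds an exact two-factor decomposition. The actual GreConD and GreEssQ algorithms differ from this naive version in how candidate concepts are evaluated (GreEssQ uses the interval-based heuristic mentioned above) and in being implemented far more efficiently.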
3 Experiments

We performed a series of experiments to evaluate the impact of the three Boolean matrix factorization methods described in Section 2.3 on classification of Boolean data when using factors as new attributes. The experiments consisted in comparing the classification accuracy of learning models created by selected machine learning (ML) algorithms from the data with the original attributes replaced by factors. The factors are computed from the input data by the three selected factorization methods. The ML algorithms used in the comparison are: the reference decision tree algorithms ID3 and C4.5 (entropy and information gain based), an instance-based learning method (Nearest Neighbor, NN), Naive Bayes learning (NB), and a multilayer perceptron neural network trained by back propagation (MLP) [7, 11]. The algorithms were borrowed and run from Weka¹, a software package that contains implementations of machine learning and data mining algorithms in Java. Weka's default parameters were used for the algorithms.

¹ Waikato Environment for Knowledge Analysis, available at http://www.cs.waikato.ac.nz/ml/weka/

The experiments were done on selected public real-world datasets from the UCI Machine Learning Repository [1]. The selected datasets are from different areas (medicine, biology, zoology, politics, games). All the datasets contain only categorical attributes with one class attribute, and the datasets were cleared of objects containing missing values. Basic characteristics of the datasets are depicted in Table 1 (note that the mushroom dataset was shrunk in the number of objects due to computation time reasons). Note that "9(51)" means 9 categorical and 51 binary attributes obtained by nominal scaling.

Table 1. Characteristics of datasets used in experiments

Dataset         No. of attributes (binary)   No. of objects   Class distribution
breast-cancer    9(51)                        277              196/81
kr-vs-kp        36(74)                       3196             1669/1527
mushroom        22(125)                       282              187/95
vote            16(32)                        232              124/108
zoo             15(30)                        101              41/20/5/13/4/8/10

The classification accuracy is evaluated using the 10-fold stratified cross-validation test [6] and the following results are based on averaging 10 execution runs on each dataset with randomly ordered objects.

The results are depicted in Figures 1 to 5. Each figure contains five graphs for the five ML algorithms used. The graphs show the average percentage rates of correct classifications on the preprocessed data, i.e. the data ⟨X, F, A, c⟩ described by factors instead of ⟨X, Y, I, c⟩ described by the original attributes, cf. Sections 2.2 and 2.3, for each of the three Boolean matrix factorization methods. The percentage rates of GreEssQ, GreConD, and Asso are depicted by the dashed, dot-and-dashed, and dotted lines, respectively. The x-axis corresponds to the factor decompositions obtained by the algorithms and, in particular, measures the quantity

    |{⟨i, j⟩ ; Iij = 1 and (A ◦ B)ij = 0}| / |{⟨i, j⟩ ; Iij = 1}|,

i.e. the relative error w.r.t. the 1s of the input matrix I that are left uncovered in A ◦ B for the computed factorization given by A and B. The values on the x-axis range from 0.9 (corresponding to a factorization with a small number of factors that leave 90 % of the 1s in I uncovered) to 0 (corresponding to the number of factors which decompose I exactly, i.e. I = A ◦ B). The average percentage rate of correct classification for the original data ⟨X, Y, I, c⟩ is depicted in each graph by a constant solid line. All the graphs are computed for the testing parts of the datasets used in the evaluation of classification only.
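The x-axis quantity is simple to compute from I, A, and B; a minimal self-contained sketch (our own helper, assuming 0/1 numpy arrays as in the earlier sketches):

```python
import numpy as np

def relative_uncovered_error(I, A, B):
    """Fraction of 1s of I left uncovered by A o B: 0.0 for an exact
    decomposition, close to 1.0 when almost no 1s are covered."""
    AB = (A.astype(int) @ B.astype(int) > 0).astype(int)   # Boolean product
    ones = (I == 1)
    return float((ones & (AB == 0)).sum()) / float(ones.sum())
```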
We can clearly see from the graphs for all datasets but breast-cancer that the best results (average percentage rates of correct classifications) for preprocessed data are obtained, for all ML algorithms used, by the GreEssQ algorithm, outperforming both GreConD and, quite significantly, the Asso algorithm. GreConD outperforms the Asso algorithm, again for all ML algorithms used, for the datasets kr-vs-kp and mushroom, but not for the vote dataset. We can also see from the graphs that sometimes the preprocessed data lead to a better classification accuracy than the original data even with a few factors covering less than 100 % of the input data. This can be seen, for instance, for the kr-vs-kp dataset and Nearest Neighbor and MLP, or the mushroom dataset and ID3, Naive Bayes and MLP. See [9, 10] for indications of when, i.e. for which datasets and ML algorithms, the data with original attributes replaced by factors (computed by GreConD) covering 100 % of the input data lead to a better classification accuracy compared to the original data.

Particularly interesting are the results for the breast-cancer dataset. As we can see, the preprocessed data with factors instead of the original attributes are (much) better classified compared to the original data, and this is observable for all ML algorithms used except for Naive Bayes. Furthermore, the number of factors leading to the best average percentage rates of correct classifications is such that the factors cover just 40 % of the input data (which corresponds to 0.6 on the x-axis)! This indicates either many superfluous attributes or large noise in the input data that is overcome by using the factors. The GreEssQ and GreConD algorithms are of comparable performance here, both outperforming the Asso algorithm.

[Figures 1 to 4 (graphs omitted): classification accuracy (%) plotted against the relative decomposition error threshold, from 0.9 down to 0.]

Fig. 1. Classification accuracy for breast-cancer dataset, for (from top to bottom) ML algorithms ID3, C4.5, NN, NB and MLP

Fig. 2. Classification accuracy for kr-vs-kp dataset, for (from top to bottom) ML algorithms ID3, C4.5, NN, NB and MLP

Fig. 3. Classification accuracy for mushroom dataset, for (from top to bottom) ML algorithms ID3, C4.5, NN, NB and MLP

Fig. 4. Classification accuracy for vote dataset, for (from top to bottom) ML algorithms ID3, C4.5, NN, NB and MLP
[Figure 5 (graphs omitted): classification accuracy (%) plotted against the relative decomposition error threshold, from 0.9 down to 0.]

Fig. 5. Classification accuracy for zoo dataset, for (from top to bottom) ML algorithms ID3, C4.5, NN, NB and MLP

4 Conclusions

We presented an experimental study which shows that when Boolean matrix factorization is used as a preprocessing technique in Boolean data classification in the scenario proposed in [9, 10], the particular factorization algorithms impact the accuracy of classification in a significant way. For this purpose, we compared three such algorithms from the literature. In addition to further demonstrating the usefulness of Boolean factorization for classification of Boolean data, the paper emphasizes Boolean factorization as a data dimensionality reduction technique that may be utilized in a similar way as the matrix-decomposition-based methods designed for real-valued data.

An extended version of this paper will include further factorization algorithms in the experimental comparison (let us note in this respect that a technical problem with some such algorithms is that they are poorly described in the literature). Furthermore, we intend to investigate and utilize further appropriate transformation functions between the attribute and the factor spaces, in particular those suitable for approximate factorizations. A comparison with other data dimensionality reduction techniques, see e.g. the references in [12], in the presented scenario is also an important topic for future research. In this respect, both the impact on the classification accuracy and the transparency of the resulting classification model are important aspects to be evaluated in such a comparison.

References

1. Asuncion A., Newman D. J.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, 2007. http://www.ics.uci.edu/~mlearn/MLRepository.html
2. Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Computer and System Sci. 76(1)(2010), 3–20.
3. Belohlavek R., Trnecka M.: From-below approximations in Boolean matrix factorization: geometry, heuristics, and new BMF algorithm (submitted).
4. Ganter B., Wille R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin, 1999.
5. Kim K. H.: Boolean Matrix Theory and Applications. M. Dekker, 1982.
6. Kohavi R.: A Study on Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proc. IJCAI 1995, pp. 1137–1145.
7. Mitchell T. M.: Machine Learning. McGraw-Hill, 1997.
8. Miettinen P., Mielikäinen T., Gionis A., Das G., Mannila H.: The Discrete Basis Problem. IEEE Trans. Knowledge and Data Eng. 20(10)(2008), 1348–1362 (preliminary version in PKDD 2006, pp. 335–346).
9. Outrata J.: Preprocessing input data for machine learning by FCA. Proc. CLA 2010, pp. 187–198, Sevilla, Spain.
10. Outrata J.: Boolean factor analysis for data preprocessing in machine learning. Proc. ICML 2010, pp. 899–902, Washington, D.C., USA.
11. Quinlan J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
12. Tatti N., Mielikäinen T., Gionis A., Mannila H.: What is the dimension of your binary data? Proc. ICDM 2006, pp. 603–612.