=Paper=
{{Paper
|id=None
|storemode=property
|title=Mining Concepts from Incomplete Datasets Utilizing Matrix Factorization
|pdfUrl=https://ceur-ws.org/Vol-972/paper3.pdf
|volume=Vol-972
|dblpUrl=https://dblp.org/rec/conf/cla/PiskovaPHK12
}}
==Mining Concepts from Incomplete Datasets Utilizing Matrix Factorization==
Lenka Pisková¹, Štefan Pero¹, Tomáš Horváth², Stanislav Krajči¹

¹ ICS, Pavol Jozef Šafárik University, Košice, Slovakia
{Lenka.Piskova, Stefan.Pero}@student.upjs.sk, Stanislav.Krajci@upjs.sk

² ISMLL, University of Hildesheim, Germany
horvath@ismll.de
Abstract. Formal Concept Analysis aims at finding clusters (concepts) with given properties in data. Most concept analysis techniques require a dense matrix with no missing values on the input. However, real data are often incomplete or inaccurate due to noise or other unforeseen reasons. This paper focuses on using matrix factorization methods to complete the missing values in the input data such that it can be used with an arbitrary concept analysis technique. The matrix factorization model used here approximates the sparse object-item data matrix by a product of two dense factor matrices, thus mapping objects and items to a common latent space. These object-factor and item-factor matrices are obtained by a simple stochastic gradient optimization method. We also investigate how the amount of missing values influences the output of the concept analysis. Two measures, well-known in the information retrieval community, have been used for the evaluation of the proposed framework. Real datasets from the UCI Machine Learning Repository were used in our experiments.
Keywords: Formal Concept Analysis, matrix factorization, missing values completion, clustering
1 Introduction
Formal Concept Analysis (FCA) deals with data in the form of a table whose rows represent objects and whose columns represent attributes of objects. A table entry corresponding to an object x and an attribute y indicates whether or not the object x has the attribute y. The clusters on the output of FCA are called concepts, each of which consists of a set of formal objects and a set of formal attributes, called the extent and the intent of the given concept, respectively. The set of all concepts ordered by ≤ forms a concept lattice [1]. The data table required on the input of FCA has to contain all information, i.e. it should not contain missing values.
However, real datasets are often incomplete or inaccurate due to damage or inconsistencies in the data collection process. The notion of an incomplete context was first
introduced by Burmeister and Holzer [2] and applied to attribute implications and attribute exploration. The analysis of incomplete data in the field of FCA can be accomplished in one of the following two ways: The first one is to adapt concept analysis techniques to cope with missing values. Some research has recently been conducted on generating a concept lattice of an incomplete context [7]. The approach presented in [8] is based on many-valued contexts and conceptual scaling. The second way of handling incomplete data in FCA is to complete the missing values in a pre-processing step and apply a concept analysis technique to the pre-processed data.
The contribution of this paper follows the latter approach, namely, to predict the missing values using a matrix factorization (MF) method in the pre-processing step such that FCA can be used on the completed data. MF, one of the useful techniques in data mining, allows the decomposition of a sparse¹ matrix into two dense matrices such that the product of these two dense matrices results in a dense matrix which is an approximation² of the original, sparse matrix. We use a stochastic gradient descent matrix factorization method (SGD MF) in this paper, which is fast and effective. It is important to note that we need to adjust the predicted values, since they should come from the same set of values as the input. The main reason for this is that the resulting dense matrix is only an approximation of the original one, whose entries come from a specific set of values, e.g. 0 and 1, or 1, 2, 3, 4, 5, etc. The resulting dense matrix is then scaled [1] and fed to FCA.
We also address the problem of the robustness of formal concepts, i.e. their ability to remain unaffected by small variations in the input table. As far as we know, there is no related work investigating how the concepts computed from a full table differ from those computed from a pre-processed table, i.e. what happens when we remove some values from the original table, complete them with a pre-processing method and compute the concepts on the pre-processed table. We think that this question is quite important for real-life applications of FCA.
2 Related Work
One direction of estimating missing values in an input matrix is the use of association rule mining techniques, comprising two approaches: The first one discards instances which contain missing data and generates association rules from the remaining complete data. However, excluding instances with missing data can bias the result. The second approach takes the presence of missing values into account in the rule mining process. Note that two or more association rules may result in different predicted values, so the so-called conflict problem has to be tackled here. The large number of association rules and the effort to reduce the conflict problem have led to the usage of generic bases of association rules. We refer to [3], [4] for details and further references. The percentage of correctly completed missing values is affected by the number of missing values.
¹ We will call a matrix with missing values sparse.
² The difference between the values in the non-empty cells of the original matrix and the predicted values for these cells is minimal.
A “hidden” problem is the percentage of missing values that these approaches allow to complete.
[5] and [6] show how to use extracted knowledge represented by formal concepts for completing a relation, or at least augmenting it in the case when not every missing value can be obtained. They generate concepts from the sparse matrix after removing the rows (objects) containing missing values. However, these approaches have two drawbacks, namely, that the large number of concepts (even for a small relation) makes it more difficult to predict missing values, and that more missing values lead to a more biased completion.
Combinations of MF and FCA are introduced in [9], [10] and [11], where singular value decomposition, semi-discrete decomposition and non-negative matrix factorization are used for reducing the number of formal concepts. These works, however, use matrix factorization for a different purpose (the reduction of the number of concepts) and consider a full input matrix.
A novel approach combining matrix decomposition and factor analysis is presented in [12], [13]. There are two main differences from ordinary factor analysis. First, a table entry represents a degree to which an object has an attribute (table entries are from a bounded scale L). Second, the matrix composition operator is a so-called t-norm-based product. The aim of this approach is not the completion of missing values but finding a set of factors that correspond to formal concepts.
3 Preliminaries
In this section we will only briefly describe FCA and focus instead on a more detailed description of the MF technique used, since we think it could be more interesting to the FCA community.
3.1 Formal Concept Analysis
A formal context is a triple (X, Y, I) consisting of two non-empty sets X and Y
and a binary relation I between them. The elements of X are called the objects
and the elements of Y are called the attributes. (x, y) ∈ I means that the object
x has the attribute y.
For a set A ⊆ X and a set B ⊆ Y , define A′ = {y ∈ Y : (∀x ∈ A)(x, y) ∈ I}
and B′ = {x ∈ X : (∀y ∈ B)(x, y) ∈ I}. A′ is the set of attributes common to the objects in A, and B′ is the set of objects which have all attributes in B.
A formal concept of the context (X, Y, I) is a pair (A, B) of a set A ⊆ X of
objects and a set B ⊆ Y of attributes such that A′ = B and B ′ = A. A and B
are called the extent and the intent of the concept (A, B), respectively. Denote
the set of all concepts in (X, Y, I) by B(X, Y, I).
Introduce a relation ≤ on B(X, Y, I) by (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 ⇔ B2 ⊆ B1; (A1, B1) is called the subconcept of (A2, B2) and (A2, B2) is called the superconcept of (A1, B1). The set of concepts B(X, Y, I) ordered by ≤ constitutes the concept lattice (or Galois lattice) of the context (X, Y, I). The so-called main
theorem of concept lattices characterizes the structure of concept lattices. For
further details we refer to [1].
3.2 Stochastic Gradient Descent Matrix Factorization
Stochastic gradient descent matrix factorization (SGD MF) is one of the most
popular factorization techniques [14] because of its scalability and good accuracy.
The goal of this approach is to approximate a sparse matrix X ∈ R^{|O|×|I|} with a product of two (much smaller) matrices W ∈ R^{|O|×K} and H ∈ R^{|I|×K} such that
X ≈ X̂ = WH^T,  (1)
where |O|, |I| are the numbers of objects and items, respectively, and K is the number
of (latent) factors. The ith row of the matrix W is a vector containing K latent
factors describing the object i, and the jth row in matrix H is a vector containing
K latent factors describing the item j [15].
The estimate x̂_ij of a missing value at row i and column j of the sparse matrix X is computed as

x̂_ij = (WH^T)_ij = ∑_{k=1}^{K} w_ik h_jk,  (2)

where w_ik (h_jk) is the value of W (H) at the ith (jth) row and kth column.
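For illustration, (2) is just a dot product of the i-th row of W and the j-th row of H. A minimal Java sketch follows; the sketches in this and the following sections are illustrative only, assume a plain double[][] representation and one shared utility class, and are not the exact implementation used in our experiments.

```java
/** Equation (2): the estimate of cell (i, j) is the dot product of
 *  the i-th row of W and the j-th row of H. */
static double predict(double[][] W, double[][] H, int i, int j) {
    double xHat = 0.0;
    for (int k = 0; k < W[i].length; k++) {  // K latent factors
        xHat += W[i][k] * H[j][k];
    }
    return xHat;
}
```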
We are interested in an X̂ which estimates the missing values of X well. Since we do not know the missing values, we use the following trick, well-known in the machine learning community [18]: We split X into two complementary parts, called the train set X^train and the test set X^test. The model X̂ is then approximated from X^train and its quality is assessed by the root mean squared error (RMSE) loss computed on X^test, defined as
RMSE = √( ∑_{x_ij ∈ X^test} (x_ij − x̂_ij)² / |X^test| ),  (3)
where x_ij are the values of the non-empty cells of X belonging to X^test. In this way, we simulate the case of a sparse matrix (X^train) whose missing values (X^test) are, however, known.
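Representing the known cells of X as (i, j, x) triples, the RMSE of (3) on a held-out test part can be sketched as follows, reusing predict from above (the Cell representation is an illustrative assumption):

```java
/** A known cell of the sparse matrix: row i, column j, value x. */
record Cell(int i, int j, double x) {}

/** Equation (3): RMSE of the model (W, H) on a held-out set of cells. */
static double rmse(java.util.List<Cell> test, double[][] W, double[][] H) {
    double sum = 0.0;
    for (Cell c : test) {
        double e = c.x() - predict(W, H, c.i(), c.j());
        sum += e * e;
    }
    return Math.sqrt(sum / test.size());
}
```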
Our goal is to find those parameters W and H of the model X̂ for which the RMSE is minimal. However, we have to keep in mind that X^test is “hidden”, i.e. it represents the “unknown” missing values, thus, we have to use³ the data we know, i.e. the train set X^train. Since |X^train| is constant, minimizing RMSE (3) on X^train is equivalent to minimizing the sum of squared errors ∑_{x_ij ∈ X^train} (x_ij − x̂_ij)².

³ Here we also expect that X^test has similar statistical characteristics as X^train.
Since RMSE is a loss function, we use a stochastic gradient descent (SGD) optimization method [16] to search for those parameters W and H for which the following objective function is minimal:
∑_{x_ij ∈ X^train} (x_ij − x̂_ij)² + λ(‖W‖² + ‖H‖²),  (4)
where x̂_ij is defined as in (2) and λ is a regularization term to prevent so-called overfitting (i.e. when a model estimates the training data very well but the test data poorly [18]). λ controls the magnitudes of the factor vectors such that W and H give a good approximation of the whole original input matrix X (containing both X^train and X^test). ‖W‖² denotes the squared norm of the corresponding factor vector (row) of W, and analogously for ‖H‖².
In the training phase we first initialize the two matrices W and H with random values (for example from the normal distribution N(0, σ²) with mean 0 and standard deviation σ = 0.01) and compute the estimation error

err = ∑_{x_ij ∈ X^train} e_ij²,  (5)
where

e_ij² = (x_ij − x̂_ij)² = (x_ij − ∑_{k=1}^{K} w_ik h_jk)².  (6)
We minimize this error by updating the values of W and H in the following way [16]: we randomly choose a value x_ij from X^train and compute the gradient of the objective function (6) at this value w.r.t. the parameters⁴ w_i and h_j, i.e.

∂e_ij²/∂w_ik = −2 e_ij h_jk = −2 (x_ij − x̂_ij) h_jk  (7)

∂e_ij²/∂h_jk = −2 e_ij w_ik = −2 (x_ij − x̂_ij) w_ik  (8)
In the next step we use the computed gradients to update the values of w_ik and h_jk:

w_ik^(new) = w_ik − β ∂e_ij²/∂w_ik = w_ik + 2β e_ij h_jk  (9)

h_jk^(new) = h_jk − β ∂e_ij²/∂h_jk = h_jk + 2β e_ij w_ik  (10)
where β is the learning rate controlling the step sizes.
We update W and H for each value of X^train in one iteration. The number of iterations, i.e. the number of passes over the full X^train, is a hyper-parameter of the factorization technique, as are the regularization term λ, the learning rate β and the number of factors K.
⁴ Only the vectors w_i and h_j contribute to e_ij², thus we treat the other vectors (rows of W and H) as constants (whose derivatives are zero).
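Putting (5)-(10) together, one possible training loop is sketched below; it is illustrative only and folds the regularization term λ of (4) directly into the updates (9) and (10):

```java
/** SGD MF training sketch: random initialization, then, in each iteration,
 *  one pass over the known cells in random order, applying the updates
 *  (9) and (10) extended with the regularization of (4). Returns {W, H}. */
static double[][][] train(java.util.List<Cell> trainSet, int rows, int cols,
                          int K, double beta, double lambda, int iterations) {
    java.util.Random rnd = new java.util.Random();
    double[][] W = new double[rows][K];
    double[][] H = new double[cols][K];
    // Random initialization, e.g. from N(0, 0.01^2) as suggested in the text.
    for (double[] w : W) for (int k = 0; k < K; k++) w[k] = rnd.nextGaussian() * 0.01;
    for (double[] h : H) for (int k = 0; k < K; k++) h[k] = rnd.nextGaussian() * 0.01;

    for (int it = 0; it < iterations; it++) {
        java.util.Collections.shuffle(trainSet, rnd);  // visit cells in random order
        for (Cell c : trainSet) {
            double e = c.x() - predict(W, H, c.i(), c.j());  // e_ij of (6)
            for (int k = 0; k < K; k++) {
                double w = W[c.i()][k], h = H[c.j()][k];
                W[c.i()][k] = w + beta * (2 * e * h - lambda * w);  // update (9)
                H[c.j()][k] = h + beta * (2 * e * w - lambda * h);  // update (10)
            }
        }
    }
    return new double[][][]{W, H};
}
```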
As we can see, the quality of the model X̂ depends on the hyper-parameters used in the learning phase. Also, it can happen that the split of the data into the train and the test part was done in such a way that these two parts have dissimilar characteristics, e.g. one part contains large values while the other the low ones. To prevent this, we usually use n-fold cross-validation [18], in which we split X into n equal parts X_1, ..., X_n and, for each n and each hyper-parameter combination (K, λ, β, iterations), do the following: train a model X̂^n with the given hyper-parameter combination on ∪_{i≠n} X_i and compute its RMSE on X_n. The resulting RMSE of X̂ with the hyper-parameters (K, λ, β, iterations) is then the average RMSE over all X̂^n. After trying some hyper-parameter combinations with n-fold cross-validation, we choose the best combination (K, λ, β, iterations)_best, i.e. the one with the smallest average RMSE over all folds. Finally, we train the model X̂ again using the whole input matrix X.
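The whole selection procedure can be sketched as follows; the round-robin fold split and the candidate grid are illustrative assumptions (grid search itself is discussed in section 5.2):

```java
/** Hyper-parameter selection sketch: grid search over candidate
 *  combinations, scoring each by its average RMSE over n folds. */
static double[] gridSearch(java.util.List<Cell> cells, int rows, int cols, int n,
                           int[] Ks, double[] betas, double[] lambdas, int[] iters) {
    // A simple round-robin split of the known cells into n folds.
    java.util.List<java.util.List<Cell>> folds = new java.util.ArrayList<>();
    for (int f = 0; f < n; f++) folds.add(new java.util.ArrayList<>());
    for (int idx = 0; idx < cells.size(); idx++) folds.get(idx % n).add(cells.get(idx));

    double bestRmse = Double.POSITIVE_INFINITY;
    double[] best = null;
    for (int K : Ks) for (double b : betas) for (double l : lambdas) for (int it : iters) {
        double avg = 0.0;
        for (int f = 0; f < n; f++) {                  // fold f plays the test part
            java.util.List<Cell> tr = new java.util.ArrayList<>();
            for (int g = 0; g < n; g++) if (g != f) tr.addAll(folds.get(g));
            double[][][] model = train(tr, rows, cols, K, b, l, it);
            avg += rmse(folds.get(f), model[0], model[1]) / n;
        }
        if (avg < bestRmse) { bestRmse = avg; best = new double[]{K, b, l, it}; }
    }
    return best;  // (K, beta, lambda, iterations)_best with the smallest average RMSE
}
```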
It can happen that the input matrix X contains an empty row or column with no values. For those objects (rows) or items (columns) we do not update the parameters W and H (this follows from the SGD MF algorithm). In such cases, the estimated missing values could be the global average of the non-missing values. More advanced and sophisticated methods that can improve the estimation in such cases are listed in [17].
4 Illustrative example
To illustrate our approach we present a small toy example. Our purpose is to cluster students into groups with respect to their success in solving tasks. The key problem is that not every student has solved all of the tasks. We proceed as follows: we predict the success of students in the tasks that they have not solved yet (empty values in the input table), and after the process of conceptual scaling is completed, we find the formal concepts.
The table on the left side of figure 1 contains five students who have performed several tasks. The maximum total mark that can be scored is 9; 3 marks for each task. Table values represent students' marks for the tasks. The matrix represents a relation between objects and items (which are students and tasks in our case). Figure 1 shows an example of how we can factorize the students and tasks. After the training phase with K = 2 latent factors (F1 and F2), we get the student-factor matrix W and the factor-task matrix H. Suppose we would like to predict Tom's (3rd row) performance on task 2 (2nd column). The prediction is performed using equation (2):
x̂_32 = ∑_{k=1}^{K} w_3k h_2k = 1.80 · 1.20 + 0.13 · 1.08 ≈ 2.30.
Since the predicted value should be one of the values occurring in the input matrix, we round it to one of the values 1, 2 and 3 (see the sketch after figure 1): we have predicted that Tom will achieve average results on task 2. Similarly, we can predict the performance of the other students in the tasks which they have not done yet; the full, completed matrix is depicted on the left side of figure 2.
X        T1   T2   T3      W        F1     F2      H     T1     T2     T3
Katie     3    3    –      Katie    1.18   1.52    F1    0.98   1.20   0.85
Dean      2    2    1      Dean     1.24   0.31    F2    1.35   1.08   0.54
Tom       2    –    2      Tom      1.80   0.13
Sam       –    2    –      Sam      0.91   1.57
Adam      2    3    2      Adam     1.68   0.49

Fig. 1. Factorization of the input matrix.
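As a quick check of the numbers above (the rows of W and H are taken from figure 1), the prediction and the rounding step can be reproduced as follows; the clamping rule is just one reasonable choice of adjustment:

```java
public class RoundingExample {
    public static void main(String[] args) {
        double[] tom   = {1.80, 0.13};   // Tom's row of W (figure 1)
        double[] task2 = {1.20, 1.08};   // task 2's row of H (figure 1)
        double xHat = tom[0] * task2[0] + tom[1] * task2[1];         // 2.3004
        long grade = Math.round(Math.min(3.0, Math.max(1.0, xHat))); // clamp to [1, 3], round
        System.out.println(xHat + " -> " + grade);                   // 2.3004 -> 2
    }
}
```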
Now we would like to cluster the students according to their results in solving the tasks. The table on the left side of figure 2 represents a many-valued context. Since the values are ordered and each value implies the weaker ones, we use the ordinal scale shown on the right side of figure 2: performing a task at the highest level implies performing the task at the average level, too. The table in figure 3 is the result of the conceptual scaling process. For details on many-valued contexts and conceptual scaling we refer to [1].
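A small sketch of this ordinal scaling step, under an illustrative encoding of the derived attributes as column indices:

```java
/** Ordinal conceptual scaling sketch: grades[o][t] ∈ {1, ..., maxGrade};
 *  a grade v for task t switches on the derived attributes (t, 1), ..., (t, v),
 *  since a higher grade implies the lower ones. */
static boolean[][] ordinalScale(int[][] grades, int maxGrade) {
    int nObjects = grades.length, nTasks = grades[0].length;
    boolean[][] context = new boolean[nObjects][nTasks * maxGrade];
    for (int o = 0; o < nObjects; o++)
        for (int t = 0; t < nTasks; t++)
            for (int level = 1; level <= grades[o][t]; level++)
                context[o][t * maxGrade + (level - 1)] = true;
    return context;
}
```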
         T1   T2   T3            S    3    2    1
Katie     3    3    2            3    ×    ×    ×
Dean      2    2    1            2         ×    ×
Tom       2    2    2            1              ×
Sam       3    2    2
Adam      2    3    2

Fig. 2. The many-valued context and the ordinal scale S for all attributes of the many-valued context.

          T1            T2            T3
          1    2    3   1    2    3   1    2    3
Katie     ×    ×    ×   ×    ×    ×   ×    ×
Dean      ×    ×        ×    ×        ×
Tom       ×    ×        ×    ×        ×    ×
Sam       ×    ×    ×   ×    ×        ×    ×
Adam      ×    ×        ×    ×    ×   ×    ×

[The drawing of the corresponding concept lattice with its 6 nodes cannot be reproduced in plain text; its object labels are Dean, Tom, Sam, Adam and Katie, and its attribute labels are the scaled task attributes.]

Fig. 3. The one-valued context as the derived context of the many-valued context in figure 2 and the corresponding concept lattice.
The obtained concept lattice contains 6 concepts (clusters) and is depicted in figure 3. The labeling is simplified by putting down each object and attribute only once: the extent of each concept is formed by collecting all objects which can be reached by descending line paths from the concept, and the intent of each concept consists of all attributes located along ascending line paths.
We interpret some of the clusters here: every student solved task 1 and task 2 at least at the average level (the top element of the lattice); all students except for Dean achieved at least average results in solving all tasks; nobody achieved good results in every task (the bottom element of the lattice).
5 Experiments
Experiments were conducted on a computing cluster with 7 computing nodes, each of which has 16 cores, running Red Hat Linux.
5.1 Data
For these experiments, we consider a complete database to act as a reference database, and we randomly introduce missing values at the following rates: 10%, 20% and 30%. The benchmark datasets used for these experiments are from the UCI Machine Learning Repository⁵. The characteristics of these datasets are shown in table 1.

⁵ http://www.ics.uci.edu/mlearn/MLRepository.html
Table 1. Dataset characteristics

Dataset      Objects  Attributes  Attribute types  FCA attributes
Wine         178      13          Integer, Real    68
SPECT Heart  267      22          Categorical      46
Tic-tac-toe  958      9           Categorical      29
The data conversion into FCA format was done as follows. Categorical (many-valued) attributes were converted by creating a formal context attribute for each
of the values. Attributes with integer or real types were scaled to create context
attributes with ranges of values.
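The concrete ranges are not listed here; purely as an illustration, one possible scaling of a numeric attribute into a fixed number of interval attributes is equal-width binning:

```java
/** Equal-width interval scaling sketch for a numeric attribute: maps a
 *  value in [min, max] to one of nBins context attributes. The binning
 *  scheme is an assumption; the exact ranges used are not given here. */
static int bin(double value, double min, double max, int nBins) {
    int b = (int) ((value - min) / (max - min) * nBins);
    return Math.min(Math.max(b, 0), nBins - 1);  // keep the maximum in the last bin
}
```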
5.2 Matrix factorization model settings
We implemented the matrix factorization model described in section 3.2 ourselves in Java. We used 3-fold cross-validation for testing the proposed model: in each of the 3 iterations, one fold was used for testing and the remaining two folds were used for tuning the hyper-parameters of the model. The hyper-parameters (number
of factors, number of iterations, learning rate and regularization term) were tuned using grid search [19] (a systematic search over different combinations). For the best hyper-parameter combination of our model, the root mean squared error (RMSE) was measured on the test fold for each dataset. The best hyper-parameters are reported in table 2.
Table 2. The best hyper-parameters found by grid search

Dataset      Missing values (%)  Factors  Iterations  Learning rate  Reg. term
Wine         10                  12       30          0.055          0.1
             20                  12       40          0.055          0.05
             30                  12       50          0.055          0.05
SPECT Heart  10                  22       50          0.055          0.05
             20                  42       40          0.055          0.05
             30                  72       40          0.055          0.05
Tic-tac-toe  10                  42       70          0.005          0.05
             20                  12       60          0.005          0.05
             30                  32       90          0.005          0.1
5.3 Results
Concept lattices were computed from the complete datasets; the numbers of concepts are shown in table 3.
Table 3. Datasets and the number of concepts computed from the full matrix

Dataset      Number of concepts
Wine         24423
SPECT Heart  2135549
Tic-tac-toe  59505
Table 4 presents the RMSE measured over the 3 folds and the average success rate of the prediction for all datasets. Using the matrix factorization model we correctly predicted from 30% to 80% of the missing values. When we did not predict a missing value correctly, the predicted value was close to the original value.
In order to evaluate methods for completing missing values to mine formal concepts, one has to compare the concepts generated from the original data table (the initial context) with the concepts generated from the data table containing estimated values (the predicted context).
Table 4. RMSE and the average success rates of the prediction

Dataset      Missing values (%)  RMSE        Average success rate (%)
Wine         10                  0.6944168   53.18
             20                  0.7345221   66.46
             30                  0.6972556   48.84
SPECT Heart  10                  0.37255815  78.61
             20                  0.3785878   78.09
             30                  0.38591003  77.37
Tic-tac-toe  10                  0.88052654  31.72
             20                  0.9031108   31.56
             30                  0.9076978   33.71
We propose to use two measures for evaluating the performance of the proposed approach for missing values completion with respect to the mined concepts. Both are well-known in the information retrieval (IR) community. The first one indicates how many concepts of the initial context occur in the set of concepts of the estimated context. The second one determines how many concepts of the estimated context are in the set of concepts of the initial context.
Let (X, Y, I) be the initial formal context and (X, Z, J) be the estimated
formal context. Let O be the set of formal concepts of the initial context and
E be the set of formal concepts of the estimated one (O = B(X, Y, I) and
E = B(X, Z, J)). We propose to use the following two measures:
precision = |O ∩ E| / |E|   and   recall = |O ∩ E| / |O|.
Precision is the fraction of concepts of the estimated context that are in
the set of concepts of the initial context. Recall is the fraction of concepts of
the initial context that are in the set of concepts of the estimated context. The
higher the precision, the more concepts of the estimated context are in the set
of concepts of the initial context. The higher the recall, the more concepts of the
initial context are in the set of concepts of the estimated context.
Two formal concepts are equal when every formal object in the extent of one is in the extent of the other and vice versa, and the same holds for the formal attributes. In this case, precision and recall are computed. From the extensional point of view, two concepts are equal when they have the same extent; analogously, from the intensional point of view, two concepts are equal when they have the same intent. In these cases, we calculate precision_e, recall_e and precision_i, recall_i, respectively.
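With concepts represented as (extent, intent) pairs, all three variants reduce to set-intersection ratios. The following sketch (with an illustrative BitSet representation, not our actual code) shows the plain and the extent-based variants; the intent-based one is analogous:

```java
import java.util.BitSet;
import java.util.Set;
import java.util.stream.Collectors;

/** A formal concept as (extent, intent); record equality gives concept equality. */
record Concept(BitSet extent, BitSet intent) {}

class Measures {
    /** precision = |O ∩ E| / |E| and recall = |O ∩ E| / |O|. */
    static double[] precisionRecall(Set<Concept> initial, Set<Concept> estimated) {
        long common = estimated.stream().filter(initial::contains).count();
        return new double[]{(double) common / estimated.size(),
                            (double) common / initial.size()};
    }

    /** The extent-based variants precision_e / recall_e: compare extents only. */
    static double[] precisionRecallExtent(Set<Concept> initial, Set<Concept> estimated) {
        Set<BitSet> o = initial.stream().map(Concept::extent).collect(Collectors.toSet());
        Set<BitSet> e = estimated.stream().map(Concept::extent).collect(Collectors.toSet());
        long common = e.stream().filter(o::contains).count();
        return new double[]{(double) common / e.size(), (double) common / o.size()};
    }
}
```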
The results of our experiments are shown in tables 4 and 5.

Table 5. Precision and recall

Dataset      Missing (%)  prec      prec_e  prec_i  rec       rec_e   rec_i
Wine         10           0.0005    0.0277  0.2603  0.0005    0.0311  0.2920
             20           0.0004    0.0246  0.2603  0.0004    0.0272  0.2883
             30           0.0002    0.0194  0.2187  0.0002    0.0198  0.2238
SPECT Heart  10           6 × 10⁻⁵  0.0107  0.3775  4 × 10⁻⁶  0.0007  0.0257
             20           4 × 10⁻⁵  0.0063  0.3667  3 × 10⁻⁶  0.0005  0.0279
             30           2 × 10⁻⁵  0.0039  0.3346  2 × 10⁻⁶  0.0003  0.0267
Tic-tac-toe  10           0.0011    0.0239  0.3969  0.0002    0.0032  0.0524
             20           0.0011    0.0132  0.3899  0.0001    0.0009  0.0258
             30           0.0008    0.0139  0.3477  0.0001    0.0011  0.0282
It is noteworthy that there is a huge difference between precision_e and precision_i. We assume that this gap is caused by the fact that the number of objects is greater than the number of attributes. Moreover, the precision and recall values are quite small. The reason is how we have measured the equality of the sets: two sets are not equal if they differ in even one element. Nevertheless, the proposed measures are useful for experimental
comparisons of various techniques used in the pre-processing step. We will focus on finding other measures. We suppose that if we modified the measures to use similarity (of extents, intents and concepts) instead of equality, better results could be achieved.
6 Conclusion and future research directions
In this paper we have introduced a framework for mining concepts from incomplete datasets. The proposed framework uses stochastic gradient descent matrix factorization (SGD MF) in the pre-processing step, and after completing the missing values, formal concept analysis (FCA) is deployed. Using SGD MF we are able to predict values from a sparse data matrix quite precisely without having any background knowledge about the data. Two measures (precision and recall) have been used for evaluating the presented framework.
The experiments on three datasets (with 10, 20 and 30 percent of missing values) showed that the percentage of missing values does not influence the prediction rate, and that there is a large difference between the precision and recall measures when computed on the extents and the intents of the resulting concepts, respectively.
Our further research will focus on modifying the (precision and recall) measures to use the similarity of extents, intents and concepts instead of their equality. The generation of all concepts from a huge incidence matrix is time-consuming and the number of concepts is very large. Using our matrix factorization model we transform the input matrix into two much smaller latent factor matrices W and H. These matrices describe objects and items in terms of latent factors. We want to explore these matrices and use them for reducing the number of concepts.
Even if there are some remaining issues to investigate, the experimental results show that the proposed approach is promising and worth further research.
Acknowledgements: This work was supported by the grants VEGA 1/0832/12
and VVGS-PF-2012-22 at the Pavol Jozef Šafárik University in Košice, Slovakia.
References
1. B. Ganter, R. Wille: Formal concept analysis: Mathematical foundations. Springer,
1999, ISBN 3-540-62771-5.
2. P. Burmeister, R. Holzer: Treating Incomplete Knowledge in Formal Concept Analysis. In: B. Ganter, G. Stumme, R. Wille (Eds.): Formal Concept Analysis: State of the Art, LNAI 3626, Springer, Heidelberg, 2005.
3. L. Ben Othman, S. Ben Yahia: Yet Another Approach for Completing Missing Values. CLA 2006, pages 155-169.
4. L. Ben Othman, S. Ben Yahia: GBAR_MVC: Generic Basis of Association Rules Based Approach for Missing Values Completion. The International Journal of Computing and Information Sciences (IJCIS), 2008.
5. S. Ben Yahia, Kh. Arour, A. Jaoua: Completing Missing Values Using Discovered
Formal Concepts. DEXA 2000, pages 741-751.
6. S. Ben Yahia, Kh. Arour: Mining fuzzy formal concepts for completing missing
values. In Proceedings of the Intl. Symposium on Innovation in Information and
Communication Technology, Amman, Jordan, 2002.
7. M. Krupka, J. Lastovicka: Concept lattices of incomplete data. In: F. Domenach,
D.I. Ignatov, and J. Poelmans (Eds.): ICFCA 2012, LNAI 7278. Springer, Heidelberg,
pages 180-194, 2012.
8. J. Liu, X. Yao: Formal Concept Analysis of Incomplete Information System. In: FSKD, IEEE (2010), pages 2016-2020.
9. P. Gajdoš, P. Moravec, V. Snášel: Concept Lattice Generation by Singular Value
Decomposition, In: V. Snášel, R. Bělohlávek (Eds.): CLA 2004, pages 102-110.
10. V. Snášel, P. Gajdoš, H. Abdulla, M. Polovinčák: Using Matrix Decompositions in
Formal Concept Analysis. ISIM 2007, pages 121-128.
11. V. Snášel, H. Abdulla, M. Polovinčák: Using Nonnegative Matrix Factorization
and Concept Lattice Reduction to visualizing data. SYRCoDIS, 2007.
12. R. Bělohlávek, V. Vychodil: Factor Analysis of Incidence Data via Novel Decomposition of Matrices. In: S. Ferré and S. Rudolph (Eds.): ICFCA 2009, LNAI 5548, pages 83-97, 2009.
13. R. Bělohlávek, V. Vychodil: Discovery of Optimal Factors in Binary Data via Novel Method of Matrix Decomposition. Journal of Computer and System Sciences 76(1) (2010), pages 3-20.
14. Y. Koren, R. Bell, C. Volinsky: Matrix Factorization Techniques for Recommender Systems. Computer 42(8), IEEE Computer Society Press, Los Alamitos, USA, 2009.
15. G. Takács, I. Pilászy, B. Németh. Investigation of Various Matrix Factorization
Methods for Large Recommender Systems. KDD Netflix workshop, 2008.
16. L. Bottou. Stochastic learning. In Bousquet, O. and von Luxburg, U., editors,
Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence,
LNAI 3176, pages 146-168. Springer Verlag, Berlin.
17. Z. Gantner, L. Drumond, C. Freudenthaler, S. Rendle, L. Schmidt-Thieme: Learning Attribute-to-Feature Mappings for Cold-Start Recommendations. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010). IEEE Computer Society, 2010.
18. T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Second Edition, Springer Series in Statistics, 2009.
19. J. Bergstra, Y. Bengio: Random Search for Hyper-Parameter Optimization. J.
Machine Learning Research 13, pages 281-305, 2012.