    Concept Lattice Reduction by Singular Value Decomposition

                  Vaclav Snasel               Martin Polovincak                  Hussam M. Dahwa
                                 VSB Technical University Ostrava, Czech Republic
                            {vaclav.snasel, martin.polovincak.fei, hussamdahwa}@vsb.cz



Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems, Moscow, Russia, 2007

                       Abstract

The high complexity of lattice construction algorithms and the difficulty of visualising lattices are two important problems connected with formal concept analysis. Algorithm complexity plays a significant role when computing all concepts from a huge incidence matrix. In this paper we try to modify an incidence matrix using matrix decomposition, creating a new matrix with fewer dimensions as an input for known lattice construction algorithms. Results are presented by visualising a neural network. The neural network is responsible for reducing the result to a two-dimensional space, so we can present the result as a picture that we are able to analyse.

1   Introduction

We are dealing with uses of matrix decompositions for the reduction of concept lattices. These methods are well known in the area of information retrieval under the names Latent Semantic Indexing (LSI) and Latent Semantic Analysis (LSA). LSI and LSA have been used for the discovery of latent dependencies between terms (or documents) [1], [6]. We would like to apply this approach in the area of formal concept analysis (FCA). In this paper, we want to present the decomposition results with a neural network [5]. First, we introduce the basic terms of Formal Concept Analysis (FCA) [3] and then describe the rank-k singular value decomposition (SVD) [8]. Singular value decomposition is one of the various matrix decomposition techniques arising from numerical linear algebra. SVD reduces both the column space and the row space of the term-document matrix to lower-dimensional spaces [4]. After using SVD on the input matrix we can build lattices and visualise them using neural networks [5] and the U-matrix [9]. Numerous studies suggest that graphical representation and display of results can improve information retrieval performance [10]. Once we have a visualisation we can analyse the result of the matrix reduction.

2   Concept Lattice Reduction

In the following paragraphs we introduce the important basic terms of formal concept analysis, singular value decomposition, self-organizing maps and the unified distance matrix.

2.1   Formal concept analysis

Formal Concept Analysis was first introduced by Rudolf Wille in 1980. FCA is based on the philosophical understanding of the world in terms of objects and attributes. It is assumed that a relation exists connecting objects to the attributes they possess. Formal context and formal concept are the fundamental notions of Formal Concept Analysis [3].

A formal context C = (G, M, I) consists of two sets G and M and a relation I between G and M. The elements of G are called the objects and the elements of M are called the attributes of the context. In order to express that an object g ∈ G is related by I to an attribute m ∈ M, we write gIm or (g, m) ∈ I and read it as "the object g has the attribute m". I is also called the incidence relation of the context.

For a set A ⊆ G of objects we define

    A′ = {m ∈ M | gIm for all g ∈ A}

(the set of attributes common to the objects in A). Correspondingly, for a set B ⊆ M of attributes we define

    B′ = {g ∈ G | gIm for all m ∈ B}

(the set of objects which have all attributes in B).

A formal concept of the context (G, M, I) is a pair (A, B) with A ⊆ G, B ⊆ M, A′ = B and B′ = A. We call A the extent and B the intent of the concept (A, B). B(G, M, I) denotes the set of all concepts of the context (G, M, I).

The concept lattice B(G, M, I) is a complete lattice in which infimum and supremum are given by:

    ⋀t∈T (At, Bt) = ( ⋂t∈T At, ( ⋃t∈T Bt )″ )

    ⋁t∈T (At, Bt) = ( ( ⋃t∈T At )″, ⋂t∈T Bt ).

We refer to [3].
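As a concrete illustration of the derivation operators and of the definition of a formal concept, the following sketch enumerates all formal concepts of a toy context by brute force over attribute subsets. This is our own illustrative code, not an algorithm from the paper; helper names such as derive_objects are hypothetical, and the approach is only feasible for very small contexts.

    from itertools import combinations

    # Toy formal context: the incidence relation I is a set of
    # (object, attribute) pairs.
    G = ["g1", "g2", "g3"]
    M = ["m1", "m2", "m3"]
    I = {("g1", "m1"), ("g1", "m2"), ("g2", "m2"),
         ("g3", "m2"), ("g3", "m3")}

    def derive_attributes(A):
        """A' = attributes shared by all objects in A."""
        return frozenset(m for m in M if all((g, m) in I for g in A))

    def derive_objects(B):
        """B' = objects that have all attributes in B."""
        return frozenset(g for g in G if all((g, m) in I for m in B))

    # A concept is a pair (A, B) with A' = B and B' = A. Enumerate
    # every attribute subset B, close it to A = B', and keep B if
    # A' = B holds.
    concepts = set()
    for r in range(len(M) + 1):
        for B in combinations(M, r):
            B = frozenset(B)
            A = derive_objects(B)
            if derive_attributes(A) == B:   # (A, B) is a formal concept
                concepts.add((A, B))

    for A, B in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
        print(set(A) or "{}", set(B) or "{}")

Running this on the toy context prints the extents and intents of all concepts, ordered by extent size; real lattice construction algorithms avoid the exponential subset enumeration.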
2.2   Singular Value Decomposition

Singular value decomposition (SVD) is well known because of its application in information retrieval as LSI. SVD is especially suitable in its variant for sparse matrices [7].

Theorem 1: Let A be an m × n rank-r matrix and let σ1 ≥ · · · ≥ σr be the square roots of the nonzero eigenvalues of the matrix AA^T. Then there exist matrices U = (u1, . . . , ur) and V = (v1, . . . , vr), whose column vectors are orthonormal, and a diagonal matrix Σ = diag(σ1, . . . , σr). The decomposition A = UΣV^T is called the singular value decomposition of the matrix A, and the numbers σ1, . . . , σr are the singular values of the matrix A. The columns of U (or V) are called the left (or right) singular vectors of the matrix A.

Now we have a decomposition of the original matrix A. Needless to say, the left and right singular vectors are not sparse. We have at most r nonzero singular values, where the rank r is at most the smaller of the two matrix dimensions. However, we would not save much memory by storing the term-by-document matrix this way. Luckily, because the singular values usually fall quickly, we can take only the k greatest singular values with the corresponding singular vector coordinates and create a k-reduced singular decomposition of A.

Let us have k, 0 < k < r, and the singular value decomposition of A:

                            ( Σk   0  ) ( Vk^T )
    A = UΣV^T = (Uk  U0)    (         ) (      )
                            ( 0    Σ0 ) ( V0^T )

We call Ak = Uk Σk Vk^T the k-reduced singular value decomposition (rank-k SVD).

Figure 1: k-reduced singular value decomposition (Ak = Uk Σk Vk^T; the m × n matrix Ak is the product of an m × k, a k × k and a k × n matrix)

In information retrieval, if every document is relevant to only one topic, we obtain latent semantics: semantically related words and documents will have similar vectors in the reduced space. For an illustration of rank-k SVD see Figure 1; the grey areas mark the first k coordinates of the singular vectors, which are the ones being used.

Theorem 2 (Eckart-Young): Among all m × n matrices C of rank at most k, Ak is the one that minimises

    ||A − C||F² = Σi,j (Ai,j − Ci,j)².

Because rank-k SVD is the best rank-k approximation of the original matrix A, any other rank-k approximation C can only increase the sum of squared entries of A − C.

The SVD is hard to compute and, once computed, it reflects only the decomposition of the original matrix. The recalculation of the SVD is expensive, so it is impractical to recalculate it every time new rows or columns are inserted. SVD-updating is a partial solution, but since the error increases slightly with the inserted rows and columns when updates occur frequently, a recalculation of the SVD may still be needed.

Note: From now on, we will assume rank-k singular value decomposition when speaking about SVD.
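The rank-k construction can be sketched with NumPy (our illustration, not the SVDLIBC-based computation used in section 3.4). The example also checks the Eckart-Young property numerically: the squared Frobenius error of Ak equals the sum of the squares of the discarded singular values.

    import numpy as np

    def rank_k_svd(A, k):
        """Truncate the SVD of A to its k largest singular values."""
        # numpy returns the singular values in descending order.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k], s[:k], Vt[:k, :]

    # Small binary incidence matrix (objects x attributes).
    A = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])

    U_k, s_k, Vt_k = rank_k_svd(A, k=2)
    A_k = U_k @ np.diag(s_k) @ Vt_k      # rank-k approximation of A

    # Eckart-Young check: the squared Frobenius error equals the sum
    # of the squares of the discarded singular values.
    s = np.linalg.svd(A, compute_uv=False)
    print(np.linalg.norm(A - A_k, "fro") ** 2)
    print(np.sum(s[2:] ** 2))            # same value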
2.3   Self-organizing maps

The Kohonen Self-Organizing Map (SOM) [5] is a competitive artificial neural network. SOMs are used to classify and cluster a data set according to similarity [2]. A SOM network is structured in two layers: the first one represents the input data, the second one is a grid of neurons, usually two-dimensional and fully connected. Each input is connected to all output neurons, and the output neurons are arranged in a low-dimensional (usually 2D or 3D) grid. Attached to every neuron is a weight vector with the same dimensionality as the input vectors. The number of input dimensions is usually much higher than the output grid dimension, so SOMs are mainly used for dimensionality reduction rather than expansion.
2.4   Unified distance matrix

The unified distance matrix (U-matrix) makes a 2D visualisation of multivariate data possible, using the SOM's codevectors as the data source [9]. This is achieved by exploiting the topological relations between neurons after the learning process. The U-matrix contains the distances from each unit centre to all of its neighbours. With the U-matrix we can detect topological relations among neurons and infer the structure of the input data. High values in the U-matrix represent a frontier region between clusters, and low values represent a high degree of similarity among the neurons in that region, i.e. a cluster. This becomes a visual task when some colour scheme is used.
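The paper does not name a SOM implementation. As one possible sketch, the MiniSom library covers both the training step and the U-matrix of this section: its distance_map() returns, for every neuron, the normalised mean distance to the neighbouring codevectors, which is exactly the data the U-matrix is built from. The grid size, training length and stand-in data below are our assumptions, not the authors' settings.

    import numpy as np
    from minisom import MiniSom   # pip install minisom

    # Rows of a binary incidence matrix act as input vectors (in the
    # paper, concepts of the lattice are the training data).
    data = np.random.randint(0, 2, size=(100, 10)).astype(float)

    som = MiniSom(8, 8, input_len=10, sigma=1.0, learning_rate=0.5,
                  random_seed=42)       # 8x8 output grid, assumed size
    som.random_weights_init(data)
    som.train_random(data, 5000)        # 5000 iterations, assumed length

    # Per-neuron normalised mean Euclidean distance to its neighbours,
    # the raw material of the U-matrix.
    u_matrix = som.distance_map()
    print(u_matrix.shape)               # (8, 8); high values mark borders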
3   Our Experiment

3.1   Input data

Data from the Ostrava Mass Transit Authority was used in this experiment. The data can be described as a table: rows represent tram stops, columns represent tram lines, and a 1 at a row and column crossing represents a tram line stopping at that tram stop, otherwise zero. This data was chosen because we can easily see whether the SVD is working: many tram lines share the same tram stops, these shared parts of the tram lines should be reduced by the SVD, and the effect should be visible after using it.

3.2   Steps of our Experiment

Our experiment included several steps. The first step was to read the data and transform it into an incidence matrix. After transforming and computing (using SVD), three new matrices were obtained as the result of the SVD A = UΣV^T. After choosing the rank k, a new matrix was computed. Lattices were built from both the original matrix and the matrix computed by SVD. The concepts of the lattices acted as the source for the SOM learning phase. Once the SOMs were trained, the visualisation was done using the U-matrix technique.
3.3   Reading and transforming data

The data was obtained from tram timetables. The first ten tram lines and their tram stops were chosen. The transformed matrix had 92 rows and 10 columns. The transformation was simple: if a tram stops at a particular tram stop, the number 1 appears in the appropriate row and column crossing, otherwise zero. The matrix is an incidence matrix and was used in the SVD, NMF and FCA computations.
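A possible sketch of this transformation, assuming a hypothetical parsed form of the timetables (a mapping from each tram line to the stops it serves); the paper does not describe the actual file format, so the parsing step is omitted.

    import numpy as np

    # Hypothetical parsed timetables: tram line -> stops it serves.
    lines = {
        "1": ["Poruba", "Svinov", "Karolina"],
        "2": ["Svinov", "Karolina", "Hranecnik"],
        # ... the first ten tram lines in the real experiment
    }

    stops = sorted({stop for served in lines.values() for stop in served})
    line_ids = sorted(lines)

    # 92 x 10 in the paper: one row per tram stop, one column per line.
    A = np.zeros((len(stops), len(line_ids)))
    for j, line in enumerate(line_ids):
        for stop in lines[line]:
            A[stops.index(stop), j] = 1.0   # this line stops here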
3.4   Using SVD

The SVD computations were made with the SVDLIBC-fix software. This software is free and written in the C language. An incidence matrix acts as the input, and the software can compute all three matrices U, Σ and V^T. The rank was chosen as k = 4. The resulting matrices were U (92 × 10), Σ (10 × 10) and V^T (10 × 10).
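For readers without SVDLIBC-fix, the same truncated decomposition can be computed with SciPy's sparse SVD routine; this is a substitution on our part, not the software used in the experiment, and the input file name is hypothetical. Note that scipy.sparse.linalg.svds returns the singular values in ascending order, so the factors are reordered below.

    import numpy as np
    from scipy.sparse import csc_matrix
    from scipy.sparse.linalg import svds

    A = csc_matrix(np.loadtxt("incidence.txt"))  # hypothetical 92x10 file

    # k = 4 as in the experiment; svds targets sparse matrices, matching
    # the sparse SVD variant mentioned in section 2.2.
    U, s, Vt = svds(A, k=4)

    # svds yields ascending singular values; reverse to the usual order.
    order = np.argsort(s)[::-1]
    U, s, Vt = U[:, order], s[order], Vt[order, :]

    A_k = U @ np.diag(s) @ Vt   # rank-4 matrix used for the second lattice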
                                                             [10] Lin X. Map displays for information retrieval.
                                                                  Journal of the American Society of Information Sci-
3.5   Visualising result                                          ence, pages 40–54, 1997.
Data from the Ostrava Mass Transit Authority was used
in this experiment. We can describe data in a graph.
Rows represent tram stops, columns represent tram lines.
Let ones represent a tram stopping at the tram stop in the
appropriate row and column crossing, otherwise zero.
   First SOM was learned by giving these initial data.
Second SOM was learned by giving data after applying
SVD on initial data. There were two SOM networks ac-
cording to type of input data.
   Next part of research was about visualising SOM net-
work via unified distance matrix (U-matrix) algorithm.
Euclidean metric was used for computing distances be-
tween nodes. We got two greyscale pictures (figure 2,
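Greyscale pictures of this kind can be reproduced from a trained SOM with matplotlib; a minimal sketch under the assumptions of the MiniSom example in section 2.4 (stand-in random data, arbitrary grid size):

    import numpy as np
    import matplotlib.pyplot as plt
    from minisom import MiniSom

    data = np.random.randint(0, 2, size=(100, 10)).astype(float)
    som = MiniSom(8, 8, input_len=10, random_seed=42)
    som.train_random(data, 5000)

    # Greyscale U-matrix: light regions are cluster borders (high
    # distance), dark regions are clusters (cf. Figures 2 and 3).
    plt.imshow(som.distance_map(), cmap="gray", interpolation="nearest")
    plt.colorbar(label="mean distance to neighbouring neurons")
    plt.savefig("umatrix.png")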
4   Conclusion

There are two U-matrix visualisations: Figure 2 matches the concepts of the lattice built from the original matrix, and Figure 3 matches the concepts of the lattice built from the matrix changed by SVD. The number of differences between rows in the tables was decreased by the SVD computation, so a smaller number of clusters can be seen. There is no doubt the method was successful in reducing the number of concepts.

References

[1] M. Berry and M. Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, 1999.

[2] C. Kiu and C. Lee. Discovering ontological semantics using FCA and SOM. M2USIC 2004, 2004.

[3] B. Ganter and R. Wille. Formal Concept Analysis. Springer-Verlag, Berlin Heidelberg, 1999.

[4] K. S. K. Cheung and D. Vogel. Complexity reduction in lattice-based information retrieval. Information Retrieval, 8, pages 285-299, 2005.

[5] T. Kohonen. Self-Organizing Maps. Springer, 2001.

[6] M. Berry, S. Dumais, and T. Letsche. Computational methods for intelligent information access. Proceedings of the 1995 ACM/IEEE Supercomputing Conference, 1995.

[7] M. Berry, S. Dumais, and T. Letsche. Computational methods for intelligent information access. Proceedings of the 1995 ACM/IEEE Supercomputing Conference, 1995.

[8] P. Gajdos and P. Moravec. Concept lattice generation by singular value decomposition. CLA 2004, pages 13-22, 2004.

[9] J. Vesanto. SOM-based data visualization methods. Pages 6-9, 1999.

[10] X. Lin. Map displays for information retrieval. Journal of the American Society for Information Science, pages 40-54, 1997.