=Paper=
{{Paper
|id=Vol-252/paper-14
|storemode=property
|title=Using Matrix Decompositions in Formal Concept Analysis
|pdfUrl=https://ceur-ws.org/Vol-252/paper14.pdf
|volume=Vol-252
|dblpUrl=https://dblp.org/rec/conf/isim/SnaselGAP07
}}
==Using Matrix Decompositions in Formal Concept Analysis==
Using Matrix Decompositions in Formal Concept
Analysis
Vaclav Snasel1, Petr Gajdos1, Hussam M. Dahwa Abdulla1, Martin Polovincak1
1
Dept. of Computer Science, Faculty of Electrical Engineering and Computer Science,
VŠB-Technical University of Ostrava, 17. listopadu 15, Ostrava, 708 33, Czech Republic
{vaclav.snasel, petr.gajdos, hussamdahwa, martin.polovincak.fei}@vsb.cz
Abstract. One of the main problems connected with the formal concept
analysis and lattice construction is the high complexity of algorithms which
plays a significant role when computing all concepts from a huge incidence
matrix. In some cases, we only need to compute some of them to test for
common attributes. In our research we try to modify an incidence matrix using
matrix decompositions, creating a new matrix with fewer dimensions as an
input for some known algorithms for lattice construction. In this paper, we want
to describe methods of matrix decompositions and their influence on the
concept lattice.
Keywords: Formal Concept Analysis, Singular Value Decomposition, Semi
Discrete Decomposition
1 Introduction
We are dealing with possible uses of matrix decompositions for reduction of
concept lattice. These methods are well known in the area of information retrieval
under the name Latent Semantic Indexing (LSI) or Latent Semantic Analysis (LSA)
[2], [3]. LSI and LSA have been used for discovery of latent dependencies between
terms (or documents). We would like to apply this approach in the area of formal
concept analysis (FCA) [5]. First, we will introduce basic terms of Formal Concept
Analysis (FCA) and then we will describe the rank-k SVD decomposition and the LSI
method. Because the SVD decomposition presents some difficulty connected with the
setting of its parameters, we will introduce a different approach based on semi-
discrete lattice decomposition.
Small examples will illustrate our approaches. We hope that this is the way to use
FCA on large data collections and visualize a data structure via concept lattice. Note
that usage of this method depends on concrete application, because it makes some
noise in the lattice structure to make it simpler.
122 V. Snasel, P. Gajdos, H. M. Dahwa Abdulla and M. Polovincak
2 Background
In the following paragraphs we would like to introduce important basic terms of
formal concept analysis, singular value decomposition and semi-discrete value
decomposition. Pros and cons of matrix decompositions will be discussed later.
2.1 Formal Concept Analysis
Definition 1. [9] A formal context C = (G, M , I ) consists of two sets; G and M, I is a
subset of GxM. The elements of G are defined as objects and the elements of M are
defined as attributes of the context. In order to express that an object g ∈ G is
related to I with the attribute m ∈ M , we record it as gIm or (g,m ) ∈ I and read it as
“the object g has the attribute m”. I is also defined as the context incidence relation.
Definition 2. [9] For A ⊂ G as set of objects we use the definition
A′ = {m ∈ M gIm for all g ∈ A} (the set of attributes common to the objects in A).
Correspondingly, for B attributes we use the definition
B′ = {g ∈ G gIm for all m ∈ B} (the set of objects which have all attributes in B).
Definition 3. [9] A formal concept of the context (G, M, I) is the pair (A, B)
with A ⊆ G , B ⊆ M , and B′ = A . We call A the extent and B the intent of the concept
(A, B). β(G, M, I) denotes the set of all concepts of context (G, M, I).
Definition 4. [9] The concept lattice β (G,M, I) is a complete lattice in which infimum
and supremum are given by
I ( At , Bt ) = (I At , (U Bt )" )
t∈T t∈T t∈T
U ( At , Bt ) = ((U At )" , I Bt )
t∈T t∈T t∈T
2.2 Singular Value Decomposition
Singular value decomposition (SVD) is well known because of its application in
information retrieval as LSI. SVD is especially suitable in its variant for sparse
matrices. [3]
Theorem 1. (Singular Value Decomposition) Let A be an m × n rank-r matrix. Be
σ 1 Kσ r eigenvalues of a matrix AA T . There exist orthogonal matrices
U = (u1 , . . . , u r ) and V = (v1 , . . . , v r ) , whose column vectors are orthonormal,
and diagonal matrix ∑ = diag (σ 1 , ..., σ r ) . The decomposition A = UΣV T is referred to
as a singular value decomposition of matrix A and numbers σ 1 Kσ r are singular
Using Matrix Decompositions in Formal Concept Analysis 123
values of the matrix A. Columns of U (or V) are referred to as left (or right) singular
vectors of matrix A.
Now we have a decomposition of the original matrix A. Needless to say, the left
and right singular vectors are not sparse. We have at most r nonzero singular
numbers, where rank-r is the min(m,n). Because the singular values usually decrease
quickly, we only need to take k greatest singular values and corresponding singular
vector coordinates and create a k-reduced singular decomposition of matrix A.
Definition 5. Let us have k, 0 < k < r and singular value decomposition of A
⎛Σ 0 ⎞⎛VKT ⎞
A = UΣV = (U K U 0 )⎜ K
T
⎟⎜ ⎟
⎝ 0 Σ 0 ⎠⎝V0T ⎠
A = U k Σ k VkT is referred to as a k-reduced singular value decomposition (k-rank SVD).
In information retrieval, if every document is relevant to only one topic, we obtain
a latent semantics – semantically related words and documents will have similar
vectors in the reduced space. For an illustration of rank-k SVD see < 1, the grey areas
determine first k coordinates from singular vectors, which are being used.
Fig. 1. k-reduced singular value decomposition
Theorem 2. (Eckart-Young). Among all m×n matrices C of rank at most k, Ak is the
one that minimizes || Ak − A ||2F = ∑ (Ai, j − C w, j ) 2 .
i, j
Because rank-k SVD is the best rank-k approximation of original matrix A, any
other decomposition will increase the sum of squares of matrix A − Ak.
The SVD is hard to compute and once computed, it reflects only the decomposition
of the original matrix. The recalculation of SVD is expensive, so it is impossible to
recalculate SVD every time new rows or columns are inserted. The SVD-Updating is
a partial solution, but since the error increases slightly with inserted rows and
columns when updates occur frequently, the recalculation of SVD may be needed.
Note: From now on, we will assume rank-k singular value decomposition when
speaking about SVD.
2.2 Semi Discrete Matrix Decomposition (SDD)
We have used another alternative decomposition in our research, semi discrete
decomposition (SDD). For equal data set and incidence matrix, the SDD does as well
124 V. Snasel, P. Gajdos, H. M. Dahwa Abdulla and M. Polovincak
as the SVD and uses less than one-tenth the storage space. In this section, we briefly
overview SDD and then we will describe its purpose.
Semi discrete decomposition approximates a matrix as a weighted sum of outer
product formed by vectors with entries constrained to be in the set S = {−1, 0, 1}.
O’Leary and Peleg introduced the SDD in the context of image compression [8], and
Kolda and O’Leary used the SDD for latent semantic indexing LSI in information
retrieval [6], [7].
The primary advantage of the SDD over other types of matrix decompositions such
as the truncated SVD is that it typically provides a more accurate approximation for
far less storage. An SDD of an m × n matrix A as a decomposition is shown in
figure 2. Here each xi is an m − vector with entries from the set S = {−1, 0, 1}, each yi
⎡ d1 0 ... 0 ⎤ ⎡Y1T ⎤
⎢0 d ⎢ ⎥
... 0 ⎥⎥ ⎢Y2T ⎥
Ak = [ x1 x2 ...xk ] ⎢ 2
⎢ ... ... ... ... ⎥ ⎢ ... ⎥
⎢ ⎥⎢ ⎥
⎣⎢ 0 0 ... d k ⎥⎦ ⎢⎣YkT ⎥⎦
Fig. 2. k-reduced semi discrete decomposition
is an n−vector with entries from the set S, and each di is a positive scalar. We call this
k − term SDD.
Note: From now on, we will assume rank-k semi discrete decomposition when
speaking about SDD.
2.4 Nonnegative Matrix Factorization (NMF)
Nonnegative matrix factorization differs from other rank reduction methods for
vector space models in text mining, e.g. principal component analysis (PCA) or vector
quantization (VQ) due to use of constraints that produce nonnegative basis vectors,
which make possible the concept of a parts-based representation.
Lee and Seung [11] first introduced the notion of parts-based representations for
problems in image analysis or text mining that occupies nonnegative subspaces in a
vector-space model. Techniques like PCA and VQ also generate basis vectors various
additive and subtractive combinations of which can be used to reconstruct the original
space. But the basis vectors for PCA and VQ contain negative entries and cannot be
directly related to the original vector space to derive meaningful interpretations. In the
case of NMF, the basis vectors contain no negative entries, allowing only additive
combinations of the vectors to reproduce the original. So the perception of the whole,
be it an image or a document in a collection, becomes a combination of its parts
represented by these basis vectors. In text mining, the vectors represent or identify
semantic features, i.e. a set of words denoting a particular concept or topic. If a
document is viewed as a combination of basis vectors, then it can be categorized as
belonging to the topic represented by its principal vector. Thus, NMF can be used to
Using Matrix Decompositions in Formal Concept Analysis 125
organize text collections into partitioned structures or clusters directly derived from
the nonnegative factors [10].
3 Our Experiment
3.1 Input Data
Data from the Ostrava Mass Transit Authority was used in this experiment. We can
describe data in a graph. Rows represent tram stops, columns represent tram lines. Let
1 represent a tram stopping at a tram stop in the appropriate row and column crossing,
otherwise zero. This data was chosen because we can easily see if the SVD is
working. Many tram lines share the same tram stops. These parts of tram lines should
be minimized with SVD and it should also be visible in the lattice after using SVD.
Fig. 3. Original lattice
3.2 Steps of Our Experiment
Our experiment included several steps. The first step was to read and transform
data into adjacency matrix. After transforming and computing (using SVD) three new
matrixes were obtained as results from SVD A = U Σ V T . After choosing rank-k, a
new matrix was computed.
3.3 Reading and Transforming Data
Data was obtained from tram timetables. The first ten tram lines and their tram
stops were chosen. The transformed matrix had 92 rows and 10 columns.
Transformation was simple. If a tram stops at a particular tram stop, the number 1
appeared in the appropriate row and column crossing, otherwise zero. The matrix is
also an incidence matrix and was used in SVD, NMF and FCA computations.
126 V. Snasel, P. Gajdos, H. M. Dahwa Abdulla and M. Polovincak
3.4 Using SVD
SVD computations were made by SVDLIBC-fix software. This software is free
and is written in C language. An incidence matrix acts as the input. Software can
compute all three matrixes - U and Σ and VT. k-rank was chosen k = 4, that is mean,
matrixes will be U(92x10), Σ(10x10) and VT(10x10).
3.5 Computing Concept Lattices
Both incidence matrix and product of SVD computing were used to compute
concept lattice. Both matrixes passed as input for Concept Lattice software developed
at VSB–TU Ostrava. The Concept Lattice software produced a concept lattice. The
view of these concept lattices were completed with GraphPlace software.
GraphPlace’s output is a Postscript file that can be viewed in any postscript viewer.
Concept lattice was computed from the output matrix. Man can see the difference
between the concept lattice computed from the original matrix and the concept lattice
computed from the output matrix using SVD.
Fig. 4. Lattice after using SVD
43
e
10
e,g,h
11 32
21
25 41
42
Fig. 5. Example 1
Example 1. Node 11 (figure 5): Node 11 has attributes (e, g, h) in line 69 of the
original matrix. Attribute (e) disappears from line 69 after SVD. Attribute (e) is a
shared attribute and exists only in this node at this level. When an attribute is lost the
connection between node 11 and node 10 is lost too. The other attributes in node 11
Using Matrix Decompositions in Formal Concept Analysis 127
(g, h) appear in the other nodes at the same level. For that node 11 is lost with node
10, which has lost all connections with other nodes.
Example 3. Node 34 (figure 7): Node 34 has attributes (a, i) in lines (8, 9, 11, 16, 17,
18, 19, 20, 21). After the SVD attribute (i) from lines (16, 17, 18, 19, 20, 21) is lost
and only attribute (a) remains. Attribute (a) appears in other nodes at the same level
(node 38), and lines (8, 9, 11) appear in other nodes at the same level (node 38).
43
6
35
i
1 f,i
a,i 8
38
34
a,b,f,i
39
41
42
Fig. 6. Example 3
Also, node 8 has attributes (f, i) in lines (8,9,11,79), after SVD we lose the attribute
(i) from line (79) and only attribute (f) remains, that appears in other node at the same
level (node 38), and lines (8,9,11) appear in other node at the same level (node 38).
For these nodes 34 and 8 are after SVD lost. Loosing nodes (34 and 8) means also
loosing connection with node 1 who has attribute (i), and we loose also node 1.
5. Conclusion
Every node in the concept has a distinct set of attributes. That means that two nodes
can not be found with the same set of attributes. Experiment consists in finding
changes of objects and their set of attributes after using SVD. SVD reduces some
attributes from the original matrix because these attributes do not appear in many
objects. Attributes are not primary and they can be lost without any problem.
References
[1] 1. G. Battista, P. Eades, R. Tamassia, and I. Tollis, “Algorithms for drawing
graphs, an annotated bibiliography”, Computational Geometry, Theory and
Applications, vol. 4, 1994, pp. 235–282.
[2] M. Berry and M. Browne. “Understanding Search Engines”, Mathematical
Modeling and Text Retrieval, Siam, 1999.
128 V. Snasel, P. Gajdos, H. M. Dahwa Abdulla and M. Polovincak
[3] M. Berry, S. Dumais, and T.Letsche. “Computation Methods for Intelligent
Information Access”, Proceedings of the 1995 ACM/IEEE Supercomputing
Conference, 1995.
[4] P.Gajdoš and P.Moravec, “Concept lattice generation by singular value
decomposition”, CLA 2004, 2004, pp. 13–22.
[5] B. Ganter and R. Wille, Formal Concept Analysis. Springer-Verlag Berlin
Heidelberg, 1999.
[6] T. G. Kolda and D. P. O’Leary, Computation and uses of the semidiscrete
matrix decomposition, Technical Report CS-TR-4012, Oak Ridge National
Laboratory, 1999, pp. 8-16.
[7] T.G.Kolda and D.P.O’Leary, “A semidiscrete matrix decomposition for latent
semantic indexing information retrieval”, ACM Transactions on Information
Systems 16(4), 1998, pp. 322–346.
[8] D.P.O’Leary and S.Peleg, “Digital image compression by outer product
expansion”, IEEE Transactions on Communications 31, 1983, pp. 441–444.
[9] R. Wille, Lattices in data analysis: How to draw them with a computer,
Algorithms and Order, 1989, pp. 33–58.
[10] Farial Shahnaz, Michael W. Berry, V. Paul Pauca, Robert J. Plemmons,
Document clustering using nonnegative matrix factorization. Inf. Process.
Manage. 42(2): 373-386 2006.
[11] Lee, D., Seung, “H. Algorithms for non-negative matrix factorization”,
Advances in Neural Information Processing Systems, 2001, pp. 556-562.
[12] G.Young and C.Eckart, The approximation of one matrix by another of lower
rank, Psychometrika 1, 1936, pp. 211–218.