<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Concept Lattice Reduction by Singular Value Decomposition</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vaclav</forename><surname>Snasel</surname></persName>
							<email>vaclav.snasel@vsb.cz</email>
							<affiliation key="aff0">
								<orgName type="institution">VSB Technical University Ostrava</orgName>
								<address>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Martin</forename><surname>Polovincak</surname></persName>
							<email>martin.polovincak.fei@vsb.cz</email>
							<affiliation key="aff0">
								<orgName type="institution">VSB Technical University Ostrava</orgName>
								<address>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hussam</forename><forename type="middle">M</forename><surname>Dahwa</surname></persName>
							<email>hussamdahwa@vsb.cz</email>
							<affiliation key="aff0">
								<orgName type="institution">VSB Technical University Ostrava</orgName>
								<address>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Concept Lattice Reduction by Singular Value Decomposition</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">742B88FB26A3E0451EF1A548300AF539</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:43+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The high complexity of lattice construction algorithms and the difficulty of visualising lattices are two important problems connected with formal concept analysis. Algorithm complexity plays a significant role when computing all concepts from a huge incidence matrix. In this paper we modify an incidence matrix using matrix decomposition, creating a new matrix with fewer dimensions as an input for known lattice construction algorithms. Results are presented by visualising a neural network. The neural network is responsible for reducing the result dimension to a two-dimensional space, so we can present the result as a picture that we are able to analyse.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>We deal with the use of matrix decompositions for the reduction of concept lattices. These methods are well known in the area of information retrieval under the names Latent Semantic Indexing (LSI) and Latent Semantic Analysis (LSA). LSI and LSA have been used for the discovery of latent dependencies between terms (or documents) <ref type="bibr" target="#b0">[1]</ref>, <ref type="bibr" target="#b5">[6]</ref>. We would like to apply this approach in the area of formal concept analysis (FCA). In this paper, we present the decomposition's results with a neural network <ref type="bibr" target="#b4">[5]</ref>. First, we introduce the basic terms of Formal Concept Analysis (FCA) <ref type="bibr" target="#b2">[3]</ref> and then describe the rank-k singular value decomposition (SVD) <ref type="bibr" target="#b7">[8]</ref>. Singular value decomposition is one of the various matrix decomposition techniques arising from numerical linear algebra. SVD reduces both the column space and the row space of the term-document matrix to lower dimensional spaces <ref type="bibr" target="#b3">[4]</ref>. After applying SVD to the input matrix we can build lattices and visualise them using neural networks <ref type="bibr" target="#b4">[5]</ref> and the U-matrix <ref type="bibr" target="#b8">[9]</ref>. Numerous studies suggest that graphical representation and display of results can improve information retrieval performance <ref type="bibr" target="#b9">[10]</ref>. Once we have a visualisation, we can analyse the result of the matrix reduction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Concept Lattice Reduction</head><p>In the following paragraphs we introduce the important basic terms of formal concept analysis, singular value decomposition, self-organizing maps and the unified distance matrix. (Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems, Moscow, Russia, 2007.)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Formal concept analysis</head><p>Formal Concept Analysis was first introduced by Rudolf Wille in 1980. FCA is based on the philosophical understanding of the world in terms of objects and attributes. It is assumed that a relation exists connecting objects to the attributes they possess. Formal context and formal concept are the fundamental notions of Formal Concept Analysis <ref type="bibr" target="#b2">[3]</ref>.</p><p>A formal context C = (G, M, I) consists of two sets G and M together with a relation I between G and M . The elements of G are called the objects and the elements of M the attributes of the context. To express that an object g ∈ G is in relation I with an attribute m ∈ M , we write gIm or (g, m) ∈ I and read it as "the object g has the attribute m". I is also called the incidence relation of the context. The concept lattice B(G, M, I) is a complete lattice in which infimum and supremum are given by:</p><formula xml:id="formula_0">⋀_{t∈T} (A_t, B_t) = ( ⋂_{t∈T} A_t, ( ⋃_{t∈T} B_t )″ ), ⋁_{t∈T} (A_t, B_t) = ( ( ⋃_{t∈T} A_t )″, ⋂_{t∈T} B_t ).</formula><p>We refer to <ref type="bibr" target="#b2">[3]</ref>.</p></div>
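The derivation operators and the concept condition above can be sketched in Python; the toy context below (objects g1–g3, attributes m1–m3) is a hypothetical illustration, not data from the paper:

```python
# Toy formal context (hypothetical, not the paper's data): objects G,
# attributes M, incidence relation I as a set of (object, attribute) pairs.
G = {"g1", "g2", "g3"}
M = {"m1", "m2", "m3"}
I = {("g1", "m1"), ("g1", "m2"), ("g2", "m2"), ("g3", "m2"), ("g3", "m3")}

def up(A):
    """A' -- the attributes common to all objects in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def down(B):
    """B' -- the objects having every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}

# (A, B) is a formal concept iff A' = B and B' = A.
A = down({"m2"})   # {'g1', 'g2', 'g3'}
B = up(A)          # {'m2'} -- so (A, B) is a formal concept
```

Every concept of B(G, M, I) arises this way as a closed pair under the two derivation operators.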
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Singular Value Decomposition</head><p>Singular value decomposition (SVD) is well known because of its application in information retrieval as LSI. SVD is especially suitable in its variant for sparse matrices <ref type="bibr" target="#b6">[7]</ref>. Now we have a decomposition of the original matrix A. Needless to say, the left and right singular vectors are not sparse. We have at most r nonzero singular values, where the rank r is at most the smaller of the two matrix dimensions. However, we would not conserve much memory by storing the term-by-document matrix this way. Luckily, because the singular values usually decay quickly, we can take only the k greatest singular values and the corresponding singular vector coordinates, and create a k-reduced singular value decomposition of A.</p><p>Let us have k, 0 &lt; k &lt; r, and the singular value decomposition of A:</p><formula xml:id="formula_1">A = UΣV^T = (U_k U_0) · diag(Σ_k, Σ_0) · (V_k V_0)^T. We call A_k = U_k Σ_k V_k^T a k-reduced singular value decomposition (rank-k SVD).</formula><p>In information retrieval, if every document is relevant to only one topic, we obtain latent semantics: semantically related words and documents will have similar vectors in the reduced space. For an illustration of rank-k SVD see figure <ref type="figure" target="#fig_1">1</ref>; the grey areas mark the first k coordinates of the singular vectors, which are the ones being used.</p><p>Theorem 2 (Eckart-Young): Among all m × n matrices C of rank at most k, A_k is the one that minimises</p><formula xml:id="formula_2">||A − C||_F^2 = Σ_{i,j} (A_{i,j} − C_{i,j})^2.</formula><p>Because the rank-k SVD is the best rank-k approximation of the original matrix A, any other rank-k decomposition will increase the sum of squares of the matrix A − A_k.</p><p>The SVD is hard to compute and, once computed, it reflects only the decomposition of the original matrix. The recalculation of SVD is expensive, so it is impractical to recalculate it every time new rows or columns are inserted. SVD-updating is a partial solution, but since the error grows slightly with each inserted row or column, a full recalculation of the SVD may be needed when updates occur frequently.</p><p>Note: From now on, we will assume the rank-k singular value decomposition when speaking about SVD.</p></div>
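A minimal NumPy sketch of the rank-k SVD and the Eckart-Young property; the random 8×5 matrix is a stand-in for a real term-by-document or incidence matrix:

```python
import numpy as np

# Rank-k SVD on a random 8x5 matrix (a stand-in for a real
# term-by-document or incidence matrix).
rng = np.random.default_rng(0)
A = rng.random((8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # A_k = U_k Sigma_k V_k^T

# Eckart-Young: the squared Frobenius error of the best rank-k
# approximation equals the sum of the squared discarded singular values.
err = np.linalg.norm(A - A_k, "fro") ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```

Any other rank-k matrix C would give a Frobenius error at least this large.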
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Self-organizing maps</head><p>The Kohonen Self-Organizing Map (SOM) <ref type="bibr" target="#b4">[5]</ref> is a competitive artificial neural network, used to classify and cluster a data set according to similarity <ref type="bibr" target="#b1">[2]</ref>. A SOM network is structured in two layers: the first represents the input data; the second is a grid of neurons, usually two-dimensional and fully connected. Each input is connected to all output neurons. The output neurons are arranged in a low-dimensional (usually 2D or 3D) grid. Attached to every neuron is a weight vector with the same dimensionality as the input vectors. The number of input dimensions is usually much higher than the output grid dimension, so SOMs are mainly used for dimensionality reduction rather than expansion.</p></div>
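A minimal SOM training loop, as a sketch of the competitive learning described above; the grid size, learning-rate and neighbourhood schedules are illustrative choices, not the paper's settings:

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM sketch: grid neurons compete for each input; the winner
    and its neighbours are pulled toward the input, with shrinking
    learning rate and neighbourhood radius."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                   # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 1e-3      # decaying neighbourhood radius
        x = data[rng.integers(len(data))]
        # best-matching unit: the neuron whose weight vector is closest to x
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (h, w))
        d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
        nb = np.exp(-d2 / (2 * sigma ** 2))[..., None]  # Gaussian neighbourhood
        weights += lr * nb * (x - weights)
    return weights
```

After training, each input vector maps to the grid position of its best-matching unit, giving the 2D layout the U-matrix then visualises.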
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Unified distance matrix</head><p>The unified distance matrix (U-matrix) makes a 2D visualisation of multivariate data possible, using the SOM's codevectors as the data source <ref type="bibr" target="#b8">[9]</ref>. This is achieved by exploiting the topological relations among neurons after the learning process. The U-matrix contains the distances from each unit's centre to all of its neighbours. Through the U-matrix we can detect topological relations among neurons and infer the structure of the input data. High values in the U-matrix represent a frontier region between clusters, while low values represent a high degree of similarity among the neurons in that region, i.e. a cluster. Applying a colour scheme makes this a visual task.</p></div>
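The U-matrix computation can be sketched as follows, assuming a trained SOM codebook of shape (height, width, features); the 4-neighbourhood and the Euclidean metric are the common choices:

```python
import numpy as np

def u_matrix(weights):
    """Unified distance matrix: mean Euclidean distance from each codevector
    to its 4-neighbours on the grid. High values mark cluster borders,
    low values cluster interiors."""
    h, w, _ = weights.shape
    um = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dists = [np.linalg.norm(weights[i, j] - weights[ni, nj])
                     for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= ni < h and 0 <= nj < w]
            um[i, j] = np.mean(dists)
    return um
```

Rendering `um` as a greyscale image gives exactly the kind of cluster picture the experiment analyses.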
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Our Experiment</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Input data</head><p>Data from the Ostrava Mass Transit Authority was used in this experiment. We can describe the data as a matrix: rows represent tram stops, columns represent tram lines. A 1 at a row-column crossing indicates that the tram line stops at that tram stop; otherwise the entry is zero. This data was chosen because we can easily see whether the SVD is working. Many tram lines share the same tram stops; these shared parts of the tram lines should be reduced by the SVD, and the effect should be visible after applying it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Steps of our Experiment</head><p>Our experiment included several steps. The first step was to read the data and transform it into an adjacency matrix. After transformation, the SVD computation A = U ΣV^T produced three new matrices. After choosing the rank k, a new matrix was computed. Lattices were built from both the original matrix and the matrix computed by SVD. The concepts of the lattices acted as the source for the SOMs' learning phase. Once the SOMs were learned, visualisation was done using the U-matrix technique.</p></div>
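The steps above can be sketched end to end; the 6×4 matrix, the 0.5 re-binarisation threshold and the naive concept enumeration are illustrative assumptions, not the paper's actual data or algorithms:

```python
import numpy as np
from itertools import combinations

# Hypothetical toy run of the pipeline (6x4 incidence matrix, not the
# paper's 92x10 tram data): rank-k SVD, re-binarise, enumerate concepts.
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

def rank_k(A, k):
    """k-reduced SVD approximation A_k = U_k Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def concepts(B):
    """Naive concept enumeration: close every attribute subset.
    Exponential in the number of attributes -- fine for a toy context."""
    n_obj, n_att = B.shape
    found = set()
    for r in range(n_att + 1):
        for sub in combinations(range(n_att), r):
            ext = [g for g in range(n_obj) if all(B[g, m] for m in sub)]
            intent = tuple(m for m in range(n_att) if all(B[g, m] for g in ext))
            found.add((tuple(ext), intent))
    return found

A_k = (rank_k(A, 2) > 0.5).astype(float)  # re-binarise the rank-2 approximation
print(len(concepts(A)), len(concepts(A_k)))  # concept counts before/after reduction
```

In the actual experiment the two concept sets (original and SVD-reduced) then feed the two SOMs.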
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Reading and transforming data</head><p>Data was obtained from tram timetables. The first ten tram lines and their tram stops were chosen. The transformed matrix had 92 rows and 10 columns. The transformation was simple: if a tram stops at a particular tram stop, the number 1 appears at the appropriate row-column crossing, otherwise zero. The matrix is thus an incidence matrix. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Using SVD</head><p>The SVD computations were made with the SVDLIBC-fix software, which is free and written in the C language. An incidence matrix acts as the input, and the software computes all three matrices U , Σ and V^T. The rank was chosen as k = 4. The matrices were U (92×10), Σ (10×10) and V^T (10×10).</p></div>
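The paper used SVDLIBC; as a cross-check of the stated dimensions, the same shapes fall out of NumPy's dense SVD on a synthetic 92×10 binary matrix standing in for the tram data:

```python
import numpy as np

# Synthetic 92x10 binary matrix standing in for the tram incidence matrix.
rng = np.random.default_rng(1)
A = (rng.random((92, 10)) > 0.7).astype(float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
A4 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# U is 92x10, Sigma holds 10 singular values, V^T is 10x10, A4 is 92x10.
print(U.shape, s.shape, Vt.shape, A4.shape)
```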
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Visualising result</head><p>The first SOM was trained on the initial data. The second SOM was trained on the data obtained after applying SVD to the initial data. Thus there were two SOM networks, one per type of input data.</p><p>The next part of the research was about visualising the SOM networks via the unified distance matrix (U-matrix) algorithm. The Euclidean metric was used for computing distances between nodes. We obtained two greyscale pictures (figure <ref type="figure">2</ref>, figure <ref type="figure">3</ref>) that we are able to analyse, especially the number and form of the visible clusters.</p><p>There are two U-matrix visualisations. Figure <ref type="figure">2</ref> matches the concepts of the lattice built from the original matrix; figure <ref type="figure">3</ref> matches the concepts of the lattice built from the original matrix transformed by SVD. The number of differences between rows in the tables was decreased by the SVD computation, so fewer clusters are visible. There is no doubt the method was successful in reducing the number of concepts.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>For a set A ⊆ G of objects we define A′ = {m ∈ M | gIm for all g ∈ A} (the set of attributes common to the objects in A). Correspondingly, for a set B of attributes we define B′ = {g ∈ G | gIm for all m ∈ B} (the set of objects which have all attributes in B). A formal concept of the context (G, M, I) is a pair (A, B) with A ⊆ G, B ⊆ M , A′ = B and B′ = A. We call A the extent and B the intent of the concept (A, B). B(G, M, I) denotes the set of all concepts of the context (G, M, I).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: k-reduced singular value decomposition</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Original data</figDesc><graphic coords="3,87.30,57.45,180.00,126.78" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Understanding search engines</title>
		<author>
			<persName><forename type="first">M</forename><surname>Berry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Browne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Mathematical Modeling and Text Retrieval</title>
				<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Discovering ontological semantics using fca and som</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kiu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">M2USIC</title>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Formal Concept Analysis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ganter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wille</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>Springer-Verlag</publisher>
			<pubPlace>Berlin Heidelberg</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Complexity reduction in lattice-based information retrieval</title>
		<author>
			<persName><forename type="first">Douglas</forename><surname>Vogel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Karen</forename><forename type="middle">S K</forename><surname>Cheung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="285" to="299" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Self-organizing maps</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kohonen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Computational methods for intelligent information access</title>
		<author>
			<persName><forename type="first">T</forename><surname>Letsche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Berry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dumais</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1995 ACM/IEEE Supercomputing Conference</title>
				<meeting>the 1995 ACM/IEEE Supercomputing Conference</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Computational methods for intelligent information access</title>
		<author>
			<persName><forename type="first">T</forename><surname>Letsche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Berry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dumais</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1995 ACM/IEEE Supercomputing Conference</title>
				<meeting>the 1995 ACM/IEEE Supercomputing Conference</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Concept lattice generation by singular value decomposition</title>
		<author>
			<persName><forename type="first">P</forename><surname>Moravec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gajdos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">CLA</title>
		<imprint>
			<biblScope unit="page" from="13" to="22" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Som-based data visualization methods</title>
		<author>
			<persName><forename type="first">J</forename><surname>Vesanto</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="6" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Map displays for information retrieval</title>
		<author>
			<persName><forename type="first">X</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society of Information Science</title>
		<imprint>
			<biblScope unit="page" from="40" to="54" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
