Geometrical approach for modeling semantics in linguistics

Milan Gudába¹, Stanislav Horal¹, Ladislav Izakovič¹, Michaela Kalinová¹ and Václav Snášel²

¹ Department of Informatics, FPV, University of Saint Cyril and Methodius, Nám. J. Herdu 2, 917 01 Trnava, Slovakia
² Department of Informatics, FEI, VŠB – Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic

gudaba@gmail.com, stanislav.horal@ucm.sk, izakovil@ucm.sk, michaela.kalinova@ucm.sk, vaclav.snasel@vsb.cz

Abstract

Information is nowadays commonly stored and made available in electronic form. With the steadily increasing quantity of accessible information, most of it textual, the need to organize these data grows, and the problem of fast and effective information retrieval arises very often. In this contribution we describe a method for building a word vector space and for using the NOT operation to retrieve relevant documents more effectively.

1 Introduction

The high volume of text documents and the rate at which these documents grow call for new approaches in linguistics and in areas related to information retrieval (IR). These new approaches are based on principles derived from the natural sciences.

A noticeable expansion of geometry was motivated by Descartes' ideas and by the introduction of the coordinate system, which permitted the interconnection of geometry and algebra [12, 13]. In IR, geometrical methods were developed in the form of the vector model. Another meaningful step in the geometrical understanding of the world was the interconnection of geometry and logic. This connection was made on the grounds of quantum physics [9, 12].

A huge amount of multimedia data accompanies the present expansion of information technologies. These data include, in particular, text, image and acoustic documents. We consider the set of text documents as our input area. Almost all well-known information retrieval systems contain a morphological component. With its help we can remove non-semantic words from documents using a stop-list and convert the semantically significant words to their basic form. In this way we specify terms which, after evaluation, form a vector in the space of concepts. This vector is then used to identify documents by their content [16].

A method based on the combination of the vector model and Boolean logic appears to be a suitable way of creating a model of natural language. In quantum logic, arbitrary sets are replaced by linear subspaces of a vector space, and union, intersection and complement are replaced by the vector sum, the intersection and the orthogonal complement of these subspaces.

A useful tool for information retrieval and processing is latent semantic analysis (LSA). This method, based on singular value decomposition (SVD), can be used to improve access to the desired documents; we keep the factor of the k greatest singular values. LSA has a geometrical representation in which objects (e.g. documents and terms) are placed in a low-dimensional space. As an example we can use the term-document matrix, in which rows represent terms and columns represent documents; a nonzero value in the matrix signifies that the corresponding document contains the given term. This vector space model was described by Salton [10].

In this contribution we present constructions in the vector space based on standard linear algebra, together with examples of using vector negation to separate the meanings of ambiguous words. The first part describes the representation of word meanings in the vector space. The second part focuses on operations in the word space. The next chapter is devoted to basic knowledge of Boolean logic. In the last part we apply this theoretical knowledge to the process of searching for word meanings.
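The morphological preprocessing described in the introduction (removing non-semantic words with a stop-list, keeping the significant terms) can be sketched in a few lines; the stop-list and the sentence here are invented purely for illustration:

```python
# Hypothetical stop-list and input text, for illustration only.
stop_list = {"a", "the", "and", "of"}
text = "the price of the koruna and the currency"

# Keep only the semantically significant words.
terms = [w for w in text.lower().split() if w not in stop_list]
# The remaining terms form the basis for the concept-space
# vector of the document.
```

In a real system the surviving terms would additionally be reduced to their basic (lemmatized) form by a morphological analyzer before the term vector is built.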
2 Word meaning representation in vector space

A vector space can be understood as a set of points in which each point is defined by a list of coordinates [5]. Two points can be added by adding their coordinates, and each point can be multiplied by a scalar (in this paper scalars are real numbers, so all our vector spaces are "real" vector spaces). The first linguistic examples of vector spaces were developed for information retrieval [10]. By counting the occurrences of each word in each document we obtain the term-document matrix. The entry (i, j) of the matrix indicates how many times the word wi occurred in the document Dj. The rows of the matrix can then be understood as word vectors. The dimension of this vector space (the number of coordinates given to each word) is therefore equal to the number of documents in the collection.
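The construction just described can be sketched with numpy; the toy corpus and vocabulary below are invented for illustration, and the rank-k SVD truncation anticipates the latent semantic analysis used later in the paper:

```python
import numpy as np

# Toy corpus: three "documents" (hypothetical, for illustration only).
docs = ["koruna cena mena cena", "koruna klenot zlato", "cena mena trh"]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix: entry (i, j) counts word i in document j.
A = np.array([[d.split().count(w) for d in docs] for w in vocab],
             dtype=float)

# Rows are word vectors; their dimension equals the number of documents.
assert A.shape == (len(vocab), len(docs))

# Keeping only the k greatest singular values concentrates the
# information of the (typically sparse) matrix in few dimensions (LSA).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]   # each word as k latent coordinates
```

The choice of the corpus and of k here is arbitrary; the paper's experiments use 28 documents and k in the range 2 to 15.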
Document vectors are generated by calculating the (weighted) sum of the word vectors of the words occurring in the given document.

Similar techniques are used in information retrieval to determine the similarity relation between words and documents. The similarity can be determined by calculating the cosine of the angle between two vectors [10]:

sim(w, d) = Σi wi di / (√(Σi wi²) · √(Σi di²)) = (w ⋅ d) / (||w|| ||d||),

where wi, di are the coordinates of the vectors w and d, w ⋅ d is the inner product of w and d, and ||w|| is the length of the vector w [5].

The calculation is further simplified by normalizing all vectors to the same length; the cosine similarity then equals the Euclidean inner product. This is a standard method which avoids giving too much weight to frequent words or large documents. Normalized vectors were used in all models and experiments described in this contribution.
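The cosine similarity and the effect of normalization can be checked directly; this is a minimal numpy sketch with invented example vectors:

```python
import numpy as np

def cosine_sim(w, d):
    # sim(w, d) = (w . d) / (||w|| ||d||)
    return float(np.dot(w, d) / (np.linalg.norm(w) * np.linalg.norm(d)))

def normalize(v):
    # Scale v to unit length, so cosine reduces to a plain inner product.
    return v / np.linalg.norm(v)

# Parallel example vectors: their cosine similarity is 1.
w = np.array([1.0, 2.0, 0.0])
d = np.array([2.0, 4.0, 0.0])
```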
This structure can be used to determine similarities between pairs of words: two words will have high similarity if they occur in the same documents and one word only seldom occurs without the other. Several words can be combined into query statements using the commutative vector sum.

Term-document matrices are typically very sparse. The information can be concentrated in a low number of dimensions when we use the singular values from the decomposition, transforming each word into an n-dimensional subspace; this projection is optimal in the least-squares sense. Each word is then represented by the n most significant latent variables. This process is called latent semantic analysis (LSA) [6]. Especially for the purpose of determining semantic similarity between words, Schütze devised a variant of LSA [11]: instead of using documents as the columns of the matrix, content-bearing words are used. In our case, for example, the vector of the word koruna (crown) is defined by the fact that it frequently occurred near the words cena (price) and mena (currency). This method is convenient for semantic tasks such as clustering words with similar meanings and word-sense disambiguation [7, 8].

3 Logical operations in word space

When investigating dependencies of words in the word space we can use logical operations, primarily negation, related to orthogonality, and disjunction, related to the vector sum of subspaces.

3.1 Vector negation

We would like to model the meaning of the expression „koruna NOT klenot" (koruna – crown as a coin, klenot – crown as a jewel) in such a way that the system recognizes that we are interested in finances, not in the meaning of the word koruna in the sense of a jewel. Therefore we need to find the aspects of the meaning of the word koruna which differ from koruna as a jewel and have no relation to that word. Word meanings have no interrelationship if they have no common marks. A document is considered irrelevant for a user if its inner product with the user query equals zero, i.e. when the query vector and the document vector are orthogonal [5].

Definition 1. Two words a and b are considered irrelevant to each other if their vectors are orthogonal, i.e. a and b are irrelevant to each other if a ⋅ b = 0 [15].

Definition 2. Let V be a vector space with an inner product. Then for a vector subspace A ⊆ V we can define the orthogonal subspace A┴ [15]:

A┴ ≡ {v ∈ V : ∀a ∈ A, a ⋅ v = 0}.

If A and B are subspaces of the space V, then NOT B represents B┴ and A NOT B represents the projection of A onto B┴. When a, b belong to V, then a NOT b represents the projection of a onto ⟨b⟩┴, where ⟨b⟩ is the subspace {λb : λ ∈ R}.

These definitions can be used to carry out calculations with vectors in the vector space. We apply the standard method of projection.

Theorem 1. Let a, b be vectors of V. Then a NOT b is represented by the vector

a NOT b = a − (a ⋅ b / ||b||²) b,

where ||b|| = √(b ⋅ b) is the length of the vector b [15].

Example: When we take the inner product with b, we obtain

(a NOT b) ⋅ b = (a − (a ⋅ b / ||b||²) b) ⋅ b = a ⋅ b − (a ⋅ b)(b ⋅ b) / ||b||² = 0.

This proves that a NOT b and b are orthogonal; the vector a NOT b is therefore the part of a that is irrelevant to b (Definition 1), as we required.

When the vectors are normalized, Theorem 1 takes the following form:

a NOT b = a − (a ⋅ b) b.

For the purpose of finding expressions or documents which correspond to a NOT b, it is not necessary to determine the differences from a and from b separately for each candidate. Theorem 1 shows that finding the similarity between another vector and a NOT b is a simple inner-product calculation.

4 Quantum logic and vector space

The concept of quantum logic first appeared in the theory of quantum mechanics, where it was presented by Birkhoff and von Neumann (1936) [2]. From set theory it is known that if we have sets A and B and an element a ∈ A or a ∈ B, then the union C = A ∪ B of these sets will also contain this element. Quantum logic, however, does not describe A and B as sets but as subspaces of a vector space. The structure of quantum logic is simple; we obtain it by substituting vector spaces and their subspaces for sets and subsets [2].

Points in quantum mechanics are represented as subspaces of the vector space V. In this connection we can consider the collection L(V) of subspaces of the vector space V. The lower bound of A, B ∈ L(V) is the biggest element C ∈ L(V) with C ⊆ A and C ⊆ B, which is exactly the conjunction A ∩ B. The upper bound of A and B is the smallest subspace D ∈ L(V) with A ⊆ D and B ⊆ D. These two operations make L(V) a partially ordered set with the structure of a lattice. If we work with an inner product, we can define for each subspace A ∈ L(V) its orthogonal complement A┴. So we have three operations on the collection L(V), defined as follows [2]:

Conjunction: A AND B = A ∩ B
Disjunction: A OR B = A + B
Negation: NOT A = A┴

It is simple to prove that these three operations on L(V) suffice to realize all necessary relations (A + A┴ = V, A ∩ A┴ = 0) and to define a logic on L(V). An important piece of knowledge is also that every subspace A ∈ L(V) can be identified (using the inner product) with a special projection map PA : V → A, and through this bijection the logic of subspaces L(V) is equivalent to the logic of projection mappings on the vector space V.

Quantum logic is distinguished from Boolean logic in at least one important property: unlike Boolean logic, it is not distributive. The disjunction in set theory can be modeled as the union of sets, which in linear algebra corresponds to the vector sum of subspaces: A + B is the smallest subspace of V containing both A and B.

For determining the similarity between arbitrary objects it is necessary to define a function σ : D × D → R which assigns a real number to a pair of objects oi, oj from its domain D. Such a measure of similarity must satisfy the following requirements:

1. σ(oi, oj) ≥ 0,
2. σ(oi, oj) = σ(oj, oi), i.e. symmetry,
3. if oi = oj, then σ(oi, oj) = max σ(ok, ol) for all ok, ol ∈ D.

Definition 3. Let the terms b1, …, bn ∈ V. The term b1 OR … OR bn is represented by the subspace [15]

B = {λ1 b1 + … + λn bn : λi ∈ R}.
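Theorem 1 is directly executable; the following numpy sketch (with invented example vectors) computes a NOT b and verifies the orthogonality argued above:

```python
import numpy as np

def vector_not(a, b):
    # Theorem 1: a NOT b = a - (a.b / ||b||^2) b, i.e. the projection
    # of a onto the orthogonal complement of the line spanned by b.
    return a - (np.dot(a, b) / np.dot(b, b)) * b

# Example vectors (illustrative only).
a = np.array([3.0, 1.0])
b = np.array([1.0, 0.0])
r = vector_not(a, b)
# r is orthogonal to b, so r is irrelevant to b (Definition 1).
```

For unit-length b the division by ||b||² disappears and the function reduces to the normalized form a − (a ⋅ b) b.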
The search for a similarity relation between an individual term a and a general subspace B is more complicated than the search for a similarity relation between individual terms. From the viewpoint of quantum physics, we can use PB to measure the probability that an element is found in some state [12]. The value a ⋅ PB(a) is interpreted as a measured probability. For our purposes we define this probability by the relation

sim(a, B) = a ⋅ PB(a),

where the probability is given by the inner product of a with the projection of a onto the subspace B, from which we compute how much of the term a lies in the subspace B. This problematic similarity relation has been investigated from various viewpoints, see [1], [12].

In practice, if the set {bj} is not orthonormal, it is not correct simply to compute sim(a, bj) for every vector bj in turn. To obtain an orthonormal base {b̃j} for the subspace B it is convenient first to construct it by the commonly used Gram-Schmidt method [5]. Then

PB(a) = Σj (a ⋅ b̃j) b̃j,

and consequently we can write

sim(a, B) = Σj (a ⋅ b̃j)².

To evaluate sim(a, B) we need to compute the inner product of a with every vector b̃j. This similarity relation is therefore more expensive to calculate than the one in Theorem 1. The possibility of comparing every document with a NOT b using a single operation, the inner product, is lost for disjunction, although, as we will show later, it is preserved for negated disjunction.

5 Using the negation for searching meanings

In this part an introductory example of vector connections is presented which demonstrates the use of vector negation, and of vector conjunction, disjunction and negation together, for finding vectors which represent different meanings of ambiguous words. We briefly describe the experiment and the documents obtained. It shows that vector negation is a clear improvement over the classic Boolean method, as described in paper [15].

Our word space was constructed from 28 articles written in the year 2006 which were obtained from the Internet. The total number of acquired words was 5260. The collection of articles was focused on economy, culture, sport, health and science, and from every sphere at least two articles were processed. Documents which concern the meaning of the word koruna (crown as a coin) are marked D13, D14, D15.

Table 1. Example of the influence of the factor k on the relevance of documents for the query koruna.

  k=2          k=8          k=15
  D14  1       D13  0,773   D13  0,657
  D15  0,999   D15  0,647   D23  0,563
  D10  0,999   D14  0,643   D18  0,478
  D13  0,998   D12  0,637   D10  0,432
  D24  0,967   D10  0,597   D15  0,413
  D28  0,96    D11  0,592   D14  0,384
  D25  0,958   D22  0,561   D11  0,349
  D27  0,956   D24  0,363   D22  0,327
  D11  0,952   D16  0,336
  D26  0,946   D18  0,328
  D22  0,935   D23  0,324
  D07  0,928
  D12  0,912
  D08  0,908
  D04  0,905
  D16  0,902
  D03  0,885
  D21  0,879
  D06  0,878
  D05  0,85
  D20  0,85
  D17  0,796
  D18  0,751
  D23  0,716
  D09  0,603
  D19  0,585
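The projection-based similarity sim(a, B) described above can be sketched as follows; the Gram-Schmidt step and the sum of squared inner products follow the formulas in the text, while the example vectors are invented:

```python
import numpy as np

def orthonormal_basis(vectors):
    # Gram-Schmidt: build an orthonormal base for the subspace B
    # spanned by the given (not necessarily orthonormal) vectors.
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)
        n = np.linalg.norm(w)
        if n > 1e-12:          # skip linearly dependent vectors
            basis.append(w / n)
    return basis

def sim_subspace(a, vectors):
    # sim(a, B) = a . P_B(a) = sum_j (a . b~_j)^2
    return sum(np.dot(a, u) ** 2 for u in orthonormal_basis(vectors))
```

For a unit vector a lying entirely inside B the value is 1, and for a vector orthogonal to B it is 0, matching the probability interpretation of a ⋅ PB(a).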
On the other hand, documents related to koruna (crown as a jewel) are marked D10, D11, D…

Over the data source a parser was created which separated the individual words from the text. Using a morphological analyzer [4], a list of terms was then constructed. We assume that meanings of ambiguous words occurred in the articles; for example, the word koruna (crown as a coin) is used more frequently in an economic context than in a context connected with jewels. To test the effectiveness of our negation operator we will try to find the less common meanings of chosen words, which are overshadowed by the prevailing sense.

LSI keeps the k greatest singular values. The choice of k has to be small enough to obtain faster access to the documents, but great enough to capture the structure of the corpus adequately. The present experiments calculate the term-document similarity for different values of k in the range ⟨2, 15⟩; here we illustrate the results for the factor values k=2, k=8 and k=15.

The data in Tables 1 and 2 show that vector negation is very effective for selecting the relevant documents: those which correspond to the required word koruna (crown) remain, while the word klenot (jewel) is left out.

Table 2. Example of the influence of the operation NOT and of the factor k on the relevance of documents for the query koruna NOT klenot.

  k=2          k=8          k=15
  D14  1       D13  0,806   D13  0,621
  D13  0,999   D15  0,677   D23  0,57
  D15  0,999   D14  0,671   D18  0,55
  D10  0,998   D22  0,601   D15  0,439
  D24  0,96    D10  0,582   D10  0,41
  D25  0,951   D18  0,386   D14  0,4
  D27  0,95    D24  0,374
  D28  0,95    D23  0,374
  D26  0,94    D16  0,353
  D07  0,927   D20  0,328
  D22  0,924
  D08  0,905
  D04  0,905
  D16  0,883
  D06  0,881
  D03  0,88
  D21  0,867
  D05  0,844
  D20  0,823
  D17  0,768
  D18  0,732
  D23  0,696
  D09  0,605
  D19  0,542

Vector negation and disjunction can be combined when formulating a query over areas of documents: we do not negate only one argument, but several. If the user states that he wants documents related to a but not to b1, b2, …, bn, this is interpreted (without further indication) as wanting only documents which are not related to any of the unwanted terms bi. In this way the expression

a AND (NOT b1) AND (NOT b2) … AND (NOT bn)

passes into the form

a NOT (b1 OR b2 … OR bn).

Using Definition 3 we form the disjunction b1 OR b2 … OR bn as the vector subspace B = {λ1b1 + … + λnbn : λi ∈ R}. This term can be transformed into a definite vector which is orthogonal to all the irrelevant arguments {bj}. This vector is a − PB(a), where PB is the projection onto the subspace B, as in Theorem 1. This implies that computing the similarity of any vector with the term a NOT (b1 OR b2 … OR bn) is the same simple inner product, with the same computational effectiveness as in Theorem 1. This technique provides a systematic reduction of irrelevant terms.
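The negated disjunction a NOT (b1 OR … OR bn) = a − PB(a) can be sketched as follows; the Gram-Schmidt orthonormalization of the negated terms is an assumption of this sketch (any orthonormal base of B would do), and the example vectors are invented:

```python
import numpy as np

def not_disjunction(a, negated):
    # a NOT (b1 OR ... OR bn) = a - P_B(a), where B is the subspace
    # spanned by the negated terms (Definition 3 plus Theorem 1).
    basis = []
    for b in negated:
        w = b - sum(np.dot(b, u) * u for u in basis)  # Gram-Schmidt step
        n = np.linalg.norm(w)
        if n > 1e-12:
            basis.append(w / n)
    result = a.copy()
    for u in basis:
        # Subtracting the component along each orthonormal base vector
        # accumulates exactly P_B(a).
        result = result - np.dot(result, u) * u
    return result
```

The returned vector is orthogonal to every negated term, so ranking documents against it remains a single inner product per document.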
We carried out a decomposition of the matrix A for different values of k. The most relevant documents were obtained for the factor value k=15. On the contrary, for the very small k=2 a great volume of documents is obtained, which reduces their relevance with regard to the required document.

6 Conclusion and future work

Negation in the vector space is a suitable tool for reducing the set of retrieved documents. The specification of the searched documents can be refined by using several disjunction operations in the query. Our word space consisted of 5260 words; more relevant results can be obtained by applying this method to a larger document database.

The present experiments compute the similarity in the term-document relation. In the construction of the vector space model we represented the individual occurrences in documents by a Boolean function, i.e. we expressed the presence or absence of the given term in the document.

Future experiments will be carried out over the lexical database of words and word connections in WordNet, where the lexical and semantic relations between the individual words and concepts are recorded. In the next experiments we will target our effort at the calculation of similarity in the term-term relation, with various forms of representing the weight of an individual term in the document.
References

[1] Bělohlávek, R., Snášel, V.: Podobnost a její modelování (Similarity and its modeling). Znalosti 2004, pp. 309–316.
[2] Birkhoff, G., von Neumann, J.: The logic of quantum mechanics. Annals of Mathematics, 37, 823–843, 1936.
[3] Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, 2004, 307 pp.
[4] Horal, S., Kalinová, M., Kostolanský, E.: Morfologická analýza slovenčiny – analýza flexných tvarov (Morphological analysis of Slovak – analysis of inflected forms). Conference Proceedings Informatika 2005, Bratislava, 2005.
[5] Jänich, K.: Linear Algebra. Undergraduate Texts in Mathematics, Springer-Verlag, 1994.
[6] Landauer, T., Dumais, S.: A solution to Plato's problem: The latent semantic analysis theory of acquisition. Psychological Review, 104(2), 211–240, 1997.
[7] Moravec, P., Pokorný, J., Snášel, V.: Using BFA with WordNet Ontology Based Model for Web Retrieval. In: Chbeir, R., Dipanda, A., Yétongnon, K. (eds.): SITIS 2005, Yaoundé, 27.11.–1.12.2005. University of Bourgogne, Dijon, 2005, pp. 254–259.
[8] Moravec, P., Pokorný, J., Snášel, V.: WordNet Ontology Based Model for Web Retrieval. IEEE WIRI 2005, Tokyo, Japan, pp. 220–225.
[9] Pokorný, J., Snášel, V., Kopecký, M.: Dokumentografické informační systémy (Documentographic information systems). Karolinum, MFF UK Praha, 2005, 184 pp.
[10] Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY, 1983.
[11] Schütze, H.: Automatic word sense discrimination. Computational Linguistics, 24(1), 97–124, 1998.
[12] Tversky, A.: Features of similarity. Psychological Review, 84, 327–352, 1977.
[13] Van Rijsbergen, K.: The Geometry of Information Retrieval. Cambridge University Press, 2004, 150 pp.
[14] Widdows, D.: Geometry and Meaning. CSLI Lecture Notes, No. 172, Stanford, California, 2004, 320 pp.
[15] Widdows, D., Peters, S.: Word Vectors and Quantum Logic: Experiments with negation and disjunction. In: Proceedings of Mathematics of Language 8, Indiana, June 2003, pp. 141–154.
[16] Zee, A.: Quantum Field Theory. Princeton University Press, 2003.