=Paper=
{{Paper
|id=Vol-256/paper-14
|storemode=property
|title=Geometrical approach for modeling semantics in linguistics
|pdfUrl=https://ceur-ws.org/Vol-256/submission_5.pdf
|volume=Vol-256
|dblpUrl=https://dblp.org/rec/conf/syrcodis/GudabaHIKS07
}}
==Geometrical approach for modeling semantics in linguistics==
© Milan Gudába¹, Stanislav Horal¹, Ladislav Izakovič¹, Michaela Kalinová¹ and Václav Snášel²

¹ Department of Informatics, FPV, University of Saint Cyril and Methodius, Nám. J. Herdu 2, 917 01 Trnava, Slovakia
² Department of Informatics, FEI, VŠB – Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic

gudaba@gmail.com, stanislav.horal@ucm.sk, izakovil@ucm.sk, michaela.kalinova@ucm.sk, vaclav.snasel@vsb.cz
Abstract

Information is at present often stored and made available in electronic form. With the still increasing quantity of accessible information, most frequently textual, the need to organize these data grows, and the problem of fast and effective information retrieval arises very often. In this contribution we describe a method for creating a word vector space and for using the NOT operation for more effective retrieval of relevant documents.

1 Introduction

The high volume of text documents, and the rate at which these documents grow, require new approaches in linguistics and in areas related to information retrieval (IR). These new approaches are based on principles derived from the natural sciences.

The noticeable expansion of geometry was motivated by Descartes' ideas and by the establishment of the coordinate system, which permits the interconnection of geometry and algebra [12, 13].

In IR, geometrical methods were embodied in the form of the vector model. Another meaningful step in the geometrical understanding of the world was the interconnection of geometry and logic. This connection was made on the ground of quantum physics [9, 12].

A huge amount of multimedia data accompanies the present expansion of information technologies; among these data we can include especially text, image and acoustic documents. We will consider the set of text documents as our input area. Almost all well-known information retrieval systems contain a morphological component. With its help we can remove non-semantic words from documents using a stop-list, and convert semantically significant words to their basic form. In this way we specify the terms which, after weighting, make up a vector in the space of concepts. This vector is then used to identify a document from the point of view of its content [16].

The application of a method based on a combination of the vector model and Boolean logic appears to be a suitable way of creating a model of natural language. In this contribution we present constructions in the vector space based on standard linear algebra, together with some examples of using vector negation to separate the meanings of ambiguous words. In quantum logic, arbitrary sets are replaced by linear subspaces of a vector space, and union, intersection and complement are replaced by the vector sum, intersection and orthogonal complement of these subspaces.

A useful tool for information retrieval and processing is latent semantic analysis (LSA). This method, based on singular value decomposition (SVD), can be used to improve access to the desired documents; we keep the factor of the k greatest singular values. LSA has a geometrical representation in which objects (e.g. documents and terms) are distributed in a low-dimensional space. As an example we can use the term-document matrix, in which rows represent terms and columns represent documents; nonzero values in the matrix signify that the corresponding documents contain the respective terms. This vector space model was described by Salton [10].
In the first part of our contribution we describe the representation of word meanings in the vector space. The second part focuses on operations in the word space. The next chapter is devoted to basic notions from Boolean logic. In the last part we apply this theoretical knowledge in the process of searching for word meanings.

2 Word meaning representation in vector space
A vector space can be understood as a set of points in which each point of the space is defined by a list of coordinates [5]. Two points can be added by adding their coordinates, and each point can be multiplied by a scalar (in this paper scalars are real numbers, therefore all our vector spaces are "real" vector spaces). The first linguistic examples of vector spaces were developed for information retrieval [10]. By counting the occurrences of each word in each document we get a term-document matrix: entry i,j of the matrix indicates how many times the word w_i occurs in the document D_j. The rows of the matrix can then be understood as word vectors. The dimension of this vector space (the number of coordinates given to each word) is therefore equal to the number of documents in the collection. Document vectors are generated by calculating a (weighted) sum of the word vectors of the words occurring in the given document.
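A minimal sketch of this construction (the toy corpus and all names here are our own illustration, not the data set used in the experiment described later):

```python
import numpy as np

# Toy corpus: three "documents", each already stop-listed and lemmatized.
docs = [["koruna", "cena", "mena"],
        ["koruna", "klenot"],
        ["cena", "mena", "mena"]]

vocab = sorted({t for d in docs for t in d})
row = {t: i for i, t in enumerate(vocab)}

# Term-document matrix A: entry (i, j) counts word w_i in document D_j.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for t in d:
        A[row[t], j] += 1

# Rows of A are word vectors; a document vector is the (here unweighted)
# sum of the word vectors of the words the document contains.
d0 = sum(A[row[t]] for t in docs[0])
```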
Similar techniques are used in information retrieval to determine the similarity relation between words and documents. Similarity can be determined by calculating the cosine of the angle between two vectors [10]:

sim(w, d) = \frac{\sum_i w_i d_i}{\sqrt{\sum_i w_i^2}\,\sqrt{\sum_i d_i^2}} = \frac{w \cdot d}{\|w\|\,\|d\|},

where w_i, d_i are the coordinates of the vectors w and d, w · d is the inner product of w and d, and ‖w‖ is the length of the vector w [5].
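In code this measure is a one-line function (a sketch; the name cosine_sim is ours):

```python
import numpy as np

def cosine_sim(w: np.ndarray, d: np.ndarray) -> float:
    """Cosine of the angle between the vectors w and d."""
    return float(w @ d / (np.linalg.norm(w) * np.linalg.norm(d)))
```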
The calculation is further simplified by normalizing all vectors to the same length; the cosine similarity is then equal to the Euclidean inner product. This is a standard method which avoids giving too much weight to frequent words or large documents. Normalized vectors were used in all models and experiments described in this contribution.

This structure can be used to determine similarities between pairs of words: two words will have high similarity if they occur in the same documents and one is only seldom found without the other. Words can be combined into compound query statements using the commutative vector sum.

The term-document matrices are typically very sparse. The information can be concentrated in a low number of dimensions when we use the singular values from the decomposition, transforming each word into an n-dimensional subspace; this is guaranteed by the method of least squares. Each word is then represented by the n most significant latent variables. This process is called latent semantic analysis (LSA) [6]. Especially for the purpose of determining semantic similarity between words, Schütze developed a variant of LSA [11] in which content-bearing words are used as the columns of the matrix instead of documents. Consequently, in our case the vector of the word koruna (crown) is defined by the fact that it frequently occurs near the words cena (price) and mena (currency). This method is convenient for semantic tasks such as clustering words according to similar meaning and word sense disambiguation [7, 8].
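A minimal sketch of this reduction (our own illustration; A is a term-document count matrix as above, and k is the number of retained singular values):

```python
import numpy as np

def lsa_word_vectors(A: np.ndarray, k: int) -> np.ndarray:
    """Represent each word (row of A) by its k most significant
    latent variables, obtained from a truncated SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    W = U[:, :k] * s[:k]                            # rank-k least-squares projection
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.where(norms == 0.0, 1.0, norms)   # unit rows: cosine = dot product
```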
3 Logical operations in word space

When investigating dependencies between words in the word space we can use logical operations, primarily negation, realized through orthogonality, and disjunction, realized through the vector sum of subspaces.

3.1 Vector negation

We would like to model the meaning of the expression "koruna NOT klenot" (koruna: crown as a coin, klenot: crown as a jewel) in such a way that the system recognizes that we are interested in finances, not in the meaning of the word koruna in the sense of the jewel. Therefore we need to find the aspects of the meaning of the word koruna which differ from koruna as a jewel and have no relation to that word. Word meanings have no interrelationship if they have no common marks. A document is considered irrelevant for the user if its inner product with the user query is equal to zero, i.e. when the query vector and the document vector are orthogonal [5].

Definition 1. Two words a and b are considered irrelevant to each other if their vectors are orthogonal, i.e. a and b are irrelevant to each other if a · b = 0 [15].

Definition 2. Let V be a vector space with an inner product. Then for a vector subspace A ⊆ V we can define the orthogonal subspace A^⊥ [15]:

A^{\perp} \equiv \{ v \in V : \forall a \in A,\ a \cdot v = 0 \}.

If A and B are subspaces of the space V, then NOT B is represented by B^⊥, and A NOT B is represented by the projection of A onto B^⊥. When a, b belong to V, then a NOT b is represented by the projection of a onto ⟨b⟩^⊥, where ⟨b⟩ is the subspace {λb : λ ∈ R}.

These definitions can be used to carry out calculations with vectors in the vector space. We apply the standard method of projection.

Theorem 1. Let a, b be vectors of V. Then a NOT b is represented by the vector

a NOT b = a - \frac{a \cdot b}{\|b\|^2}\, b,

where ‖b‖² = b · b is the squared length of the vector b [15].

Example: Taking the inner product with b, we obtain

(a NOT b) \cdot b = \left(a - \frac{a \cdot b}{\|b\|^2}\, b\right) \cdot b = a \cdot b - \frac{(a \cdot b)(b \cdot b)}{\|b\|^2} = 0.

This proves that a NOT b and b are orthogonal; the vector a NOT b is therefore exactly the part of a that is irrelevant to b (Definition 1), as we required.

When the vectors are normalized, Theorem 1 takes the following form:

a NOT b = a - (a \cdot b)\, b.

For the purpose of finding expressions or documents which correspond to a NOT b, it is not necessary to determine the differences separately for each candidate against a as well as b. Theorem 1 shows that finding the similarity between another vector and a NOT b is a simple inner product calculation.
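Theorem 1 translates directly into code (a sketch; for normalized vectors the coefficient reduces to a · b):

```python
import numpy as np

def vector_negation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """a NOT b: the component of a orthogonal to b (Theorem 1)."""
    return a - (a @ b) / (b @ b) * b

a = np.array([0.7, 0.5, 0.1])
b = np.array([0.2, 0.9, 0.4])
# (a NOT b) . b == 0 up to floating-point error (Definition 1).
assert abs(vector_negation(a, b) @ b) < 1e-12
```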
4 Quantum logic and vector space

The concept of quantum logic was first met in the theory of quantum mechanics, where it was presented by Birkhoff and von Neumann (1936) [2]. From set theory it is known that if we have sets A and B and an element a ∈ A or a ∈ B, then the union of these sets, C = A ∪ B, will also contain this element. Quantum logic, however, does not describe A and B as sets but as subspaces of a vector space.

The structure of quantum logic is simple, and we can obtain it by substituting sets and subsets with vector spaces and subspaces [2]. Points in quantum mechanics are represented as subspaces of the vector space V. In this connection we can consider the collection L(V) of subspaces of the vector space V. The lower bound of A, B ∈ L(V) is the biggest element C ∈ L(V) with C ⊆ A and C ⊆ B, which is exactly the conjunction A ∩ B. The upper bound of A and B is the smallest subspace D ∈ L(V) with A ⊆ D and B ⊆ D. These two operations make L(V) a partially ordered set with the structure of a lattice. If we work in a space with a scalar product, we can define for each subspace A ∈ L(V) its (unique) orthogonal complement A^⊥. So we have three operations on the collection L(V), defined as follows [2]:

Conjunction: A AND B = A ∩ B
Disjunction: A OR B = A + B
Negation: NOT A = A^⊥

It is simple to prove that these three operations on L(V) suffice to realize the necessary relations (A + A^⊥ = V, A ∩ A^⊥ = 0) and to define a logic on L(V). An important piece of knowledge is also that every subspace A ∈ L(V) can be identified (using the scalar product) with a special projection map P_A : V → A, and through this bijection the logic of the subspaces L(V) is equivalent to the logic of the projection mappings in the vector space V.

Quantum logic is distinguished from Boolean logic in at least two properties: quantum logic is neither distributive nor commutative.

Disjunction in set theory can be modeled as the union of sets, which in linear algebra corresponds to the vector sum of subspaces, where A + B is the smallest subspace of V containing both A and B.

To determine the similarity between arbitrary objects it is necessary to define a function σ : D × D → R which assigns a real number to a pair of objects o_i, o_j from its domain D. Such a function is a measure of the similarity relation between objects and must satisfy the following requirements:

1. σ(o_i, o_j) ≥ 0
2. σ(o_i, o_j) = σ(o_j, o_i), i.e. symmetry is preserved
3. when o_i = o_j, then σ(o_j, o_i) = max σ(o_k, o_l) for all o_k, o_l ∈ D

Definition 3. Let the terms b_1, ..., b_n ∈ V. The term b_1 OR ... OR b_n is represented by the subspace [15]

B = \{ \lambda_1 b_1 + \dots + \lambda_n b_n : \lambda_i \in R \}.

The search for the similarity relation between an individual term a and a general subspace B is more complicated than the search for the similarity relation between individual terms.

From the viewpoint of quantum physics we can use P_B to measure the probability that an element is found in a given state [12]. The value a · P_B(a) is interpreted as a measured probability. For our purposes we define the probability by the following relation:

sim(a, B) = a \cdot P_B(a),

where the probability is given by the scalar product of a with the projection of a onto the subspace B, from which we calculate the extent to which the term a lies in the subspace B. This problematic similarity relation has been examined from various viewpoints, see [1], [12].
In practice, unless the set {b_j} is orthonormal, it is not correct simply to calculate sim(a, b_j) for every vector b_j in turn. To obtain an orthonormal basis {b̃_j} of the subspace B it is convenient first to construct one with the Gram-Schmidt method commonly used in practice [5]. Then

P_B(a) = \sum_j (a \cdot \tilde{b}_j)\, \tilde{b}_j.

Consequently, we can write

sim(a, B) = \sum_j (a \cdot \tilde{b}_j)^2.

To evaluate sim(a, B) we need to calculate the scalar product of a with every vector b̃_j, so this similarity relation is more difficult to compute than the one in Theorem 1. The possibility of comparing every document with a NOT b using only one operation, the scalar product, is lost for disjunction; as we will show later, however, it is preserved for negated disjunction.
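The computation of sim(a, B) can be sketched as follows (our own minimal implementation; the rows of the matrix B span the subspace):

```python
import numpy as np

def orthonormal_basis(B: np.ndarray) -> np.ndarray:
    """Gram-Schmidt orthonormalization of the rows of B."""
    basis = []
    for b in B.astype(float):
        for q in basis:
            b = b - (b @ q) * q          # remove components along earlier vectors
        n = np.linalg.norm(b)
        if n > 1e-12:                    # drop linearly dependent rows
            basis.append(b / n)
    return np.array(basis)

def sim_subspace(a: np.ndarray, B: np.ndarray) -> float:
    """sim(a, B) = a . P_B(a) = sum_j (a . b~_j)^2."""
    Q = orthonormal_basis(B)
    return float(np.sum((Q @ a) ** 2))
```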
5 Using negation to search for meanings

In this part we present an introductory example of vector connections which demonstrates the use of vector negation, and of vector disjunction and negation together, for finding vectors which represent different meanings of ambiguous words. We briefly describe the setup of the experiment we performed. It shows that vector negation makes a substantial contribution compared with the classic Boolean method, which was described in paper [15].

Our word space was constructed from 28 articles written in the year 2006, obtained from the Internet. The total number of acquired words was 5260. The collection of articles was focused on economy, culture, sport, health and science, and from every sphere at least two articles were processed. Documents which concern the meanings of the word koruna (crown as coin) are marked as D13, D14, D15. On the other hand, documents related to koruna (crown as jewel) are marked D10, D11, D…

Over the data source a parser was created which separated the individual words from the text. Using a morphological analyzer [4], a list of terms was subsequently constructed. We assume that meanings of ambiguous words occur in the articles; for example, the word koruna (crown as coin) is used more frequently in an economic context than in a context connected with jewels. To test the effectiveness of our negation operator we try to find the less common meanings of chosen words, which are related to the prevailing expression.

Table 1. Example of the influence of the factor k on the relevance of documents.

koruna
k=2            k=8            k=15
D14  1         D13  0.773     D13  0.657
D15  0.999     D15  0.647     D23  0.563
D10  0.999     D14  0.643     D18  0.478
D13  0.998     D12  0.637     D10  0.432
D24  0.967     D10  0.597     D15  0.413
D28  0.96      D11  0.592     D14  0.384
D25  0.958     D22  0.561     D11  0.349
D27  0.956     D24  0.363     D22  0.327
D11  0.952     D16  0.336
D26  0.946     D18  0.328
D22  0.935     D23  0.324
D07  0.928
D12  0.912
D08  0.908
D04  0.905
D16  0.902
D03  0.885
D21  0.879
D06  0.878
D05  0.85
D20  0.85
D17  0.796
D18  0.751
D23  0.716
D09  0.603
D19  0.585

The present experiments represent a calculation of similarity in the term-document relationship for different values of k in the range ⟨2, 15⟩. Here we illustrate the results for the factor values k=2, k=8 and k=15.

The data in Table 1 show that vector negation is very effective for the selection of relevant documents which correspond to the required word koruna (crown) and leave out the word klenot (jewel).
LSI keeps the k greatest singular values. The choice of k has to be small enough to obtain faster access to the documents, but large enough to adequately capture the structure of the corpus.

Table 2. Example of the influence of the operation NOT and of the factor k on the relevance of documents.

koruna NOT klenot
k=2            k=8            k=15
D14  1         D13  0.806     D13  0.621
D13  0.999     D15  0.677     D23  0.57
D15  0.999     D14  0.671     D18  0.55
D10  0.998     D22  0.601     D15  0.439
D24  0.96      D10  0.582     D10  0.41
D25  0.951     D18  0.386     D14  0.4
D27  0.95      D24  0.374
D28  0.95      D23  0.374
D26  0.94      D16  0.353
D07  0.927     D20  0.328
D22  0.924
D08  0.905
D04  0.905
D16  0.883
D06  0.881
D03  0.88
D21  0.867
D05  0.844
D20  0.823
D17  0.768
D18  0.732
D23  0.696
D09  0.605
D19  0.542

We performed a decomposition of the matrix A for different values of k. The most relevant documents were obtained for the factor value k=15. On the contrary, for the very small k=2 a great volume of documents is obtained, which reduces their relevance with regard to the required document.

Vector negation and disjunction can be combined when a search query selects from several areas of documents: we negate not just one argument but several. If the user states that he wants documents related to a but not to b_1, b_2, ..., b_n, it will be interpreted (without further indication) that he wants only documents which are not related to the unwanted terms b_i. In this way the expression

a AND (NOT b_1) AND (NOT b_2) ... AND (NOT b_n)

passes into the form

a NOT (b_1 OR b_2 ... OR b_n).

Using Definition 3 we form the disjunction b_1 OR b_2 ... OR b_n as the vector subspace B = {λ_1 b_1 + ... + λ_n b_n : λ_i ∈ R}. This term can be transformed into a single vector which is orthogonal to all the irrelevant arguments {b_j}. This vector is a - P_B(a), where P_B is the projection onto the subspace B, as in Theorem 1. This implies that calculating the similarity between any vector and the term a NOT (b_1 OR b_2 ... OR b_n) is the same simple scalar product, with the same computational effectiveness as Theorem 1. This technique serves for the systematic removal of irrelevant terms.
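A sketch of this combined operator (our own illustration, assuming the b_j are linearly independent):

```python
import numpy as np

def negated_disjunction(a: np.ndarray, B: np.ndarray) -> np.ndarray:
    """a NOT (b1 OR ... OR bn) = a - P_B(a); the rows of B are the b_j."""
    Q, _ = np.linalg.qr(B.T)             # orthonormal basis of the subspace B
    return a - Q @ (Q.T @ a)             # subtract the projection P_B(a)

a = np.array([0.7, 0.5, 0.1, 0.2])
B = np.array([[0.2, 0.9, 0.4, 0.0],
              [0.0, 0.1, 0.8, 0.5]])
r = negated_disjunction(a, B)
assert np.allclose(B @ r, 0.0)           # r is irrelevant to every b_j
```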
6 Conclusion and future work

Negation in the vector space is a suitable tool for narrowing down the set of retrieved documents. The specification of the searched documents can be refined by using several disjunction operations in the query. Our word space consisted of 5260 words; more relevant results could be obtained by applying this method to a larger document database.

The present experiments represent a calculation of similarity in the term-document relation. In the construction of the vector space model we represented the individual occurrences in documents through a Boolean function, i.e. we expressed the presence or absence of the given term in the document.

These experiments will in the future be carried out over the lexical database of words and word connections in WordNet, where the relevant lexical and semantic relations between the individual words and concepts are recorded.

In the next experiments we will target our effort at the calculation of similarity in the term-term relation, using various forms of representing the weight of an individual term in the document.
References
[1] Bělohlávek R., Snášel V. Podobnost a její modelování. Znalosti 2004, pp. 309-316.
[2] Birkhoff G., von Neumann J. The logic of quantum mechanics. Annals of Mathematics, 37, 823-843, 1936.
[3] Gärdenfors P. Conceptual Spaces: The Geometry of Thought. MIT Press, 2004, pp. 307.
[4] Horal S., Kalinová M., Kostolanský E. Morfologická analýza slovenčiny - Analýza flexných tvarov. Conference Proceedings Informatika 2005, Bratislava, 2005.
[5] Jänich K. Linear Algebra. Undergraduate Texts in Mathematics. Springer-Verlag, 1994.
[6] Landauer T., Dumais S. A solution to Plato's problem: The latent semantic analysis theory of acquisition. Psychological Review, 104(2), 211-240, 1997.
[7] Moravec P., Pokorný J., Snášel V. Using BFA with WordNet Ontology Based Model for Web Retrieval. In: Chbeir R., Dipanda A., Yétongnon K. (eds.) SITIS 2005, Yaoundé, 27.11.2005-1.12.2005. Dijon: University of Bourgogne, 2005, pp. 254-259.
[8] Moravec P., Pokorný J., Snášel V. WordNet Ontology Based Model for Web Retrieval. IEEE WIRI 2005, Tokyo, Japan, pp. 220-225.
[9] Pokorný J., Snášel V., Kopecký M. Dokumentografické informační systémy. Karolinum, Skriptum MFF UK, Praha, 2005, pp. 184.
[10] Salton G., McGill M. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY, 1983.
[11] Schütze H. Automatic word sense discrimination. Computational Linguistics, 24(1), 97-124, 1998.
[12] Tversky A. Features of similarity. Psychological Review, 84, 327-352, 1977.
[13] Van Rijsbergen K. The Geometry of Information Retrieval. Cambridge University Press, 2004, pp. 150.
[14] Widdows D. Geometry and Meaning. CSLI Lecture Notes, No. 172, Stanford, California, 2004, pp. 320.
[15] Widdows D., Peters S. Word Vectors and Quantum Logic: Experiments with negation and disjunction. In: Proceedings of Mathematics of Language 8, Indiana, June 2003, pp. 141-154.
[16] Zee A. Quantum Field Theory. Princeton University Press, 2003.