Geometrical approach for modeling semantics in linguistics

Milan Gudába¹, Stanislav Horal¹, Ladislav Izakovič¹, Michaela Kalinová¹ and Václav Snášel²

¹ Department of Informatics, FPV, University of Saint Cyril and Methodius, Nám. J. Herdu 2, 917 01 Trnava, Slovakia
² Department of Informatics, FEI, VŠB – Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic

gudaba@gmail.com, stanislav.horal@ucm.sk, izakovil@ucm.sk, michaela.kalinova@ucm.sk, vaclav.snasel@vsb.cz

Abstract

Information is nowadays commonly stored and made available in electronic form. With the steadily increasing quantity of accessible information, most of it textual, the need to organize these data grows, and the problem of fast and effective information retrieval arises very often. In this contribution we describe a method for building a word vector space and for using the NOT operation to retrieve relevant documents more effectively.

1 Introduction

The high volume of text documents and the rate at which these documents grow call for new approaches in linguistics and in areas related to information retrieval (IR). These new approaches are based on principles derived from the natural sciences.

A noticeable expansion of geometry was motivated by Descartes' ideas and by the introduction of the coordinate system, which permitted the interconnection of geometry and algebra [12, 13]. In IR, geometrical methods were developed in the form of the vector model. Another meaningful step in the geometrical understanding of the world was the interconnection of geometry and logic. This connection was made on the grounds of quantum physics [9, 12].

A huge amount of multimedia data accompanies the present expansion of information technologies. These data include, in particular, text, image and acoustic documents. We consider the set of text documents as our input area. Almost all well-known information retrieval systems contain a morphological component. With its help we can remove non-semantic words from documents using a stop-list and convert the semantically significant words to their basic form. In this way we specify terms which, after evaluation, form a vector in the space of concepts. This vector is then used to identify documents by their content [16].

A method based on the combination of the vector model and Boolean logic appears to be a suitable way of creating a model of natural language. In quantum logic, arbitrary sets are replaced by linear subspaces of a vector space, and union, intersection and complement are replaced by the vector sum, the intersection and the orthogonal complement of these subspaces.

A useful tool for information retrieval and processing is latent semantic analysis (LSA). This method, based on singular value decomposition (SVD), can be used to improve access to the desired documents; we keep the factor of the k greatest singular values. LSA has a geometrical representation in which objects (e.g. documents and terms) are placed in a low-dimensional space. As an example we can use the term-document matrix, in which rows represent terms and columns represent documents; a nonzero value in the matrix signifies that the corresponding document contains the given term. This vector space model was described by Salton [10].

In this contribution we present constructions in the vector space based on standard linear algebra, together with examples of using vector negation to separate the meanings of ambiguous words. The first part describes the representation of word meanings in the vector space. The second part focuses on operations in the word space. The next chapter is devoted to basic knowledge of Boolean logic. In the last part we apply this theoretical knowledge to the process of searching for word meanings.
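The morphological preprocessing described in the introduction (removing non-semantic words with a stop-list, keeping the significant terms) can be sketched in a few lines; the stop-list and the sentence here are invented purely for illustration:

```python
# Hypothetical stop-list and input text, for illustration only.
stop_list = {"a", "the", "and", "of"}
text = "the price of the koruna and the currency"

# Keep only the semantically significant words.
terms = [w for w in text.lower().split() if w not in stop_list]
# The remaining terms form the basis for the concept-space
# vector of the document.
```

In a real system the surviving terms would additionally be reduced to their basic (lemmatized) form by a morphological analyzer before the term vector is built.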
2 Word meaning representation in vector space

A vector space can be understood as a set of points in which each point is defined by a list of coordinates [5]. Two points can be added by adding their coordinates, and each point can be multiplied by a scalar (in this paper scalars are real numbers, so all our vector spaces are "real" vector spaces). The first linguistic examples of vector spaces were developed for information retrieval [10]. By counting the occurrences of each word in each document we obtain the term-document matrix. The entry (i, j) of the matrix indicates how many times the word wi occurred in the document Dj. The rows of the matrix can then be understood as word vectors. The dimension of this vector space (the number of coordinates given to each word) is therefore equal to the number of documents in the collection.
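The construction just described can be sketched with numpy; the toy corpus and vocabulary below are invented for illustration, and the rank-k SVD truncation anticipates the latent semantic analysis used later in the paper:

```python
import numpy as np

# Toy corpus: three "documents" (hypothetical, for illustration only).
docs = ["koruna cena mena cena", "koruna klenot zlato", "cena mena trh"]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document matrix: entry (i, j) counts word i in document j.
A = np.array([[d.split().count(w) for d in docs] for w in vocab],
             dtype=float)

# Rows are word vectors; their dimension equals the number of documents.
assert A.shape == (len(vocab), len(docs))

# Keeping only the k greatest singular values concentrates the
# information of the (typically sparse) matrix in few dimensions (LSA).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]   # each word as k latent coordinates
```

The choice of the corpus and of k here is arbitrary; the paper's experiments use 28 documents and k in the range 2 to 15.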
Document vectors are generated by calculating the (weighted) sum of the word vectors of the words occurring in the given document.

Similar techniques are used in information retrieval to determine the similarity relation between words and documents. The similarity can be determined by calculating the cosine of the angle between two vectors [10]:

sim(w, d) = Σi wi di / (√(Σi wi²) · √(Σi di²)) = (w ⋅ d) / (||w|| ||d||),

where wi, di are the coordinates of the vectors w and d, w ⋅ d is the inner product of w and d, and ||w|| is the length of the vector w [5].

The calculation is further simplified by normalizing all vectors to the same length; the cosine similarity then equals the Euclidean inner product. This is a standard method which avoids giving too much weight to frequent words or large documents. Normalized vectors were used in all models and experiments described in this contribution.
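The cosine similarity and the effect of normalization can be checked directly; this is a minimal numpy sketch with invented example vectors:

```python
import numpy as np

def cosine_sim(w, d):
    # sim(w, d) = (w . d) / (||w|| ||d||)
    return float(np.dot(w, d) / (np.linalg.norm(w) * np.linalg.norm(d)))

def normalize(v):
    # Scale v to unit length, so cosine reduces to a plain inner product.
    return v / np.linalg.norm(v)

# Parallel example vectors: their cosine similarity is 1.
w = np.array([1.0, 2.0, 0.0])
d = np.array([2.0, 4.0, 0.0])
```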
This structure can be used to determine similarities between pairs of words: two words will have high similarity if they occur in the same documents and one word only seldom occurs without the other. Several words can be combined into query statements using the commutative vector sum.

Term-document matrices are typically very sparse. The information can be concentrated in a low number of dimensions when we use the singular values from the decomposition, transforming each word into an n-dimensional subspace; this projection is optimal in the least-squares sense. Each word is then represented by the n most significant latent variables. This process is called latent semantic analysis (LSA) [6]. Especially for the purpose of determining semantic similarity between words, Schütze devised a variant of LSA [11]: instead of using documents as the columns of the matrix, content-bearing words are used. In our case, for example, the vector of the word koruna (crown) is defined by the fact that it frequently occurred near the words cena (price) and mena (currency). This method is convenient for semantic tasks such as clustering words with similar meanings and word-sense disambiguation [7, 8].

3 Logical operations in word space

When investigating dependencies of words in the word space we can use logical operations, primarily negation, related to orthogonality, and disjunction, related to the vector sum of subspaces.

3.1 Vector negation

We would like to model the meaning of the expression „koruna NOT klenot" (koruna – crown as a coin, klenot – crown as a jewel) in such a way that the system recognizes that we are interested in finances, not in the meaning of the word koruna in the sense of a jewel. Therefore we need to find the aspects of the meaning of the word koruna which differ from koruna as a jewel and have no relation to that word. Word meanings have no interrelationship if they have no common marks. A document is considered irrelevant for a user if its inner product with the user query equals zero, i.e. when the query vector and the document vector are orthogonal [5].

Definition 1. Two words a and b are considered irrelevant to each other if their vectors are orthogonal, i.e. a and b are irrelevant to each other if a ⋅ b = 0 [15].

Definition 2. Let V be a vector space with an inner product. Then for a vector subspace A ⊆ V we can define the orthogonal subspace A┴ [15]:

A┴ ≡ {v ∈ V : ∀a ∈ A, a ⋅ v = 0}.

If A and B are subspaces of the space V, then NOT B represents B┴ and A NOT B represents the projection of A onto B┴. When a, b belong to V, then a NOT b represents the projection of a onto ⟨b⟩┴, where ⟨b⟩ is the subspace {λb : λ ∈ R}.

These definitions can be used to carry out calculations with vectors in the vector space. We apply the standard method of projection.

Theorem 1. Let a, b be vectors of V. Then a NOT b is represented by the vector

a NOT b = a − (a ⋅ b / ||b||²) b,

where ||b|| = √(b ⋅ b) is the length of the vector b [15].

Example: When we take the inner product with b, we obtain

(a NOT b) ⋅ b = (a − (a ⋅ b / ||b||²) b) ⋅ b = a ⋅ b − (a ⋅ b)(b ⋅ b) / ||b||² = 0.

This proves that a NOT b and b are orthogonal; the vector a NOT b is therefore the part of a that is irrelevant to b (Definition 1), as we required.

When the vectors are normalized, Theorem 1 takes the following form:

a NOT b = a − (a ⋅ b) b.

For the purpose of finding expressions or documents which correspond to a NOT b, it is not necessary to determine the differences from a and from b separately for each candidate. Theorem 1 shows that finding the similarity between another vector and a NOT b is a simple inner-product calculation.

4 Quantum logic and vector space

The concept of quantum logic first appeared in the theory of quantum mechanics, where it was presented by Birkhoff and von Neumann (1936) [2]. From set theory it is known that if we have sets A and B and an element a ∈ A or a ∈ B, then the union C = A ∪ B of these sets will also contain this element. Quantum logic, however, does not describe A and B as sets but as subspaces of a vector space. The structure of quantum logic is simple; we obtain it by substituting vector spaces and their subspaces for sets and subsets [2].

Points in quantum mechanics are represented as subspaces of the vector space V. In this connection we can consider the collection L(V) of subspaces of the vector space V. The lower bound of A, B ∈ L(V) is the biggest element C ∈ L(V) with C ⊆ A and C ⊆ B, which is exactly the conjunction A ∩ B. The upper bound of A and B is the smallest subspace D ∈ L(V) with A ⊆ D and B ⊆ D. These two operations make L(V) a partially ordered set with the structure of a lattice. If we work with an inner product, we can define for each subspace A ∈ L(V) its orthogonal complement A┴. So we have three operations on the collection L(V), defined as follows [2]:

Conjunction: A AND B = A ∩ B
Disjunction: A OR B = A + B
Negation: NOT A = A┴

It is simple to prove that these three operations on L(V) suffice to realize all necessary relations (A + A┴ = V, A ∩ A┴ = 0) and to define a logic on L(V). An important piece of knowledge is also that every subspace A ∈ L(V) can be identified (using the inner product) with a special projection map PA : V → A, and through this bijection the logic of subspaces L(V) is equivalent to the logic of projection mappings on the vector space V.

Quantum logic is distinguished from Boolean logic in at least one important property: unlike Boolean logic, it is not distributive. The disjunction in set theory can be modeled as the union of sets, which in linear algebra corresponds to the vector sum of subspaces: A + B is the smallest subspace of V containing both A and B.

For determining the similarity between arbitrary objects it is necessary to define a function σ : D × D → R which assigns a real number to a pair of objects oi, oj from its domain D. Such a measure of similarity must satisfy the following requirements:

1. σ(oi, oj) ≥ 0,
2. σ(oi, oj) = σ(oj, oi), i.e. symmetry,
3. if oi = oj, then σ(oi, oj) = max σ(ok, ol) for all ok, ol ∈ D.

Definition 3. Let the terms b1, …, bn ∈ V. The term b1 OR … OR bn is represented by the subspace [15]

B = {λ1 b1 + … + λn bn : λi ∈ R}.
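Theorem 1 is directly executable; the following numpy sketch (with invented example vectors) computes a NOT b and verifies the orthogonality argued above:

```python
import numpy as np

def vector_not(a, b):
    # Theorem 1: a NOT b = a - (a.b / ||b||^2) b, i.e. the projection
    # of a onto the orthogonal complement of the line spanned by b.
    return a - (np.dot(a, b) / np.dot(b, b)) * b

# Example vectors (illustrative only).
a = np.array([3.0, 1.0])
b = np.array([1.0, 0.0])
r = vector_not(a, b)
# r is orthogonal to b, so r is irrelevant to b (Definition 1).
```

For unit-length b the division by ||b||² disappears and the function reduces to the normalized form a − (a ⋅ b) b.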
The search for a similarity relation between an individual term a and a general subspace B is more complicated than the search for a similarity relation between individual terms. From the viewpoint of quantum physics, we can use PB to measure the probability that an element is found in some state [12]. The value a ⋅ PB(a) is interpreted as a measured probability. For our purposes we define this probability by the relation

sim(a, B) = a ⋅ PB(a),

where the probability is given by the inner product of a with the projection of a onto the subspace B, from which we compute how much of the term a lies in the subspace B. This problematic similarity relation has been investigated from various viewpoints, see [1], [12].

In practice, if the set {bj} is not orthonormal, it is not correct simply to compute sim(a, bj) for every vector bj in turn. To obtain an orthonormal base {b̃j} for the subspace B it is convenient first to construct it by the commonly used Gram-Schmidt method [5]. Then

PB(a) = Σj (a ⋅ b̃j) b̃j,

and consequently we can write

sim(a, B) = Σj (a ⋅ b̃j)².

To evaluate sim(a, B) we need to compute the inner product of a with every vector b̃j. This similarity relation is therefore more expensive to calculate than the one in Theorem 1. The possibility of comparing every document with a NOT b using a single operation, the inner product, is lost for disjunction, although, as we will show later, it is preserved for negated disjunction.

5 Using the negation for searching meanings

In this part an introductory example of vector connections is presented which demonstrates the use of vector negation, and of vector conjunction, disjunction and negation together, for finding vectors which represent different meanings of ambiguous words. We briefly describe the experiment and the documents obtained. It shows that vector negation is a clear improvement over the classic Boolean method, as described in paper [15].

Our word space was constructed from 28 articles written in the year 2006 which were obtained from the Internet. The total number of acquired words was 5260. The collection of articles was focused on economy, culture, sport, health and science, and from every sphere at least two articles were processed. Documents which concern the meaning of the word koruna (crown as a coin) are marked D13, D14, D15.

Table 1. Example of the influence of the factor k on the relevance of documents for the query koruna.

  k=2          k=8          k=15
  D14  1       D13  0,773   D13  0,657
  D15  0,999   D15  0,647   D23  0,563
  D10  0,999   D14  0,643   D18  0,478
  D13  0,998   D12  0,637   D10  0,432
  D24  0,967   D10  0,597   D15  0,413
  D28  0,96    D11  0,592   D14  0,384
  D25  0,958   D22  0,561   D11  0,349
  D27  0,956   D24  0,363   D22  0,327
  D11  0,952   D16  0,336
  D26  0,946   D18  0,328
  D22  0,935   D23  0,324
  D07  0,928
  D12  0,912
  D08  0,908
  D04  0,905
  D16  0,902
  D03  0,885
  D21  0,879
  D06  0,878
  D05  0,85
  D20  0,85
  D17  0,796
  D18  0,751
  D23  0,716
  D09  0,603
  D19  0,585
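The projection-based similarity sim(a, B) described above can be sketched as follows; the Gram-Schmidt step and the sum of squared inner products follow the formulas in the text, while the example vectors are invented:

```python
import numpy as np

def orthonormal_basis(vectors):
    # Gram-Schmidt: build an orthonormal base for the subspace B
    # spanned by the given (not necessarily orthonormal) vectors.
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in basis)
        n = np.linalg.norm(w)
        if n > 1e-12:          # skip linearly dependent vectors
            basis.append(w / n)
    return basis

def sim_subspace(a, vectors):
    # sim(a, B) = a . P_B(a) = sum_j (a . b~_j)^2
    return sum(np.dot(a, u) ** 2 for u in orthonormal_basis(vectors))
```

For a unit vector a lying entirely inside B the value is 1, and for a vector orthogonal to B it is 0, matching the probability interpretation of a ⋅ PB(a).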
On the other hand, documents related to koruna (crown as a jewel) are marked D10, D11, D…

Over the data source a parser was created which separated the individual words from the text. Using a morphological analyzer [4], a list of terms was then constructed. We assume that meanings of ambiguous words occurred in the articles; for example, the word koruna (crown as a coin) is used more frequently in an economic context than in a context connected with jewels. To test the effectiveness of our negation operator we will try to find the less common meanings of chosen words, which are overshadowed by the prevailing sense.

LSI keeps the k greatest singular values. The choice of k has to be small enough to obtain faster access to the documents, but great enough to capture the structure of the corpus adequately. The present experiments calculate the term-document similarity for different values of k in the range ⟨2, 15⟩; here we illustrate the results for the factor values k=2, k=8 and k=15.

The data in Tables 1 and 2 show that vector negation is very effective for selecting the relevant documents: those which correspond to the required word koruna (crown) remain, while the word klenot (jewel) is left out.

Table 2. Example of the influence of the operation NOT and of the factor k on the relevance of documents for the query koruna NOT klenot.

  k=2          k=8          k=15
  D14  1       D13  0,806   D13  0,621
  D13  0,999   D15  0,677   D23  0,57
  D15  0,999   D14  0,671   D18  0,55
  D10  0,998   D22  0,601   D15  0,439
  D24  0,96    D10  0,582   D10  0,41
  D25  0,951   D18  0,386   D14  0,4
  D27  0,95    D24  0,374
  D28  0,95    D23  0,374
  D26  0,94    D16  0,353
  D07  0,927   D20  0,328
  D22  0,924
  D08  0,905
  D04  0,905
  D16  0,883
  D06  0,881
  D03  0,88
  D21  0,867
  D05  0,844
  D20  0,823
  D17  0,768
  D18  0,732
  D23  0,696
  D09  0,605
  D19  0,542

Vector negation and disjunction can be combined when formulating a query over areas of documents: we do not negate only one argument, but several. If the user states that he wants documents related to a but not to b1, b2, …, bn, this is interpreted (without further indication) as wanting only documents which are not related to any of the unwanted terms bi. In this way the expression

a AND (NOT b1) AND (NOT b2) … AND (NOT bn)

passes into the form

a NOT (b1 OR b2 … OR bn).

Using Definition 3 we form the disjunction b1 OR b2 … OR bn as the vector subspace B = {λ1b1 + … + λnbn : λi ∈ R}. This term can be transformed into a definite vector which is orthogonal to all the irrelevant arguments {bj}. This vector is a − PB(a), where PB is the projection onto the subspace B, as in Theorem 1. This implies that computing the similarity of any vector with the term a NOT (b1 OR b2 … OR bn) is the same simple inner product, with the same computational effectiveness as in Theorem 1. This technique provides a systematic reduction of irrelevant terms.
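The negated disjunction a NOT (b1 OR … OR bn) = a − PB(a) can be sketched as follows; the Gram-Schmidt orthonormalization of the negated terms is an assumption of this sketch (any orthonormal base of B would do), and the example vectors are invented:

```python
import numpy as np

def not_disjunction(a, negated):
    # a NOT (b1 OR ... OR bn) = a - P_B(a), where B is the subspace
    # spanned by the negated terms (Definition 3 plus Theorem 1).
    basis = []
    for b in negated:
        w = b - sum(np.dot(b, u) * u for u in basis)  # Gram-Schmidt step
        n = np.linalg.norm(w)
        if n > 1e-12:
            basis.append(w / n)
    result = a.copy()
    for u in basis:
        # Subtracting the component along each orthonormal base vector
        # accumulates exactly P_B(a).
        result = result - np.dot(result, u) * u
    return result
```

The returned vector is orthogonal to every negated term, so ranking documents against it remains a single inner product per document.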
We carried out a decomposition of the matrix A for different values of k. The most relevant documents were obtained for the factor value k=15. On the contrary, for the very small k=2 a great volume of documents is obtained, which reduces their relevance with regard to the required document.

6 Conclusion and future work

Negation in the vector space is a suitable tool for reducing the set of retrieved documents. The specification of the searched documents can be refined by using several disjunction operations in the query. Our word space consisted of 5260 words; more relevant results can be obtained by applying this method to a larger document database.

The present experiments compute the similarity in the term-document relation. In the construction of the vector space model we represented the individual occurrences in documents by a Boolean function, i.e. we expressed the presence or absence of the given term in the document.

Future experiments will be carried out over the lexical database of words and word connections in WordNet, where the lexical and semantic relations between the individual words and concepts are recorded. In the next experiments we will target our effort at the calculation of similarity in the term-term relation, with various forms of representing the weight of an individual term in the document.
References

[1] Bělohlávek, R., Snášel, V.: Podobnost a její modelování (Similarity and its modeling). Znalosti 2004, pp. 309–316.
[2] Birkhoff, G., von Neumann, J.: The logic of quantum mechanics. Annals of Mathematics, 37, 823–843, 1936.
[3] Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, 2004, 307 pp.
[4] Horal, S., Kalinová, M., Kostolanský, E.: Morfologická analýza slovenčiny – analýza flexných tvarov (Morphological analysis of Slovak – analysis of inflected forms). Conference Proceedings Informatika 2005, Bratislava, 2005.
[5] Jänich, K.: Linear Algebra. Undergraduate Texts in Mathematics, Springer-Verlag, 1994.
[6] Landauer, T., Dumais, S.: A solution to Plato's problem: The latent semantic analysis theory of acquisition. Psychological Review, 104(2), 211–240, 1997.
[7] Moravec, P., Pokorný, J., Snášel, V.: Using BFA with WordNet Ontology Based Model for Web Retrieval. In: Chbeir, R., Dipanda, A., Yétongnon, K. (eds.): SITIS 2005, Yaoundé, 27.11.–1.12.2005. University of Bourgogne, Dijon, 2005, pp. 254–259.
[8] Moravec, P., Pokorný, J., Snášel, V.: WordNet Ontology Based Model for Web Retrieval. IEEE WIRI 2005, Tokyo, Japan, pp. 220–225.
[9] Pokorný, J., Snášel, V., Kopecký, M.: Dokumentografické informační systémy (Documentographic information systems). Karolinum, MFF UK Praha, 2005, 184 pp.
[10] Salton, G., McGill, M.: Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY, 1983.
[11] Schütze, H.: Automatic word sense discrimination. Computational Linguistics, 24(1), 97–124, 1998.
[12] Tversky, A.: Features of similarity. Psychological Review, 84, 327–352, 1977.
[13] Van Rijsbergen, K.: The Geometry of Information Retrieval. Cambridge University Press, 2004, 150 pp.
[14] Widdows, D.: Geometry and Meaning. CSLI Lecture Notes, No. 172, Stanford, California, 2004, 320 pp.
[15] Widdows, D., Peters, S.: Word Vectors and Quantum Logic: Experiments with negation and disjunction. In: Proceedings of Mathematics of Language 8, Indiana, June 2003, pp. 141–154.
[16] Zee, A.: Quantum Field Theory. Princeton University Press, 2003.