=Paper= {{Paper |id=None |storemode=property |title=Application of Geometrical Approaches to Information Retrieval |pdfUrl=https://ceur-ws.org/Vol-735/paper1.pdf |volume=Vol-735 |dblpUrl=https://dblp.org/rec/conf/syrcodis/Tereshchenko11 }} ==Application of Geometrical Approaches to Information Retrieval== https://ceur-ws.org/Vol-735/paper1.pdf
Application of geometrical approaches to Information Retrieval

                                                °
                                                c Vasyl Tereshchenko
                                      National Taras Shevchenko University of Kyiv
                                                 vtereshch@gmail.com



                        Abstract                                    3. The result of Problem B resolving is transduced
                                                                into correct result for Problem A.
    In this paper we propose an idea of transforma-                 Theorem. The search problems in relational database
    tion relational database problems to computa-               are transformed to computational geometry search prob-
    tional geometry problems to develop more ef-                lems in time O(N ).
    ficient algorithms for discovering useful infor-                Proof. To prove the theorem, it is necessary to prove
    mation from databases. We consider in detail                the fulfillment of three conditions mentioned above. To
    relational algebra operations - the base of rela-           this end, let us formalize input data sets of the relational
    tional language foundation - and give adequate              database search problem in terms and concepts of the ge-
    geometrical interpretation for each of them.                ometric search problem, and per contra, results of ge-
                                                                ometric search problem solution interpreted in terms of
1   Introduction                                                databases.
                                                                    Let each tuple of relation R put in accordance to some
Over the past ten years the relational database manage-         point (or IOW n-plex) of geometric space ER . Let each
ment systems (DBMS) have become wide applicable in              attribute of relation R put in accordance to some coordi-
different areas such as automated design system, CAE            nate axis in the following way: axis value area is defined
system, geographic information system, office informa-          by domain, the attribute is specified under so that value of
tion system and so on. However, the relational database         each tuple element corresponds to some coordinate value
management systems have limited capacity from the ob-           of corresponding space point. Such a correspondence is
ject’s modeling viewpoint. That makes the DBMS to               “one-one”. Ex facte, input data for relational database
be non-applicable for the complicated specialized ap-           problems are transformed into corresponding input data
plications. Also, the recent progress of communication          for computational geometry problem in time O(N ) and
and network technologies makes it easy to accumulate a          the received computational geometry problem solution is
large collection of unstructured or semi-structured texts       transformed into correct solution for relational database
data [2, 5, 6]. In this context, the problem of searching       problem in time O(N ) also. Let us consider the main op-
more efficient algorithms to discover useful information        erations of relational algebra that is the base for relational
from large non-structured databases that differs from ex-       languages creation. And by using examples of relational
istent information retrieving methods is a point of big         algebra search queries, we proved their geometrical re-
interest [7, 4]. The work [3] is worth to be mentioned,         alization (condition 2), and hence, the transformation of
since it is devoted to problem of discovering data in large     two classes of problems, mentioned above.
semistructured text collections.
    The paper proposes an algorithm based on one of the             Selection (S = σpredicate (R)).
computational geometry methods that is called the re-               Selection is a unary operation. The result of the se-
gional search algorithm and speeds up search substan-           lection is a new relation S containing only those tuples
tially. So nomogenously the subject about the possibil-         of the input relation R that holds the specified condition
ity to transform relational database problems to computa-       (predicate). Let the relation R of the relational database
tional geometry problems has been occurred taking into          put in accordance to the subspace ER of the space E n .
account a high efficient of geometric algorithms.               As it was mentioned above, the rank d of the relation
                                                                R defines the dimension of the corresponding subspace
2   The geometrical approach to informa-                        ER , Figure 1.
                                                                    Predicate in the selection operation defines some do-
    tion retrieval                                              main (plane of the rank k < d). We are interested in
                                                                all those points of the subspace ER that lie within the
Definition. Problem A are transduced into Problem B,            defined domain. Thereby, the predicate determines the
if:                                                             search region in the subspace ER , and under the geomet-
    1. The input data for Problem A are transduced into         ric interpretation the result of the selection operation is
corresponding input data for Problem B.                         the query about the points set of the subspace ER that
    2. The Problem B is solved.                                 lie within the queried region. Thus, the regional search
                                                                corresponds to the selection operation. There were pro-
Proceedings of the Spring Researcher’s Colloquium on Database   posed several solutions of the regional search. Among
and Information Systems, Moscow, Russia, 2011                   them, the algorithm based on the orthogonal range tree
                         Figure 1:                                                            Figure 3:

                                                                     Assume that we are given a point P                               =
                                                                    (1)       (2)        (n)
                                                                  (x , x , . . . xP ) over a subspace ER , dim(ER ) = d;
                                                                  where L             =       {l1 , l2 , . . . , ld } is a set of the
                                                                  coordinate axis of the subspace ER .                              Then
                                                                  Πatr.i (R) ≡ prli (ER ) = {P 0 |P 0 = prli P, ∀P ∈ ER }=
                                                                                                         (i)
                                                                  {P 0 |P 0 = (0, . . . , 0, xP , 0, . . . , 0), ∀P ∈ ER },
                                                                  Πatr.i,...,atr.j (R)            ≡             prπ (ER )={P 0 |P 0   =
                                                                  prπ P, π li , . . . , lj , ∀P          ∈          ER }= {P 0 |P 0   =
                                                                                  (i)          (j)
                                                                  (0, . . . , 0, xP , . . . , xP , 0, . . . , 0), ∀P ∈ ER }.

                                                                      Union (R ∪ S).
                                                                      The union of two relations R and S with tuples I and
                         Figure 2:
                                                                  J correspondingly results their concatenation by forma-
method and described in Preparata and Shamos [1] is               tion a new relation enclosing the maximal number of tu-
worth to be mentioned. The very algorithm uses a data             ples (I + J), if the duplicated tuples are expunged. The
structure called the orthogonal range tree that requires          relations R and S should be a union compatible (i.e., they
O(log d−1 N ) time per query, O(N log d−1 N ) space and           should have the same number of attributes with coinci-
O(N log d−1 N ) preprocessing time, where the N is the            dent domains). Let the relations R and S of the relational
number of points and d is the space dimension. For an             database put in accordance to the subspace ER and ES of
example let us consider the following relations:                  the space E n correspondingly. In the geometric space the
                                                                  union compatibility corresponds to the following condi-
   PRODUCER (PR, Surname, City, Status)                           tions:
   CUSTOMER (CS, Surname, City)                                       1. Relations R and S have the same number of at-
   DETAIL (DT, Name, Weight)                                      tributes ↔ corresponding subspaces ER and ES have the
   CPD ({CS, PR, DT}, Quantity, Price)                            same dimension
                                                                      R ↔ ER
  Query 1. Find out the list of all the details with the              S ↔ ES => dim(ER ) = dim(ES )
weight in range (0.2; 0.45).                                          2. Domains coincides ↔ corresponding subspaces
                                                                  ER and ES are given under the same field. Thus, the
   This query is composed in such a way:                          union compatibility of the relations R and S corresponds
                                                                  to the isomorphism of the subspaces ER and ES . Under
   σ0.2