=Paper= {{Paper |id=Vol-1/paper-9 |storemode=property |title=Generating queries from complex type definitions |pdfUrl=https://ceur-ws.org/Vol-1/jeusfeld-long.pdf |volume=Vol-1 |authors=M.A. Jeusfeld }} ==Generating queries from complex type definitions== https://ceur-ws.org/Vol-1/jeusfeld-long.pdf
              Generating queries from complex type de nitions
                                           Manfred A. Jeusfeld
                            Informatik V, RWTH Aachen, D-52056 Aachen
                                  jeusfeld@informatik.rwth-aachen.de


                      Abstract                               case of relational databases, only at relations
                                                             can be expressed. In the case of object-oriented
      Many information systems are imple-                    databases, the type system depends on the spe-
      mented as application programs connected               ci c data model of the database system.
      to a database system. A characteristic
      problem of such systems is the famous                 Handcoded clustering by lter procedures
      impedance mismatch, i.e., the conceptual               within the application program is error-prone
      distance between the programming and the               and gives away the chance of reasoning on
      database languages. The traditional solu-              the relationship between the information in the
      tion is to implement an interface that trans-          database and in the application program.
      forms one representation into the other.             Section 2 introduces API modules as the interface
      Commercial database systems o er prepro-          between the database and the application program.
      cessors that allow to embed the database          Base types are imported from the database. Appli-
      language (e.g., SQL) into the programming         cation speci c types are de ned by using tuple, set,
      language (e.g., C). Such an approach frees        and pointer constructors. The latter allows to rep-
      the application programmer from the task          resent recursive concepts of the database schema.
      to specify details of the communication.             Section 3 presents the mapping of the API mod-
      However, the impedance mismatch is not            ules to a logic program delivering complex terms.
      solved but aggravated. The set-oriented           These terms are read by a parser that itself is gen-
      database language is intermixed with the          erated from the API modules.
      element-oriented programming language, a             Section 4 relates the types in an API module to
      notorious cause for programming errors.           statements of a concept language. Thereby, types of
      Moreover, there is no support in map-             two di erent API modules can be checked for sub-
      ping the restricted data representation of        sumption and consistency.
      databases into the more complex type sys-
      tem of programming language. This pa-             2 Interface Modules
      per proposes an intermediate language, the
      API modules, for specifying the relation-         Interfaces between imperative-style programming
      ship between the representations in the           languages should both re ect the major type con-
      database and in the application program.          structors and the facilities of the database query
      The query for retrieving the information          language. The most common type constructors are
      and the data types for storing it can be          tuple and set. Some languages also support lists.
      generated from the API module. The mod-           Pseudo-recursive type de nitions are possible when
      ules are simple enough to allow reasoning         allowing pointer types, e.g. in C and Modula-2.
      on queries generated from them.                   Common base types are Integer and String. The
                                                        denotational semantics of a type expression is a po-
                                                        tentially in nite set of values, for example [Inte-
1 Introduction                                          ger,String] denotes the cartesian product of the se-
The purpose of a database system is to maintain a       mantics of the component types.
large amount of information for a variety of appli-
cation programs. The application-speci c clustering     2.1 Example
is either described as a database view de nition or     Assume a database provides information about a
performed by lters inside the application program.      company. An application programmer has the task
Both approaches have their disadvantages:               to process the information about the projects of em-
    View de nition languages are restricted to the
                                                        ployees who work for a given department. The API
      type system of the database system. In the        module could look as in Figure 1.
                                                          The FROM clause imports concepts from the
  
     This work was partly supported by the Commission   database schema. They are used like ( nite) base
of the European Union under ESPRIT BRA 6810 (Com-       types. Their extension is represented in the current
pulog 2).                                               database state. The TYPE clauses declare complex
 API-MODULE Emps;                                              hasType(T,T(_X,_Y1,...,_YK) :-
 FROM CompanyDb IMPORT Employee, Project,
      Department, String;                                        In(_X,C),
 TYPE                                                            ,
      EmpType/Employee = [name: String;                          ...
                          project: {Project};                    .
                          dept: DeptType];
                                                               The parts  have to be mapped as
        DeptType/Department = [deptName: String;
                               head: *EmpType];             follows:
 PORT
        e: {EmpType| dept.deptName=$N};                         If Ti is a set type fSg where S is a type
 END.                                                            name for a tuple-valued type with arity m then
                                                                  is replaced by
 Figure 1: API module for the company example
                                                                    SET_OF(S(_Z,_Z1,...,_Zm),
                                                                       ( A(C,ai,_X,_Z),
data structures on top of the imported concepts.                         hasType(S,S(_Z,_Z1,...,_Zm))),
EmpType is a record type which represents the name                      _Yi)
of an employee, his projects, and the department.              If Ti is a set type f*Sg where S is a type name
The latter is given by the name and the reference               of a tuple type with class D then 
(pointer) to that employee who is the head of the de-           is replaced by
partment. The purpose of the pointer is to encode
recursive type de nitions. The PORT declaration                     SET_OF(REF(S,_Z),
de nes which information of the database should be                        ( A(C,ai,_X,_Z),
teransfered to the application program. Here, all                           In(_Z,D)),
employees who have a department named by $N are                           _Yi)
of interest. The token $N denotes a placeholder for
a string whose value is inserted by the application            If Ti is a tuple type with arity m then the macro
program at run time.                                            is replaced by
                                                                    _Yi = Ti(_Y,_Z1,...,_Zm),
3 Query Generation                                                  A(C,ai,_X,_Y),
From the database point of view, an API module is                   hasType(Ti,_Yi)
a collection of simple view de nitions whose exten-            Finally, pointer types *Ti where Ti is a record
sions are represented by terms conforming the type              type with class D are mapped to the condition
de nitions. These views are encoded as a logic pro-
gram de ning a predicate hasType(T,V). It formally                  (_Yi = REF(Ti,_Y),
de nes the set of values V having type T, i.e., the se-             A(C,ai,_X,_Y),
mantics of the type T. The database system is mod-                  In(_Y,D);
elled by two predicates for accessing information:                  _Y = null_value)

    In(X,C) denotes that the database object X is            The operator ';' stands for a logical disjunction.
     an        instance       of       the        con-      There will be no backtracking on this disjunction.
     cept C, e.g., In(e2341,Employee), In("Peter            Thus, Y will either be bound to a term REF(.,.)
     Wolfe",String).                                        or to the special value null value.
    A(C,a,X,Y) states that the object X is related to
                                                              The PORT clauses specify those subsets of types
     the object Y by an attribute a which is de ned in      which are of interest to the application program. A
     class C, e.g., A(Employee,name,e2341,"Peter            port de nition
     Wolfe").                                                 PORT v: {T| a1.a2...an=$P}

   The logic program can automatically be gener-            is compile to the clause
ated from the type de nitions by a simple top down            askPort(_S,v,_P) :-
traversing1 algorithm on the syntax tree of a type              SET_OF(_X,
de nition :                                                        (hasType(T,_X),
   For each concept C imported in the API module                    path(_X,[a1,a2,...,an],_P),
we include a clause which delivers all values of type              _S).
C.
                                                            The prede ned predicate path evaluates the path ex-
    hasType(C,C(_X)) :-                                     pression a1.a2...an starting from X. Note that the
      In(_X,C).                                             parameter $P becomes an argument of the askPort
  A tuple type has the general form T/C =                   predicate. It is instantiated by the application pro-
[a1:T1,...,ak:Tk]    . The decoration C is called the       gram when calling the goal askPort. The result is
"class" of T. It is mapped to the clause pattern            returned in the rst argument.
                                                               The restriction in the port de nition can easily be
   1
     We adopt the syntax of Prolog to denote the clauses.   extended to contain several conditions. Moreover,
Variables start with an underscore. The meta predicate      one can allow a constant or a second path expression
SET OF(x,c,s) evaluates s as the set of all elements x      instead of the parameter on the right-hand side of
satisfying the condition c                                  the equality.
  hasType(String,String(_S)) :-                         terms as arguments. Thus, one has to ensure termi-
    In(_S,String).                                      nation when evaluation it by the SLD strategy for
  hasType(Project,Project(_P)) :-                       logic programs.
    In(_P,Project).                                        Fortunately, if one makes sure that the types in the
                                                        API module are de ned non-recursively, then there
  hasType(DeptType,DeptType(_D,_DN,_M)) :-
    In(_D,Department),                                  is a partial order on the type names. If a type de ni-
    _DN = String(_Z1),                                  tion for T1 uses a type T2 on the right-hand side, then
    A(Department,_D,deptName,_Z1),                      T1 > T2 holds. The de nition of the logic program
    hasType(String,_DN),                                generator propagates this property to all clauses of
    _M = REF(EmpType,_Z2),                              the hasType predicate: if hasType(T,.) occurs in
    A(Department,_D,head,_Z2),
    In(_Z2,Employee).                                   the condition of a clause hasType(R,.) then T must
                                                        be smaller than R. Consequently, the logic program
  hasType(EmpType,EmpType(_E,_N,_PS,_DT)) :-
                                                        terminates on each goal hasType(T,X)2.
     In(_E,Employee),                                      A corrolar of this proposition is the niteness of
    _N=String(_Z1),                                     the sets interpreting the types in the API module.
     A(Employee,_E,name,_Z1),
     hasType(String,_N),
     SET_OF( Project(_Z2),
                                                        4.2 Reasoning Services
       (A(Employee,_E,project,_Z2),                     The constructs in the API module were deliberately
        hasType(Project,Project(_Z2))),
        _PS),
                                                        choses to be conformant with the concept language
     _DT = DeptType(_D,_DN,_M),                         dialect of Buchheit et al. 1994. A couple of reasoning
     A(Employee,_E,dept,_D),                            services are possible, each determing a di erent set
     hasType(DeptType,_DT).                             of axioms to be reasoned about. We illustrate only
  askPort(_S,e,_N) :-                                   one service, type checking against the database.
     SET_OF(_X,                                            The type de nitions in an API module make
         (hasType(EmpType,_X),                          assumptions about the structure of the imported
         path(_X,[dept,deptName],_N),
         _S).                                           database concepts. In the example of Figure 1,
                                                        the concepts Employee must at least have three at-
                                                        tribute categories name, project, and dept. For
     Figure 2: Logic program for the example            the Department concept, two attributes categories
                                                        deptName and head are required. Moreover, at-
                                                        tribute cardinalities for the answer objects are
3.1 Mapping of the Example                              stated:
The de nition of hasType for the running example            a set-valued attribute like project does not in-
is presented in Figure 2.                                    duce any cardinality constraint;
   The values of the imported concepts are rep-
resented as unary terms, e.g. String("Peter                 a pointer-valued attribute like head restricts the
Wolfe"). Values of complex terms have more com-              the number of attribute llers to be less or equal
ponents according to the type de nition. For exam-           1;
ple,                                                        the remaining attributes like dept must have
EmpType(e2341,String("Peter Wolfe"),                         exactly one ller.
        [Project(p1),Project(p2)],
        DeptType(d41,String("Marketing"),                  Please note that these properties apply to the de-
         REF(EmpType,e3331)))                             ned concepts like EmpType (ET) and not to the
                                                        imported concepts like Employee (E). The concept
is the term representing a value of EmpType. Val-       language expression is:
ues of set types like fProjectg are sequences of val-
ues of the member type enclosed by brackets. The              ET = E u (= 1 name:S) u (= 1 dept:DT)
component for the dept attribute is avalue of type            DT = D u (= 1 deptName:S) u ( 1 head:E)
DeptType. This shows the representation of point-
ers as terms REF(T,X) where X is the identi er of         As prescribed by the logic program, the pointer-
the value (of type T pointed to. The identi er is al-   valued attribute head of DeptType is not refer-
ways the rst component of a term T(X,...). All          ing to EmpType directly but to its associated class
identi ers are constants from the database.             Employee. Thereby, circular concept de nitions are
                                                        prevented.
4 Properties of Interfaces                                These equalities for the type de nitions are true
Termination of the logic program is guaranteed, and     provided the database schema has a schema consis-
the types de ned in API modules can be compared         tent to it. At least it 3has to ful ll the following
with the database schema and with each other.           "well-typedness" axioms :
4.1 Termination                                            2
                                                             One has to assume that the underlying database
                                                        is nite. This is however a standard assumption with
On rst sight, the generated logic program is recur-     databases.
sive in the hasType clause and it contains complex         3
                                                             The symbol > stands for the most general concept.
                                                             are generated from the API module by a compiler.
       E v 8name:> u 8project:> u 8dept:>                    Since the askPort predicate can only return syn-
       D v 8deptName:> u 8head:>                             tactically correct terms, an exception handling for
                                                             malformed answers is super uous.
  One can check this by adding it to the database
schema and checking its consistency. The service
would just make sure that all referenced attributes
                                                             6 Related Work
are de ned in the database schema.                           Lee and Wiederhold 1994 present a mapping from
  With a stricter regime, one can demand that the            relational database schemas to complex objects. It
database schema must have the same or sharper car-           is more general in the sense that arbitrary arities of
dinality constraints and that the well-typedness is          the relations are allowed. In this paper, we assume a
re ned to the concepts appearing as attribute types          totally normalized schema of the database consisting
in the API module:                                           of unary relations for class membership and binary
                                                             relations for attributes. The advantage of our ap-
       E v ET u 8name:S u 8project:P u 8dept:DT              proach is that the algorithm for the generation of
       D v DT u 8deptName:S u 8head:ET                       the logic programm can be kept free of reasoning on
                                                             foreign key dependencies.
   Here, the database schema has to ful ll the struc-           Plateau et al. 1992 present the view system of
ture of the types in the API module. Consequently,           O2 as complex type de nitions coupled with the
all instances of the database concepts will apply to         database types and with prescriptions for graphical
the type de nitions. The type de nitions would only          display. The type system contained in the O2 data
project on the attributes of interest. Even if one           model. Reasoning on type correctness is done by the
regards this as a too narrow coupling, the test on           compiler.
consistency of the above axioms with the database               The Interface De nition Language IDL by Nestor
schema returns useful information to the designer of         et al. 1992 has four type constructors for records,
an API module.                                               lists, sets, and classes (unions of di erent record
                                                             types). The base types represents boolean, integers,
5 Programming Language                                       rationals, and strings. The values are transfered be-
                                                             tween two programs by using a term representation
  Embedding                                                  similar to ours. The di erence is the missing formal
From the API modules, programming language data              relationship between type de nitions and (database)
types can be derived. Currently, a prototype for             concepts.
the C++ language is implemented. The tuple types                A recent proposal by Papakonstantinou et al. 1994
are mapped to C++ structures, the sets to linked             encodes all type information with the term repre-
lists, and the pointers to C++ pointers. While the           sentation of a value. An application program has
concept language view makes no di erence between             to provide generic data structures capable of storing
pointer-valued attributes (like *EmpType and their           arbitrary values (though restricted to a xed set of
associated class Employee, the representation within         base types). The advantage is the exibility of the
the application program is very di erent:                    approach. A disadvantage is missing compile time
    A value Employee(X) is stored in a variable with
                                                             type checking.
      C++ type char* because X is just an string rep-           Persistent object systems, esp. Tycoon by Matthes
      resenting a database constant.                         1993, "lift" the type systems of information sources
                                                             and application programs into a single type system.
    A value REF(EmpType,X) is represented as a              Because of the heterogenous information sources, the
      main memory address pointing to the location           approach is more general than in O2 . Reasoning is
      where the value EmpType(X,...) is stored.              again restricted to type checking.
      This allows the application program to follow
      attribute chains by fast main memory adress-
      ing.
                                                             7 Conclusion
                                                             We de ned API modules as mediators between ap-
   Communication between an applications program             plication programs and databases. Both program-
and the database is routed through the ports. The            ming language data types and database queries are
term representations of port p returns in argu-              generated from the module description. The lan-
ment s of the query askPort(s,p,x1,...,xn) are               guage is simple enough to guarantee termination of
read by the application program. The arguments               the query and ecient reasoning on the type def-
x1,...,xn contain the constants for the selection            initions. Pointer types are introduced to simulate
conditions4 . The "read" procedure, basically a sim-         recursive datatypes and nd a natural counterpart
ple term parser, stores the values in the C++ data           in the database query.
structures. Both the parser and the data structures             In future, we plan to eliminate the distinction
   4
      Like for types the properties of port de nitions can   between application program and database in the
be investigated within the framework of concept lan-         API modules. Application programs can serve as a
guages. If the parameters x1,...xn are known, then           "database" provided they o er the ability the inter-
the selection conditions are path agreements. Moreover       pret queries on their information. Then, information
one may allow path expressions of the form a1 :a2 :::ar =     ow design between a collection of programs can be
b1 :b2 :::bs without compromising on the theoretical com-    supported by reasoning on the relationship between
plexity of the reasoning.                                    the type de nitions.
  Acknowledgement. Many thanks to Claudia
Welter and Martin Staudt for attacking weak points
in earlier versions of this paper.
References
[Buchheit et al., 1994] M. Buchheit, M.A. Jeusfeld,
  W. Nutt, and M. Staudt. Subsumption between
  queries to object-oriented databases. Information
  Systems, 19(1):33{54, 1994.
[Lee and Wiederhold, 1994] B.S.
  Lee and G. Wiederhold. Outer joins and lters
  for instantiating objects from relational databases
  trough views. IEEE Trans. Knowledge and Data
  Engineering, 6(1):108{119, 1994.
[Matthes, 1993] F. Matthes. Persistente Objektsys-
  teme. Springer-Verlag, 1993.
[Papakonstantinou et al., 1994] Y. Papakonstanti-
  nou, H. Garcia-Molina, and J. Widom. Object ex-
  change across heterogeneous information sources.
  Submitted paper, 1994.
[Plateau et al., 1992] D. Plateau, P. Borras, D. Lev-
  eque, J. Mamou, and D. Tallot. Building user in-
  terfaces with Looks. In F. Bancilhon, C. Delobel,
  P. Kannelakis (eds.):Building an Object-Oriented
  Database System - The Story of O2, Morgan-
  Kaufmann, 256{277, 1992.
[Nestor et al., 1992] J. R. Nestor, J. M. Newcomer,
  P. Giannini, and D. L. Stone. IDL - The language
  and its implementation. Prentice Hall, 1990.