Generating queries from complex type de nitions Manfred A. Jeusfeld Informatik V, RWTH Aachen, D-52056 Aachen jeusfeld@informatik.rwth-aachen.de Abstract case of relational databases, only at relations can be expressed. In the case of object-oriented Many information systems are imple- databases, the type system depends on the spe- mented as application programs connected ci c data model of the database system. to a database system. A characteristic problem of such systems is the famous  Handcoded clustering by lter procedures impedance mismatch, i.e., the conceptual within the application program is error-prone distance between the programming and the and gives away the chance of reasoning on database languages. The traditional solu- the relationship between the information in the tion is to implement an interface that trans- database and in the application program. forms one representation into the other. Section 2 introduces API modules as the interface Commercial database systems o er prepro- between the database and the application program. cessors that allow to embed the database Base types are imported from the database. Appli- language (e.g., SQL) into the programming cation speci c types are de ned by using tuple, set, language (e.g., C). Such an approach frees and pointer constructors. The latter allows to rep- the application programmer from the task resent recursive concepts of the database schema. to specify details of the communication. Section 3 presents the mapping of the API mod- However, the impedance mismatch is not ules to a logic program delivering complex terms. solved but aggravated. The set-oriented These terms are read by a parser that itself is gen- database language is intermixed with the erated from the API modules. element-oriented programming language, a Section 4 relates the types in an API module to notorious cause for programming errors. statements of a concept language. Thereby, types of Moreover, there is no support in map- two di erent API modules can be checked for sub- ping the restricted data representation of sumption and consistency. databases into the more complex type sys- tem of programming language. This pa- 2 Interface Modules per proposes an intermediate language, the API modules, for specifying the relation- Interfaces between imperative-style programming ship between the representations in the languages should both re ect the major type con- database and in the application program. structors and the facilities of the database query The query for retrieving the information language. The most common type constructors are and the data types for storing it can be tuple and set. Some languages also support lists. generated from the API module. The mod- Pseudo-recursive type de nitions are possible when ules are simple enough to allow reasoning allowing pointer types, e.g. in C and Modula-2. on queries generated from them. Common base types are Integer and String. The denotational semantics of a type expression is a po- tentially in nite set of values, for example [Inte- 1 Introduction ger,String] denotes the cartesian product of the se- The purpose of a database system is to maintain a mantics of the component types. large amount of information for a variety of appli- cation programs. The application-speci c clustering 2.1 Example is either described as a database view de nition or Assume a database provides information about a performed by lters inside the application program. company. An application programmer has the task Both approaches have their disadvantages: to process the information about the projects of em-  View de nition languages are restricted to the ployees who work for a given department. The API type system of the database system. In the module could look as in Figure 1. The FROM clause imports concepts from the  This work was partly supported by the Commission database schema. They are used like ( nite) base of the European Union under ESPRIT BRA 6810 (Com- types. Their extension is represented in the current pulog 2). database state. The TYPE clauses declare complex API-MODULE Emps; hasType(T,T(_X,_Y1,...,_YK) :- FROM CompanyDb IMPORT Employee, Project, Department, String; In(_X,C), TYPE , EmpType/Employee = [name: String; ... project: {Project}; . dept: DeptType]; The parts have to be mapped as DeptType/Department = [deptName: String; head: *EmpType]; follows: PORT e: {EmpType| dept.deptName=$N};  If Ti is a set type fSg where S is a type END. name for a tuple-valued type with arity m then is replaced by Figure 1: API module for the company example SET_OF(S(_Z,_Z1,...,_Zm), ( A(C,ai,_X,_Z), data structures on top of the imported concepts. hasType(S,S(_Z,_Z1,...,_Zm))), EmpType is a record type which represents the name _Yi) of an employee, his projects, and the department.  If Ti is a set type f*Sg where S is a type name The latter is given by the name and the reference of a tuple type with class D then (pointer) to that employee who is the head of the de- is replaced by partment. The purpose of the pointer is to encode recursive type de nitions. The PORT declaration SET_OF(REF(S,_Z), de nes which information of the database should be ( A(C,ai,_X,_Z), teransfered to the application program. Here, all In(_Z,D)), employees who have a department named by $N are _Yi) of interest. The token $N denotes a placeholder for a string whose value is inserted by the application  If Ti is a tuple type with arity m then the macro program at run time. is replaced by _Yi = Ti(_Y,_Z1,...,_Zm), 3 Query Generation A(C,ai,_X,_Y), From the database point of view, an API module is hasType(Ti,_Yi) a collection of simple view de nitions whose exten-  Finally, pointer types *Ti where Ti is a record sions are represented by terms conforming the type type with class D are mapped to the condition de nitions. These views are encoded as a logic pro- gram de ning a predicate hasType(T,V). It formally (_Yi = REF(Ti,_Y), de nes the set of values V having type T, i.e., the se- A(C,ai,_X,_Y), mantics of the type T. The database system is mod- In(_Y,D); elled by two predicates for accessing information: _Y = null_value)  In(X,C) denotes that the database object X is The operator ';' stands for a logical disjunction. an instance of the con- There will be no backtracking on this disjunction. cept C, e.g., In(e2341,Employee), In("Peter Thus, Y will either be bound to a term REF(.,.) Wolfe",String). or to the special value null value.  A(C,a,X,Y) states that the object X is related to The PORT clauses specify those subsets of types the object Y by an attribute a which is de ned in which are of interest to the application program. A class C, e.g., A(Employee,name,e2341,"Peter port de nition Wolfe"). PORT v: {T| a1.a2...an=$P} The logic program can automatically be gener- is compile to the clause ated from the type de nitions by a simple top down askPort(_S,v,_P) :- traversing1 algorithm on the syntax tree of a type SET_OF(_X, de nition : (hasType(T,_X), For each concept C imported in the API module path(_X,[a1,a2,...,an],_P), we include a clause which delivers all values of type _S). C. The prede ned predicate path evaluates the path ex- hasType(C,C(_X)) :- pression a1.a2...an starting from X. Note that the In(_X,C). parameter $P becomes an argument of the askPort A tuple type has the general form T/C = predicate. It is instantiated by the application pro- [a1:T1,...,ak:Tk] . The decoration C is called the gram when calling the goal askPort. The result is "class" of T. It is mapped to the clause pattern returned in the rst argument. The restriction in the port de nition can easily be 1 We adopt the syntax of Prolog to denote the clauses. extended to contain several conditions. Moreover, Variables start with an underscore. The meta predicate one can allow a constant or a second path expression SET OF(x,c,s) evaluates s as the set of all elements x instead of the parameter on the right-hand side of satisfying the condition c the equality. hasType(String,String(_S)) :- terms as arguments. Thus, one has to ensure termi- In(_S,String). nation when evaluation it by the SLD strategy for hasType(Project,Project(_P)) :- logic programs. In(_P,Project). Fortunately, if one makes sure that the types in the API module are de ned non-recursively, then there hasType(DeptType,DeptType(_D,_DN,_M)) :- In(_D,Department), is a partial order on the type names. If a type de ni- _DN = String(_Z1), tion for T1 uses a type T2 on the right-hand side, then A(Department,_D,deptName,_Z1), T1 > T2 holds. The de nition of the logic program hasType(String,_DN), generator propagates this property to all clauses of _M = REF(EmpType,_Z2), the hasType predicate: if hasType(T,.) occurs in A(Department,_D,head,_Z2), In(_Z2,Employee). the condition of a clause hasType(R,.) then T must be smaller than R. Consequently, the logic program hasType(EmpType,EmpType(_E,_N,_PS,_DT)) :- terminates on each goal hasType(T,X)2. In(_E,Employee), A corrolar of this proposition is the niteness of _N=String(_Z1), the sets interpreting the types in the API module. A(Employee,_E,name,_Z1), hasType(String,_N), SET_OF( Project(_Z2), 4.2 Reasoning Services (A(Employee,_E,project,_Z2), The constructs in the API module were deliberately hasType(Project,Project(_Z2))), _PS), choses to be conformant with the concept language _DT = DeptType(_D,_DN,_M), dialect of Buchheit et al. 1994. A couple of reasoning A(Employee,_E,dept,_D), services are possible, each determing a di erent set hasType(DeptType,_DT). of axioms to be reasoned about. We illustrate only askPort(_S,e,_N) :- one service, type checking against the database. SET_OF(_X, The type de nitions in an API module make (hasType(EmpType,_X), assumptions about the structure of the imported path(_X,[dept,deptName],_N), _S). database concepts. In the example of Figure 1, the concepts Employee must at least have three at- tribute categories name, project, and dept. For Figure 2: Logic program for the example the Department concept, two attributes categories deptName and head are required. Moreover, at- tribute cardinalities for the answer objects are 3.1 Mapping of the Example stated: The de nition of hasType for the running example  a set-valued attribute like project does not in- is presented in Figure 2. duce any cardinality constraint; The values of the imported concepts are rep- resented as unary terms, e.g. String("Peter  a pointer-valued attribute like head restricts the Wolfe"). Values of complex terms have more com- the number of attribute llers to be less or equal ponents according to the type de nition. For exam- 1; ple,  the remaining attributes like dept must have EmpType(e2341,String("Peter Wolfe"), exactly one ller. [Project(p1),Project(p2)], DeptType(d41,String("Marketing"), Please note that these properties apply to the de- REF(EmpType,e3331))) ned concepts like EmpType (ET) and not to the imported concepts like Employee (E). The concept is the term representing a value of EmpType. Val- language expression is: ues of set types like fProjectg are sequences of val- ues of the member type enclosed by brackets. The ET = E u (= 1 name:S) u (= 1 dept:DT) component for the dept attribute is avalue of type DT = D u (= 1 deptName:S) u ( 1 head:E) DeptType. This shows the representation of point- ers as terms REF(T,X) where X is the identi er of As prescribed by the logic program, the pointer- the value (of type T pointed to. The identi er is al- valued attribute head of DeptType is not refer- ways the rst component of a term T(X,...). All ing to EmpType directly but to its associated class identi ers are constants from the database. Employee. Thereby, circular concept de nitions are prevented. 4 Properties of Interfaces These equalities for the type de nitions are true Termination of the logic program is guaranteed, and provided the database schema has a schema consis- the types de ned in API modules can be compared tent to it. At least it 3has to ful ll the following with the database schema and with each other. "well-typedness" axioms : 4.1 Termination 2 One has to assume that the underlying database is nite. This is however a standard assumption with On rst sight, the generated logic program is recur- databases. sive in the hasType clause and it contains complex 3 The symbol > stands for the most general concept. are generated from the API module by a compiler. E v 8name:> u 8project:> u 8dept:> Since the askPort predicate can only return syn- D v 8deptName:> u 8head:> tactically correct terms, an exception handling for malformed answers is super uous. One can check this by adding it to the database schema and checking its consistency. The service would just make sure that all referenced attributes 6 Related Work are de ned in the database schema. Lee and Wiederhold 1994 present a mapping from With a stricter regime, one can demand that the relational database schemas to complex objects. It database schema must have the same or sharper car- is more general in the sense that arbitrary arities of dinality constraints and that the well-typedness is the relations are allowed. In this paper, we assume a re ned to the concepts appearing as attribute types totally normalized schema of the database consisting in the API module: of unary relations for class membership and binary relations for attributes. The advantage of our ap- E v ET u 8name:S u 8project:P u 8dept:DT proach is that the algorithm for the generation of D v DT u 8deptName:S u 8head:ET the logic programm can be kept free of reasoning on foreign key dependencies. Here, the database schema has to ful ll the struc- Plateau et al. 1992 present the view system of ture of the types in the API module. Consequently, O2 as complex type de nitions coupled with the all instances of the database concepts will apply to database types and with prescriptions for graphical the type de nitions. The type de nitions would only display. The type system contained in the O2 data project on the attributes of interest. Even if one model. Reasoning on type correctness is done by the regards this as a too narrow coupling, the test on compiler. consistency of the above axioms with the database The Interface De nition Language IDL by Nestor schema returns useful information to the designer of et al. 1992 has four type constructors for records, an API module. lists, sets, and classes (unions of di erent record types). The base types represents boolean, integers, 5 Programming Language rationals, and strings. The values are transfered be- tween two programs by using a term representation Embedding similar to ours. The di erence is the missing formal From the API modules, programming language data relationship between type de nitions and (database) types can be derived. Currently, a prototype for concepts. the C++ language is implemented. The tuple types A recent proposal by Papakonstantinou et al. 1994 are mapped to C++ structures, the sets to linked encodes all type information with the term repre- lists, and the pointers to C++ pointers. While the sentation of a value. An application program has concept language view makes no di erence between to provide generic data structures capable of storing pointer-valued attributes (like *EmpType and their arbitrary values (though restricted to a xed set of associated class Employee, the representation within base types). The advantage is the exibility of the the application program is very di erent: approach. A disadvantage is missing compile time  A value Employee(X) is stored in a variable with type checking. C++ type char* because X is just an string rep- Persistent object systems, esp. Tycoon by Matthes resenting a database constant. 1993, "lift" the type systems of information sources and application programs into a single type system.  A value REF(EmpType,X) is represented as a Because of the heterogenous information sources, the main memory address pointing to the location approach is more general than in O2 . Reasoning is where the value EmpType(X,...) is stored. again restricted to type checking. This allows the application program to follow attribute chains by fast main memory adress- ing. 7 Conclusion We de ned API modules as mediators between ap- Communication between an applications program plication programs and databases. Both program- and the database is routed through the ports. The ming language data types and database queries are term representations of port p returns in argu- generated from the module description. The lan- ment s of the query askPort(s,p,x1,...,xn) are guage is simple enough to guarantee termination of read by the application program. The arguments the query and ecient reasoning on the type def- x1,...,xn contain the constants for the selection initions. Pointer types are introduced to simulate conditions4 . The "read" procedure, basically a sim- recursive datatypes and nd a natural counterpart ple term parser, stores the values in the C++ data in the database query. structures. Both the parser and the data structures In future, we plan to eliminate the distinction 4 Like for types the properties of port de nitions can between application program and database in the be investigated within the framework of concept lan- API modules. Application programs can serve as a guages. If the parameters x1,...xn are known, then "database" provided they o er the ability the inter- the selection conditions are path agreements. Moreover pret queries on their information. Then, information one may allow path expressions of the form a1 :a2 :::ar = ow design between a collection of programs can be b1 :b2 :::bs without compromising on the theoretical com- supported by reasoning on the relationship between plexity of the reasoning. the type de nitions. Acknowledgement. Many thanks to Claudia Welter and Martin Staudt for attacking weak points in earlier versions of this paper. References [Buchheit et al., 1994] M. Buchheit, M.A. Jeusfeld, W. Nutt, and M. Staudt. Subsumption between queries to object-oriented databases. Information Systems, 19(1):33{54, 1994. [Lee and Wiederhold, 1994] B.S. Lee and G. Wiederhold. Outer joins and lters for instantiating objects from relational databases trough views. IEEE Trans. Knowledge and Data Engineering, 6(1):108{119, 1994. [Matthes, 1993] F. Matthes. Persistente Objektsys- teme. Springer-Verlag, 1993. [Papakonstantinou et al., 1994] Y. Papakonstanti- nou, H. Garcia-Molina, and J. Widom. Object ex- change across heterogeneous information sources. Submitted paper, 1994. [Plateau et al., 1992] D. Plateau, P. Borras, D. Lev- eque, J. Mamou, and D. Tallot. Building user in- terfaces with Looks. In F. Bancilhon, C. Delobel, P. Kannelakis (eds.):Building an Object-Oriented Database System - The Story of O2, Morgan- Kaufmann, 256{277, 1992. [Nestor et al., 1992] J. R. Nestor, J. M. Newcomer, P. Giannini, and D. L. Stone. IDL - The language and its implementation. Prentice Hall, 1990.