=Paper=
{{Paper
|id=None
|storemode=property
|title=An Expert System for Semantic and Relational Database Design
|pdfUrl=https://ceur-ws.org/Vol-961/paper39.pdf
|volume=Vol-961
|dblpUrl=https://dblp.org/rec/conf/caise/Bouzeghoub89
}}
==An Expert System for Semantic and Relational Database Design==
An Expert System for Semantic and Relational Database Design Mokrane BOllzeglzollb Laboratoire MASI, Universite P. et M. Curie, Centre de Versailles ( 45, avenue des Etats-Unis 78000 Versailles, France Email: mob@litp.inria.fr Abstact : Database design has become an art that requires a high level of competence. The database design process is characterized by a certain indetermination in the way of choosing data structures and constraints. Several different schemas may describe the same reality. The design process is also characterized by an intuitive and empirical methodology; consequently, the quality of the schema obtained is heavily dependent on the database administrator's experience and insight in the database design. The development of increasingly sophisticated and efficient relational Data Base Management Systems has emphasized the need for increasingly complex information systems. It is therefore no longer possible to envision database design carried out without a computer aid. The expert system approach, described in this paper, is more adapted to the difficult problem of database modeling. It integrates algorithms, heuristics and practical know-how, and proposes an interactive methodology which is based on abstraction levels, user friendly interfaces and a modular knowledge base. This approach was fIrst implemented by a prototype which was followed by a commercial product named SECSI(Systeme Expert en Conception de Systemes d'information). Keywords: Information Systems, Database Design, Semantic Data Models, Relational Model, Expert • Systems, Computer Aided Software Engineering. ® SEeSI is a registrated mark of INFOSYS. 1. Introduction approach can efficiently contribute to the database conceptual design. Finally, section 8 concludes by Relational data bases are nowadays the preferential pointing to the future developments. tools for memorization and restitution of large amounts of data. This is especially true because of the great performance of the systems (DBMS). 2. Database Design Problems However, using the relational model as a conceptual design tool was somewhat Data base design is a difficult task. Since the controversial [KENT 79]. Indeed relational begining of the database era, many design concepts are neither simple to use nor sufficient to problems have been pointed. In addition to these capture the semantics of the user's application. At problems, the introduction of knowledge bases, the conceptual level, new models are necessary to deductive databases and object oriented databases, capture the semantics of the real world with impose new complex problems. The following preciseness and naturalness. Semantic data sub-sections outline the different problems as well models seem to be more suitable for this objective as in traditional databases and in future databases. [SMIT 77], [HAMM 81], [MYLO 80], [HULL In the following, we are not concerned by the 87], [PECK 88]. physical organisation of data, nor by its optimization. We point only to those problems Even with these new models, database design implied by the conceptual level and the logical remains a lengthy and tedious process without level (i.e. semantic level). computer aids. Many computer tools have already been built, but they are far from solving all the 2.1. Design problems in data base design problems. traditional databases. We have proposed a new approach based on techniques used in artificial intelligence [BOUZ In this section, we outline the main problems 83]. This approach aims to combine various which must be solved in traditional databases. categories of knowledge coming both from relational theory, semantic data models, artificial (1) Size oj the application: the difficulty of intelligence and software engineering. An expert the design increases with the size of the system product for conceptual and logical design application. Designing small data bases is quite has been built to apply the main ideas of this easy, but structuring several hundred objects, approach and to point to specific problems relationships and constraints without computer [BOUZ 85]. aids is always a hard task, especially for people whose profession is not to be a "specialist" in The SECSI system is developed in Prolog database design. language, under PC-MS/DOS environment and UNIX workstation environment (particularly SUN (2) Relative perception oj the real world workstations). Several other environments are and modeling choices: the real world may be envisioned in the nearest future. The product was reported in different ways with respect to the commercially available since june 1988, and more designers' perceptions: several database schemas than fifthy installations exist today. An english may describe the same reality. The problem is to version will be available too. characterize the best schema with respect to formal rules, then to fmd a methodology to build and to The product SECSI can be considered as a choose this schema. central part of a software engineering environment. It was one of the first tools of the (3) Availability oj injormation and the CASE technology. SECSI can be used as well as problem oj restructuring : the designer by expert designers for data modeling or schema cannot often get a detailed description of objects validation, and by novice users and students in because of the lack of knowledge about the order to learn database design and relational application. However the design must start with technology. incomplete and imperfect knowledge. As the design is bas"ed on a fuzzy universe of discourse, After a brief outline, in section 2, of the data new information may arise and may entail changes base design problems in both traditional and new in the definition of classes, relationships and databases, section 3 presents an overview of the constraints.These changes must be integrated existing tools. Section 4 characterizes the expert without reconsidering all thc modeling phases system approach for database design. Sections 5 which were already done. and 6 detail the expert system approach, its objectives, its structure and the knowledge involved. Section 7 highlights how a deductive 2.2. Design problems in future Model [SMIT 77). The different graphical tools whic~ are pr?vided act as a word process; they databases p~rrrut to deSign. to store and to layout aesthetic dla~rams. but they rarely help in the modeling Besides the traditional database problems chOIces nor m the consistency checking of the kn.owledge bases. deductive databases and object deSIgned schemas. All the design choices are left onented databases have introduced the following to the database administrator (DBA). In the best problems [BROD 84). [GALL 84). [BROD 86). case. the graphical tool offers a few facilities for syntactic control or. if integrated with a data (l). Representation of more complex dictionary. it insures the correspondance between objects: future databases will be able to t~e ~omponents of the schema and the terms of the represent more complex objects than flat data dIc~lOnary. These tools are very interesting for structures. The new objects may be rules. abstract therr user fn~ndly aspect and their help to a clean data types •.texts. graphics or images. Hence the documentation. but they are far from being correspondmg representation models could be effective modeling tools as they do not make any very difficult to learn and to use. design choice. (2) Rep'resentation of more complex The second category of tools provides a set cOllstramts : models have to express more of al~orithms which are generally built upon the complex constraints as general integrity constraints Relanonal M?d.el [CODD 7~). Such tools provide (state constraints) and dynamic constraints programs denvmg a normalized relational schema (transition constraints). For example. some logic froma set of attributes and dependencies [BERN ( based models allow to express most constraints as 76) [BEER 79) [FAGIN 77). This approach is one any first ~rder formu~as. Besides this problem of of the best formalized. it permits to characterize specification, the deSigner has to decide wether a and to automaticaly design the best relational set of integrity constraints is consistent. easy to schema: with respect to redundancy and to update check and to maintain. anomalies. Unforrunately, this approach ignores natural. objects such as entities. relationships, (3). Represent~tion of tile dynamics of genera~zatlon hierarchies and semantic integrity ob!ects: tr~dltlonal database design makes a constramts. The only constraints which are Slnct separation between data and the behavioural hand.led are generaly functional dependencies and of this data. The trends in the new models are (as multivalued dependencies (all of them considered ~vell as in the object oriented languages) to as semantic links). But the acquisition of these mtegrate the dynamic aspect with the static aspect dependencies. in the context of the universel of the database [OODB 86). This leads to a relation assunption. remains a combinatory double reflection effon during the database design problem which requires a detailed study of the process. semantics of all elementary relationships between attributes. However. if these tools are not suitable . This list of problems shows how complex and for the conceptual level, they could be strong difficult the database design was and will be. components at the logical level. The.n! nowdar, no useful methodology could be envlSlOl1ed Without computer design aid. That is The third category of tools consists of a set why CASE technology becomes more and more of interactive programs which can be considered necessary and urgent. as a combination of manual tools and algorithmic tools. They often call for CAD techniques [BROD 87). Interactions between the users and the 3. Overview of Existing system are question-answering oriented or graphics language oriented. This approach is Database Design Tools based on the idea that database design is panly an automated process and partly a human a n ' l To solve some of the traditional database Because of the last reason. this category of tools is problems. many tools have been proposed and more realistic t!Jan the. previous category of tools, commercialized during the last decade [DBEN and thus more mterestmg from the practical point 84]. [BOUZ 86b). [BROD 87). We distinguish of view. However if this approach does not three categories of tools. They are generally always succeed. it is because computer aids are characterized by the methodology and the models programmed once and for all; hence it is difficult which are supponed. to modify or to add design rules. Tools appear as black boxes in which you must believe and accept The first category of tools consists of their results, or you don't believe, then you . graphi~al tools which are generally based on remake manually the design. But this remains an semantic data models lilce the Entity-Relationship interesting approach if we combine it with the new Model [CHEN 76), or the Semantic Hierarchy 2 developments in knowledge engineering and information system changes? How to add new deductive systems. That is what we have done attributes, new relationships and new constraints with SECS!. without redesigning the whole database? How 10 integrate several data base schemas that have been The computer design tool we have built, designed independently? combines the advantages of these three catogories of tools and aims to do more by providing a (4) Database administration: how to develop unified methodology which combines theoretical and maintain a detalled and precise documentation knowledge (expressed by algorithms and rules) on the complete life cycle of the database (from the and practical know-how (expressed by heuristics design phase through the operational phase) ? and cooking recipes). The deductive approach What are the repercussions on the data and the fulfils the lacks of the first two categories of 100ls, already eltisting programs when the characteristics and a modular knowledge base increases the of the software or hardware environment are evolution of the design techniques and allows the modified ? CASE system to evolve better than the third category of tools. Some of these objectives have already been reached (e.g. conceptual and logical design, some schema restructuring and some documentation). 4. SEeSI: a New Generation Others are in the prototyping phase (e.g. physical design, view integration). of Tools for the Database Designers More than just a tool for the design of database schemas, SECSI can also be used in many other ways as for example, a validation tool of manual This section summarizes the different objectives of SECSI, highlights its different characteristics as an design or as an educational tool in data modeling. expert system and reports the different levels of Combined with a DBMS, SECSI can be expertise of its knowledge base. considered as a powerfull prototyping environment of relational databases. SECSI is already used to design buiseness applications as 4.1. Main Objectives of SECSI well as technical applications like geographical databases. SECSI has been conceived for answering many problems related 10 the development of database applications. Generally, design 1001s are devoted 4.1. Main Characteristics of an to a given specific task within the design process. Expert System for Database Design SECSI aims to cover all the life cycle of this process, from the conceptual level to the physical The general defmition of an expert system states level [BOUZ 86a]. These objectives are declared that it is a specialized software 1001 which solves a hereafter as a set of questions for which the problem as well as a human expert can do [HAYE desired system must give answers. 84] . Practically, in our area, this definition may be interpreted by the following four concrete (l) Conceptual lIlodeling: how to transform characteristics: incomplete specifications of a problem, expressed in an ambiguous natural language, into a formal, The first characteristic is to constitute a complete and consistent conceptual schema complete knowledge base including ? How objects can be classified as entities, theoretical algorithms and rules, and experimental relationships, attributes or constraints? Can the knowledge in the database design process. human designer be liberated from these different choices? The second characteristic is to provide an interactive methodological environment (2) Physical design : how to generate an which accepts incomplete specifications, provides efficient data slOring and retrieving structure, the same reasoning as a human expert, permits starting from the conceptual schema of a backtracking 10 any design step, uses a question- database? How to take into account the different answering system and can infer over examples. parameters that characterize: (i) the given application and its cost requirements, (ii) the The third characteristic is to be an open software and hardware environment which will be system which can learn and integrate new used for development and implementation? theoreiical and experimental rules. The system must also trartsfer its expertise through its use (via (3) Schema restructuring: how to make sure explanation of its design rules) and through that the database structure will evolve as the justification of its results. 3 The founh characteristic is to provide end· propenies are described by facts and rules which user friendly interfaces by offering a constitute the detailed description of a given workstation which reproduces the main features application. of the manual design (Le. a graphical interface to design the first rough sketch of the database All the knowledge referred to above constitute schema. a natural language to facilitate the the knowledge base of the expen system SECS!. communication of the specifications, and a declarative language to specify high level assertions that could have ambiguous interpretation in the natural language or a complex 5. An Open System representation in the graphical interface). To Architecture facilitate the interaction between the system and the user. other features like icons. menues and mouse S ECSI architecture is characterised by its must be used too. modularity which permits, thanks to the atomicity of its knowledge base. to add new design rules 4.2. The Different Levels of and new various interfaces. This section describes the components of this architecture. Expertise of a Case Tool in Database Design 5.1. The knowledge base The expenise of the system is divided into three The knowledge base is composed of two pans: a categories: rule base and a fact base. (1) Theoretical knowledge: this knowledge The rule base is created and updated by the coincides with the design concepts (models, rules) expen designer. This base contains general rules and the design methodology (ad vices. reasoning such as normalization rules, mapping rules; but principles). Theoretical knowledge is composed also specific rules which can be system dependent of algorithms. rules and heuristics: or even application dependent. General design A1~orithms : some parts of the design process rules are grouped into the DESIGN module which are well isolated and formalized. and have is used by the methodological regulator and the already been expressed by many efficient explanator. This part of the knowledge constitutes algorithms. such as normalization algorithms. the system shell. hence automatically delivered in cost evaluation of transactions and access paths the basic version of the system. optimization. - .R.uk£: correspond to some known or admitted The fact base is designed by the database expenise as in normalization process administrator. It contains compiled specifications (Armstrong's inference rules). view integ- describing a given application. This part is ration rules. mappings rules between different created and updated by the DBA. It corresponds models. and consistency enforcement rules. to the problem which is submitted to the system. - Heuristics: may be particular interpretations of the real world. assumptions in some value distributions. or simplification of the IUUlfATOI DOCUMlNT4110lf correlations between attributes or between constraints. INTlRlACU I . (2) Specific domain knowledge: for each application domain (e.g. insurance. banking. II'fATUlAL I ISOWLlDCl ....n ,N E E N RULIS medicine. travels). there is some common • G terminology. managerial rules and skills which IDleu""T·1 E I N N CE represent the specific know-how of the domain. This know-how can be represented. as well as we can fomalize it. either by general behavioural rules or by general predefined structures. This knowledge is stored in the system data dictionary ICRAPHIC·I E I and reused. when appropriate. during the design process. Dun ImUI"Cl Sl"I\"DAiD SQL (3) Specific application knowledge: within a specific application domain. there are some Fig,1: The SEeS! architecture propenies which characterize each application (e.g. reservation in a given travel agency). These 4 To represent these two types of knowledge, we constraints? This leads to the use of a utilize two different representation models: a theorem prover principle, based on a semantic network to represent facts and production backward-chaining. rules to represent behavioural constraints and to specify the design rules. The semantic network is (2) Transforming specifications from one given defined by a set of typed nodes (attributes, values, form to another: for example, the successive srtructured objects and possibly their instances) mapping of the natural language and a set of typed arcs representing relationships specification to the relational model schema. between nodes (aggregation, generalization, This leads to the use of an inference engine, association and some constraint arcs) [BOUZ 84]. based on a forward chaining. To.enhance the semantics of this model, additional constraints are defined: domains, roles, keys, Besides this basic program, the inference intersection of classes, cardinalities and functional engine provides two important modules: the dependencies. The figure 2 gives an illustration of methodology regulator and the results justificator. the different concepts of this semantic data model. The first module allows the user to backtrack to any design step to redesign his database or to modify his specifications. The second module makes the system able to explain and justify its results. 6.3. The external interfaces Describing the application in a comprehensible manner is an important problem in data base design. SECSI offers three types of interfaces: the specification interfaces, the acquisition interfaces and the interaction interfaces. (1) The specification interfaces consists of three languages: i. A natural language which accepts simple sentences, possibly composed of a conjunction of subjects,a verb and conjunction of SIJIl'lber Muque r)Jle PO'/I'tl' Color complements. Its role is particuraly V~ pl1.1) important for novice designers and for the easy '" fd rd 'I'uro<7.a communication of specifications. It also allows Fig.2: An example of a semantic network a rapid handling of the system without leaming any formal language. Each semantic relationship is represented by a ii. A high level declarative language which helps couple of binary arcs. Arcs a and p are called to specify what the natural language cannot atomic aggregation arcs, arcs r and 0 are called easily express. This language permits also to molecular aggregaiion arcs, and arcs g and s are describe any database schema specified into called generalization-specialization arcs. any given data model Cardinality constraints are expressed both over alp arcs and rio arcs. Other constraints like functional dependencies are portrayed by fd arcs too. 6.2. The inference engine The inference engine is a program composed of the basic mechanisms which permit to manage the rules into the rule base and to apply them onto the fact base. This inference engine functions by an alternate use of backward-chaining and forward- chaining. The combination of these two principles is made necessary because each step of the used methodology consists of: (I) Proving an hypothesis: for example. is a constraint derivable from a set of other given Fig,_3 An example of an applicatioD specification 5 iii. A graphical interface which enhances the user (3) The interaction interfaces distinguishe friendly interaction. This interface is used icons, menus, forms and documentation: either as an input facility or as a layout feature. As there is no common standard representation i . The menus visualize the authorized operae tions of graphics, SEeSI provides graphics on a given active window. According to the generator which permits to each user to have stages of design, only the permitted operations his own diagrams. are visualized or accessible. ii. Forms facilitate the capture of some cons- traints (e.g. functional depedencies, cardi- nalities), and permit to hihglight the default options generated by the system. "APPLICA1IONS SECSI llODElINO APflJCATI~~ NAlIE iElmONSmp ACQI11SITION Of CIJIOIN.lLITY CONS1llAl~TS DDJ'IERS('IUllCli,llJOO~A~) fUNCTIONAL DEPE~llENCIES CARDINAlITIES DElJVf]S VEJDC1L QIE.\T ~. Fjg4: An example of a graphical interface To be more flexible, the designer of a data base has the choice to describe his application in one or more of these languages, and then combining them in the same specification. An interactive parser Fig.6: A menu and fonn example for cardinality acquisition generates a fact base from the description, this base being progressively enriched by the ii. The documentation is accessible at all levels interactive acquisition module and transformed whenever the user of the system requests it, into a canonical semantic representation. during a session or independently. It consists of a concise user's manual, a synthesis of the (2) The acquisition ill/erface : the interactive methodology implemented by the system, and acquisition assistance helps in completing the the defmition of the main notions of data bases description of a problem through a question- to which the system refers. answering system that automatically reminds the user what he might have forgotten to specify or 6.4. The Expected Results what he has ignored in his description. This interactive acquisition process is based on When the design process is terminated, the system deduction rules, heuristics, and the analysis of produces the following results: examples fed in by the user. (1) A set of basic relations in 4NF and the various keys of these relations (primary keys and F~cr[ONAL NOiMAWATIOH PBASI or lUI RELATICfl VIllIlli(NU\lB>J.II,UQUF,lYI YOill' WlIll~eul'e ootwstll the ded'oco.t IX &Mil ~peDdeD<:a, P..... • lllI1d you 1tI1 .. ",!bot NUMBER cIelemint< mE" DOl 1 Fig.5: Interaction between cODlrajnts and examples 6 eliminate the possible inconsistencies using predifmed inference rules or with the end-user's help. The logical design phase transforms the conceptual schema into a fourth normal form t1lEA1E VIEW I'lliON relational schema with its associated integrity AS constraints and views. It performs the interactive SELEcr_, plm:, lddress acquisition of constraints and the choice of first fRO~CliU ~~IOS normal form relations. Constraints such as SELEcrnm, plm:, ,<\:bes$ intersection and union of classes, cardinalities of fROMSu~ relationships and functional dependencies between attributes are captured. The fust normal form relations are constructed by suppressing generalization hierarchies and removing Fig.? Example of results produced by SEeS! multivalued attributes. Normalization is carried out using dependencies between attributes. All the results can be obtained, as needed, either in standard SQL, as far as this language can do it, or in a more readable ad hoc language (i.e. a declarative language). Fot those results that ~~~ cannot be generated in SQL (especially semantic ~. 'Y ~ integrity constraints), the declarative language is used, and it is of the responsibility of the database administrator to program these results in the ... ''''In'''~ .., ••~n"".. J R,qwlrtlllUI u.". atqlllrtlllnl ... Ip." RtillllrllJlnl ... IJ." ,.,,'n,,'" corresponding language used for his application. 6.A Modular and Progressive ~t;:Jp View In~itltlo. Design Approach The design approach adopted by SECSI and CONCEPTUAL portrayed in figure 8 is based on three abstraction -.. . ~.Ir- ~...w.. SCHEMA I4lp~_1Iru , levels: the conceptual level, the logical level, and the physical level. To each level there correspond ~hoplllltl .lId a specific model and a specific design process. lIormllluUoa The methodology proceeds by stepwise refinement, going from informal description of a given problem down to physical representation of LOGICAL ~I_oohn SCIJEMA ....oetJ0t6ctl..._ . . this problem in terms of records and meso Staning CMaJ&iIu \bt gll:-..oa OPu~J..CIU from informal know ledge, three main phases successively produce: a formal specification, a conceptual schema, a logical relational schema and a physical schema. The conceptual design phase generates from the external description of an application a sound conceptual schema stored as a semantic network • pnYSICAL SCHEMA with its associated constraints. This generation is Fig,S; The deSlgn melbodotowy performed while conversing with the end-user. The different views of the application given by the The physical design phase gives an optimized end-user(s) are mixed into one description after physical schema of the data base. This schema elimination of redundancy and resolution of includes both a set of initial implemented relations conflicts. A verification step performs the and a set of indexes and formats. The choice of validation of the application description in order to implemented relations and plausible attributes for generate a consistent conceptual schema. In indexing depends on the most important or most addition to the syntactic controls, this phase frequent queries which will be performed on the detects generalization cycles, recursive database. This choice needs some estimations associations and objects which play several roles depending on the DBMS used. in the same relationship. The system tries to 7 The design of a database is an iterative .process. "supplier" and generating another object It implies numerous comings-and-goings between ("supplier") described by two attributes ("name" the universe of discourse and the expert system. and "address") and a relationship ("supplies") SEeSI provides this iteration at two levels : between "product" and "supplier". In fact we got more information in the second sentence as we (i) by permitting the interuption of a working know the minimum and the maximum number of session without losing the achieved task. products supllied by a given supplier (i.e. The recover of the session can be done cardinality constraints). But the second sentence without any redesign of the last schema. introduces an additional complexity, related to the usage of synonyms ("product" and "parts"), that (ii) by authorizing the interuption of a dialogue to can be solved if a data dictionary is provided. modify an assertion or to return to a preceding question. Another problem is related to objects that play several roles in the same relationship. For Some resumptions are automatic at each of this example, in the sentence: "A person could be levels without having to manually return to the married to another person", we do not know who current stage of modeling. is the wife and who is the husband. The interpretation must complete this sentence by The design of a data base is not a fully acquiring the different roles from the user and automatic process: a permanent interaction with modifying the previous sentence as follows: "A the designer allows the comb.ining of algorithmic person as a husband could be married to another tasks with human decisions. person as a wife". Every body who reads this sentence can infer more than a relationship between a person and a person; he can deduce that 7. How a Deductive a person cannot be married with himself (because of the term "another" in the sentence). One can Approach Can Contribute also deduce that there exist some persons who are to the Conceptual Design not married. This section highlights the contribution of expert Redundancy is a frequent problem in the system approach in the database design process. specification phase. Some new sentences, altough We particularly focus on that parts where the they are true, do not augment the semantics of the system has enough knowledge to infer from rules, application, as the new described facts can be heuristics and examples. deduced from the previous ones. For example, in the following description, the third sentence is redundant to the first two: "A person has a name 7.1. The Interpretation and the and and age. An employee is a person. An Control of Specifications employee has a name and an age." Again, in the following example, there is a redundancy, but it is Besides the syntactical checks of sentences, one of an underhand one: "Employees and secretaries are the hard problem, when compiling user persons. A secretary is an employee". Indeed specificiltions, is to decide wether a term in a given the second sentence makes a part of the first one sentence must be considered as an attribute, an redundant; as a secretary is an employee, it is not entity, a relationship or an integrity constraint. necessary to say that he or she is a person, this fact can automatically be deduced. ( Sentences are not only interpreted as independent units, but also as a whole consistent As the previous paragraphs show, the specification whose interpretation is stronger than interpretation of a given specification is not only a that of each sentence. So when a sentence is syntactic process, but a very high level semantic followed by another one, its interpretation could be modified because we get more information by reading two sentences than by reading only one. For example, from the following sentence: "A prodllct has a nllmber, a IInit-price, and its process based on expert knowledge: a lexicon of terms which correspond to the lISUal absttations (like aggregation and generalization), and to the different terms used I sllpplier", we understand that there is an object in the vocabulary of the application, named "product" and characterized by three ll attributes: "number unit-price", "supplier But , t1 ll • a set of semantic rules which permit to if we add a new sentence like: "Each prodllct distinguish between atomic objects (attributes) supplier, described by his name and address, and molecular objects (entities and supplies one to ten parts", we modify the relationships), previous interpretation by removing the attribute 8 - a set of inference rules which capture the IfCard(p(0,A1 ))=[1.1 j and A1->A2 integrity constraints involved by the different Then Card(p(O,A2»=[1.1]. sentences. Finally. as it is well-known. one can derive. using Armstrong's axioms. new functional dependencies a set of reinterpretation rules which are able to from a given set of dependencies. For example: modify the interpretation of an objet to another If Al->A2 and A2->A3 Then A1->A3. object with respect to a given set of related sentences. This process can be done as far as necessary to generate the maximum facts. All generated facts a set of redundancy checking rules which are so much gained questions for the user. This avoid the assertions that can be deduced from deduction process is based on two assumptions: other given facts. (i) the number of combinations necessary to get 7.2. The Acquisition of Constraints cardinalities is lower that that of getting functional dependencies, One the most problem related to the constraints acquisition is the combinatory explosion (e.g. the (ii) the acquisition of cardinalities is more natural acquisition of functional dependencies in the than the acquisition of functional dependencies. relational model). In this case. we cannot envision to ask the user all the possible questions. So we However. all functional dependencies are not must use various means in order: (i) to limit the implied by cardinalities. This derivation process different combinations to consider and (ii) to can then be regarded as a heuristic to reduce the avoid all questions for which an answer can be combinatory explosion in the search of functional produced by using a set of inference rules. dependencies. For all other dependencies which are not implied by cardinalities. one must use As stated above, the problem of combinatory Armstrong's inference rules or. in the last. explosion arises for all constraints involving questions to the user. attributes (e.g. functional dependencies. cardinalities of attributes, keys). To illustrate this Before requiring questions to get dependencies. problem. we will consider the case of the one can go far. when possible, in the usage of acquisition of functional dependencies. Suppose heuristics. For example. we can make the we have an object type O(Al .... ,An) where for assumption that. in most applications. there is no each instance, we allow the attributes to be need to search for functional dependencies with mono valued or multivalued. This can be more than four or five attributes in their left represented by the following rliagram: hands ide. This can contribute to considerably reduce the combinatory explosion. Fianally. one can also use examples to reduce the combinatory explosion of functional dependencies. More precisely. examples could be used to generate some non valide functional dependencies; hence a set of unnecessary questions to ask to the designer. For example, from the following small relational extension. An NAME AGE ADDRESS As one can see. cardinalities specify in one hand the monovalued or multivalued attributes (p Dupond 24 Pa::h cardinalities). and in the other hand wether. for a Durand 27 Lyon given attribute value. we can associate one or Durand 27 Dijon many instances of the object (a cardinalities). Martin 24 Paris There is a particular correlation between cardinalities and functional dependencies. Indeed. from a set of attributes with their cardinalities. we one can generate the following non valide can deduce some functional dependencies. For dependencies: example: NAME -1-> ADDRESS If Card(a(A1 ,0»=[1.1 j and Card(p(0,A2))=[1.1j AGE ·I->ADDRESS Then A1->A2. (AGE, ADDRESS) ·I->NOM In the same way. from a combination of (NAME,AGE) -1-> ADDRESS cardinalities and functional dependencies. one can We can notice that we cannot say anything about derive new cardinalities and son on. For example: the NAME and AGE. 9 To summarize, the problems related to To be more relational. we must remove constraint acquisition could be solved by using molecular aggregation arcs (rio) and replace them different techniques as : (i) interaction rules by references and referential constraints. as between constraints, (ii) heuristics, (iii) examples portrayed in the following schema: and finally. when necessary, (iv) questions for the user. 7.3.Transformation of the Semantic Model to the Relational Model p8'\ I),~ Sa One of the main tasks of the system SECSI is to transform a conceptual schema expressed in a semantic model into a logical schema expressed in + a relational model. As it is known. the problem in this case is to not loose the semantics between the two levels. For this objective. the system provides a set of transformation rules which conserves the semantics of the conceptual level at the logical level. by generating new objects, In fact. this transformation depends on ·the semantic integrity constraints and views (queries). cardinality values. To satisfy the first normal form defInition of relations. the cardinality of one of the One simple transformation rule is to represent two a)'cs (either r or 0) must be equal to [1.1) or any molecular objet with monovalued attributes [0,1). References designate the foreign keys of (cardinalities of p arcs equal [1,1) or [0.1]) of the the related objects. semantic model by fIrst normal form relation in the relational model. If the two cardinalities are different of [1.1) or [0,1). we must use another transformation which implies the defInition of an inteimediate object in such a way that one of its cardinalities equals + j PfRSO~(Nm.SlIJ1llJlll. 1,11 \ SurmJIl2, !gel [1.1). That permits to come back to the previous transformation. This case is portrayed by the following schema. ~ When some attributes are multivalued attributes (cardinalities of p arcs equal [IoN) or [O,N)) they must be transformed into a molecular object related to the fust one. to satisfy the atomicity of values which characterizes the fIrst normal form relations. Depending on the attributes wether they are related by functional dependencies or not, this transformation may generate one or several molecular objects. l One of the semantic concepts which is not PII.N,-'F supported by the relational model is the generalization hierarchy. Thanks to the inheritance property. this concept can be represented by basic relations (implemented relations) and virtual relations (calculated relations or views). For I NlJDe example. in the left handside of the following schema. one can replace the CLIENT object by a virtual relation calculated from the union of AGENT and PRIV ATE]ERS. The suppression of this generic class is immediately followed by the inheritance of its properties (related objects) to its sub-classes (schema of the right handside). 10 no semantic information is lost during each transformation, no duplicate relation schemas are generated, the number and the complexity of the generated integrity constraints would not be too high. After the mapping process, he normalization process is carried out, as in usual, by generating for each relation its minimal cover of functional dependencies. 7.4. The Justification of the Results l1.1FXf ~ AGf:\1 UPRIVAll:}Eltl Justifications and explanations are especially emphasized int the expert system area. Such aspect But this transformation is not the unique one is devoted in one hand to increase the user's allowed. Indeed, instead of removing the generic knowledge in database design, and in other hand class, one can remove the sub-classes by replacing to enhance the credibility of the results obtained by them with a specific attribute (say "role") which the system. Explanations and justifications are captures the role played by a given client in the lillie different. In the fust case, the system has to specialization hierarchy. The sub-classes AGENT explain the concepts, the methodology and the and PRIVATE]ERS should be calculated by design rules which constitute the know ledge base. restriction of the CLIENT class using the new In the second case, the system has to justify attribute role. This case is illustrated by the different representation choices for a given following schema transformation: database application. SECSI supports these two aspects at different levels. ,t6 At the first level, the system provides explanations for every ununderstood concept or P{ Y question during the interactive design process. If ~ , Ide a given concept is not understood by the user (e.g. a cardinality of a functional dependency), SECSI elaborates a text composed by a defmition of the concept and an illustrative example). If a given Dom(1011)'{t1t"I,IIIOI} question asked by the system is not understood PRIVATE PERS~IPERSOSllol",·t1IIOI·) because of its complexity (i.e. contains concepts AGE~I'IPDlSON I Rot,,"IIOI') which are not understood) or of its fuzzy form, the system decomposes the given question into easy sub-questions for which the user has only to In a given schema, the two preceding answer "yes" or "no". This gives the system the transformations are not both possible. Their ability to be used by both expert users and naive application depends on wether the sub-classes users. The documentation about the system use have specific properties or not. If they have, the and the syntactic form of the different languages is first transfonnation is better, otherwise the second also integrated at this level. one is beller. In practice, this strategy is not so simple: we can also move up some specific At the second level, the system justifies why an attributes if we introduce a specific integrity object of the application is represented as a constraint the check the null values on this normalized relation or as a virtual relation, why a attribute. To decide for each case which given fact is represented by an integrity-constraint transformation to apply, the expert system may whereas the user waits for a relationship, but also use heuristics based on the number of sub-classes, why a given fact does not explicitly appear in the the number of specific properties of each class and results, or why some artificial objects are in the the complexity of the possible integrity constraints results whereas the user description did not to be generated. contain them. To answer to a given question, the system elaborates a synthesis of the different In general, the mapping process from the design rules applied to the concerned object, from conceptual schema to the logical relational schema the analysis of the external description to the is based on a specific strategy which is built in normalized relational schema. This synthesis is such a way that: 11 based on the different states of the knowledge design choices and to explain reasoning base, which are saved all along the design steps. alternatives is one of the important feature of expert systems; it makes them attractive for Explanation and justification is a very complex complex problems. process which needs to remember a large amount of data an design rules. Its feasability is More than just a tool for the design of database demonstrated in the first prototype of SECSI schemas, SECSI can also be used in many other [BOUZ 86c], but its current industrial application ways, as for example, a validation tool of manual is very limited. design or as an educational tool in data modeling. Combined with a DBMS, SECSI can be 7.5, The Incremental Design considered as a powerfull prototyping environment of relational databases. SECSI is As often claimed, the database design task is an already used to design buiseness applications and iterative and long process which cannot be done technical applications like geographical databases. definitely in a short time. Many refinements are necessary during a long period of time (several Three main versions are dedicated to the design ( weeks or several months depending on the of databases: Version 1 concerns the production complexity of the application). The contribution of a normalized relational schema from the given of an expert system to this problem is to provide specifications in natural or declarative language. some methodological and some recovering Version 2 is dedicated to the optimization of features which pennit the user to backtrack to any structures and to the generation of access paths. design step and to interrupt his design with the Version 3 is an extension of the conceptual ( modeling to the integration of views or to an possibility to recover his session a few days later, without redesigning his application. incremental design of the data base. Each version evolves horizontally through a permanent research During each recovering session, the user would in improving the interfaces, the reasoning modify his first specification by adding new facts explanation and the documentation, offering or modifying old ones. Hence, two different among others the graphic edition and layout problems arise: possibilities adapted to each type of model used. - how to make sure that the specification is Currently, only the version I has reached the consistent, industrial level and then commercialized. The - how to integrate these new facts without other version are in the prototyping level. reconsidering with the user all the pevious design (especially the set of previous asked questions). With future development of deductive databases and knowledge bases, this approach is more To reach this double objective, SECSI stores in adequate to integrate new concepts and new design an extended fact base all the deduced facts from its rules. We think that only powerful expert systems rule base and all the captured fact from the user's will efficiently handle the complexity introduced answers. When a given session is recovered with by these new research developments. some possible specification updates, the system generates a new fact base and procceeds through its following design steps. Whenever the system References needs to ask a question to the user, the extended fact base is used first to derive a possible answer. [BAT! 85] BATIN! C. and CERI S. "Database Design: Then only questions concerning new facts and Methodologies. tools and environments" Ranel modified facts are effectively asked to the user. sesion, ACM SIGMOD 1985. 8. Conclusion: [BEER 79) BEERI C. and BERNSTEIN P.A. "Computational problems related to the design of normal Conn relation sct':m1S" ACNf Transactions On Database Systems, marcH 1979. [BERN 76J BERNSTEIN Ph. "Synthesizing Third Normal I In this paper, we have described an expert system Fonn Relations from Functional Dependencies" approach for database design. As an answer to the ACM TODS. Voll,N'4, 1976. various modeling problems, SECSI relies on the [BORG 85J BORGIDA A. and WILLIAMSON K. most elaborated concepts of databases and the "Accomoding Exeptions in DB and Refining the most recent techniques of artificial intelligence. schema by learning from them." VLDB Conf, Stockholm august 85. Compared to the existing tools, this approach [BOUZ 83J BOUZEGHOUB M. and GARDARIN G. "The seems more suitable for data base design in the design of an expert system for database design-" sense that it takes advantage of both theoretical InU. Worlcshop on New Applications of Databases, development and practical experience. Even if Cambridge (UK). sept. 83. Published in New practically limited, the ability to justify several 12 Applications of Databases. Academic Press. [HAMM 81] HAMMER N. and McLEOD D. "Data Base Gardarin & Gelenbe eds. 1984. Description with SDM: A Semantic Data Model" [BOUZ 84) BOUZEGHOUB M. "MORSE: A Functional (ACM TODS V6, N3, Sep 81). Query Language and its Semantic Data Model" [HAYE 84 ] HAYES-ROTH F. "The knowledge based IN RIA RR270 and Proceed of 84 Trends and Expert Systems: A Tutorial" Computing Revue, Application conf on Databases, IEEE-NBS Vol 17, N'9, 1984. Gaithersburg (USA), 1984. [HULL 87J HULL R. & KING R. "Semantic Database [BOUZ 85] BOUZEGHOUB M., GARDARIN G., Modeling: Survey, Applications and Research METAlS E. "Database Design Tools: an Expert Issues" ACM Computing Surveys, Vol 19, N'3, System Approach" VLDB Conf, Stockholm 1987. august 85. [KENT 79] KENT W. "Limitations of Record-Based [BOUZ 86a] BOUZEGHOUB M. "SECS!: Un Syst~me Wonnation Models." ACM Trans.on Database Expert en Conception de Syst~mes d'lnfonnations" Systems 4,1,1979. These de Doctorat de I'Universite Pierre et Marie [MYLO 80] MYLOPOULOS 1. BERNSTEIN P.A. Curie (Paris VI), mars 1986. WONG H.K.T." A language facility for designing [BOUZ 86b] BOUZEGHOUB M., COMYN I. & database intensive applications" ACM trans. On RICHARD D. "Conception de Bases de Donntes: Database Systems valIS, nb2, 1980. Etat de l'art sur la mod6lisation conceptuelle, [NILS 82] NILSSON N.J. "Principales of Artificial l'int6gration de vues et la conception physique" Intelligence", Springer_Verlag Berlin Heidelberg Rapport MASI-Universitt Paris VI NO 180, 1986. New York, 1982. [BOUZ 86c] BOUZEGHOUB M. & METAlS E. [OODB 86] First Workshop on Object Oriented Database "L'cxplication du raisonnement et la justification Systems, IEEE Computer Society Press, 1986. des resultats dans Ie syst~me expert SECSl", 2iem [PECK 88] PECKHAM 1. & MARY ANSKI F. "Semantic Colloque International en Intelligence Artificielle Data Models" ACM Computing Surveys, Vol 20, de Marseille, 1986). N'3, 1988. [BROD 84J BRODIE M., MYLOPOULOS 1., SCHMIDT [SMIT 77J SMITH I.M. and SMITH D.C.P. "Data Bases Y. "On Conceptual Modelling: Perspectives from Abstractions Aggregation and Generalisation .. Artificial Intelligence, Data Bases and ACM trans. On Database Systems, june 77. Programming languages. Springer-Verlag, NY [TUCH 85] TUCHERMAN 1.., FURTADO A. and 1984. CASANOVA M. "A Tool for Modular D"atabase [BROD 86] BRODIE M. & MYLOPOULOS 1. (editors) Design." VLDB Conf, Stockholm 1985. "On Knowledge Base Management Systems" [ULLM 82) ULLMAN J.D. "Prineiples of Database Springer Verlag, 1986. Systems" computer science press 1982. [BROD 87J BRODIE M. "Automating Database Design [ZAN! 81] ZANIOLO C., MELKANOFF M.M: "On the and Development" A Tutorial of the SIGMOD design of relational data base schemata", ACM Conf., San Francisco 1987. trans. On Database Systems, V6, NI; march 1981. [BROW 83] BROWN and STOTT-PARKER "LAURA: A formal Database Model and her Logical Design Methodology." Proceed. VLDB Conf, Florence 1983. [BUBE 82) GUSTAFSSON M., KARLSSON T. & BUBENKO 1. "A Declarative Approach to Conceptuallnfonnation Modeling" in Infonnation Systems Design Methodologies Olle, Verijn-Stuart editors, North Holland /Pub!. Co, 1982- [CERI 83] CERI S. (editor) "Methodology and Tools for Database Design" North Holland Pub!. Co, 1983. [CHEN 76J CHEN P.P. "The Entity Relationship Model - Toward a Unified View of Data" ACM trans. On Database Systems VI, NI, March 1976. [CODD 70] CODD E.F. "A Relational Model of Data for Large Shared Data Banks", Comm ACM,VoIl3, N06,1970. [CODD 791 CODD E.F. "Extending The Database Relational Model to capture more Meaning." ACM trans. On Database Systems, 4, Dec 79. [DBEN 84) Database Engineering Revue, "Special Issue on Database Design Aids" Vol7, N'4, 1984. [FAG! 77] FAGIN R. "Multivalued Dependencies and a New Normal Form for Relational Databases" ACM TODS, Vo12, N'3, Sept 1977. [GALL 84J GALLAlRE H., MINKER 1., NICOLAS I.M. "Logic and databases: a deductive approach" ACM Computing Surveys vol 16, NO 2, Iuin 84. [GARD 89] GARDARIN G; & VALDURIEZ P. "Relational Databases and Knowledge Bases" Addison Wesley Pub!. Co. 1989. 13