=Paper= {{Paper |id=None |storemode=property |title=An Expert System for Semantic and Relational Database Design |pdfUrl=https://ceur-ws.org/Vol-961/paper39.pdf |volume=Vol-961 |dblpUrl=https://dblp.org/rec/conf/caise/Bouzeghoub89 }} ==An Expert System for Semantic and Relational Database Design== https://ceur-ws.org/Vol-961/paper39.pdf
                                 An Expert System for
                        Semantic and Relational Database Design

                                                Mokrane      BOllzeglzollb

                         Laboratoire MASI,       Universite P. et M. Curie, Centre de Versailles

(                                 45, avenue des Etats-Unis 78000 Versailles, France
                                                 Email: mob@litp.inria.fr




    Abstact :

    Database design has become an art that requires a high level of competence. The database
    design process is characterized by a certain indetermination in the way of choosing data
    structures and constraints. Several different schemas may describe the same reality. The design
    process is also characterized by an intuitive and empirical methodology; consequently, the
    quality of the schema obtained is heavily dependent on the database administrator's experience
    and insight in the database design. The development of increasingly sophisticated and efficient
    relational Data Base Management Systems has emphasized the need for increasingly complex
    information systems. It is therefore no longer possible to envision database design carried out
    without a computer aid. The expert system approach, described in this paper, is more adapted
    to the difficult problem of database modeling. It integrates algorithms, heuristics and practical
    know-how, and proposes an interactive methodology which is based on abstraction levels, user
    friendly interfaces and a modular knowledge base. This approach was fIrst implemented by a
    prototype which was followed by a commercial product named SECSI (Systeme Expert en
    Conception de Systemes d'information).




    Keywords:

    Information Systems, Database Design, Semantic Data Models, Relational Model, Expert •
    Systems, Computer Aided Software Engineering.




    ® SEeSI is a registrated mark of INFOSYS.
1. Introduction                                        approach can efficiently contribute to the database
                                                       conceptual design. Finally, section 8 concludes by
Relational data bases are nowadays the preferential    pointing to the future developments.
tools for memorization and restitution of large
amounts of data. This is especially true because of
the great performance of the systems (DBMS).           2. Database Design Problems
However, using the relational model as a
conceptual design tool was somewhat                    Data base design is a difficult task. Since the
controversial [KENT 79]. Indeed relational             begining of the database era, many design
concepts are neither simple to use nor sufficient to   problems have been pointed. In addition to these
capture the semantics of the user's application. At    problems, the introduction of knowledge bases,
the conceptual level, new models are necessary to      deductive databases and object oriented databases,
capture the semantics of the real world with           impose new complex problems. The following
preciseness and naturalness. Semantic data             sub-sections outline the different problems as well
models seem to be more suitable for this objective     as in traditional databases and in future databases.
[SMIT 77], [HAMM 81], [MYLO 80], [HULL                 In the following, we are not concerned by the
87], [PECK 88].                                        physical organisation of data, nor by its
                                                       optimization. We point only to those problems
   Even with these new models, database design         implied by the conceptual level and the logical
remains a lengthy and tedious process without          level (i.e. semantic level).
computer aids. Many computer tools have already
been built, but they are far from solving all the      2.1.     Design problems in
data base design problems.
                                                                traditional databases.
   We have proposed a new approach based on
techniques used in artificial intelligence [BOUZ       In this section, we outline the main problems
83]. This approach aims to combine various             which must be solved in traditional databases.
categories of knowledge coming both from
relational theory, semantic data models, artificial    (1) Size oj the application: the difficulty of
intelligence and software engineering. An expert       the design increases with the size of the
system product for conceptual and logical design       application. Designing small data bases is quite
has been built to apply the main ideas of this         easy, but structuring several hundred objects,
approach and to point to specific problems             relationships and constraints without computer
[BOUZ 85].                                             aids is always a hard task, especially for people
                                                       whose profession is not to be a "specialist" in
   The SECSI system is developed in Prolog             database design.
language, under PC-MS/DOS environment and
UNIX workstation environment (particularly SUN         (2) Relative perception oj the real world
workstations). Several other environments are          and modeling choices: the real world may be
envisioned in the nearest future. The product was      reported in different ways with respect to the
commercially available since june 1988, and more       designers' perceptions: several database schemas
than fifthy installations exist today. An english      may describe the same reality. The problem is to
version will be available too.                         characterize the best schema with respect to formal
                                                       rules, then to fmd a methodology to build and to
   The product SECSI can be considered as a            choose this schema.
central part of a software engineering
environment. It was one of the first tools of the      (3) Availability oj injormation and the
CASE technology. SECSI can be used as well as          problem oj restructuring : the designer
by expert designers for data modeling or schema        cannot often get a detailed description of objects
validation, and by novice users and students in        because of the lack of knowledge about the
order to learn database design and relational          application. However the design must start with
technology.                                            incomplete and imperfect knowledge. As the
                                                       design is bas"ed on a fuzzy universe of discourse,
   After a brief outline, in section 2, of the data    new information may arise and may entail changes
base design problems in both traditional and new       in the definition of classes, relationships and
databases, section 3 presents an overview of the       constraints.These changes must be integrated
existing tools. Section 4 characterizes the expert     without reconsidering all thc modeling phases
system approach for database design. Sections 5        which were already done.
and 6 detail the expert system approach, its
objectives, its structure and the knowledge
involved. Section 7 highlights how a deductive
    2.2.     Design problems in future                         Model [SMIT 77). The different graphical tools
                                                               whic~ are pr?vided act as a word process; they
             databases                                         p~rrrut to deSign. to store and to layout aesthetic
                                                               dla~rams. but they rarely help in the modeling
    Besides the traditional database problems                  chOIces nor m the consistency checking of the
    kn.owledge bases. deductive databases and object           deSIgned schemas. All the design choices are left
    onented databases have introduced the following            to the database administrator (DBA). In the best
    problems [BROD 84). [GALL 84). [BROD 86).                  case. the graphical tool offers a few facilities for
                                                               syntactic control or. if integrated with a data
    (l). Representation of more complex                        dictionary. it insures the correspondance between
    objects: future databases will be able to                  t~e ~omponents of the schema and the terms of the
    represent more complex objects than flat data              dIc~lOnary. These tools are very interesting for
    structures. The new objects may be rules. abstract         therr user fn~ndly aspect and their help to a clean
    data types •.texts. graphics or images. Hence the          documentation. but they are far from being
    correspondmg representation models could be                effective modeling tools as they do not make any
    very difficult to learn and to use.                        design choice.
    (2) Rep'resentation of more complex                           The second category of tools provides a set
    cOllstramts : models have to express more                  of al~orithms which are generally built upon the
    complex constraints as general integrity constraints       Relanonal M?d.el [CODD 7~). Such tools provide
    (state constraints) and dynamic constraints                programs denvmg a normalized relational schema
    (transition constraints). For example. some logic          froma set of attributes and dependencies [BERN
(   based models allow to express most constraints as          76) [BEER 79) [FAGIN 77). This approach is one
    any first ~rder formu~as. Besides this problem of          of the best formalized. it permits to characterize
    specification, the deSigner has to decide wether a         and to automaticaly design the best relational
    set of integrity constraints is consistent. easy to        schema: with respect to redundancy and to update
    check and to maintain.                                     anomalies. Unforrunately, this approach ignores
                                                               natural. objects such as entities. relationships,
    (3). Represent~tion      of tile dynamics of               genera~zatlon hierarchies and semantic integrity
    ob!ects: tr~dltlonal database design makes a               constramts. The only constraints which are
    Slnct separation between data and the behavioural          hand.led are generaly functional dependencies and
    of this data. The trends in the new models are (as         multivalued dependencies (all of them considered
    ~vell as in the object oriented languages) to
                                                               as semantic links). But the acquisition of these
    mtegrate the dynamic aspect with the static aspect         dependencies. in the context of the universel
    of the database [OODB 86). This leads to a                 relation assunption. remains a combinatory
    double reflection effon during the database design         problem which requires a detailed study of the
    process.                                                   semantics of all elementary relationships between
                                                               attributes. However. if these tools are not suitable
     . This list of problems shows how complex and             for the conceptual level, they could be strong
    difficult the database design was and will be.             components at the logical level.
    The.n! nowdar, no useful methodology could be
    envlSlOl1ed Without computer design aid. That is              The third category of tools consists of a set
    why CASE technology becomes more and more                  of interactive programs which can be considered
    necessary and urgent.                                      as a combination of manual tools and algorithmic
                                                               tools. They often call for CAD techniques [BROD
                                                               87). Interactions between the users and the
     3. Overview of Existing                                   system are question-answering oriented or
                                                               graphics language oriented. This approach is
        Database Design Tools                                  based on the idea that database design is panly an
                                                               automated process and partly a human a n ' l
    To solve some of the traditional database                  Because of the last reason. this category of tools is
    problems. many tools have been proposed and                more realistic t!Jan the. previous category of tools,
    commercialized during the last decade [DBEN                and thus more mterestmg from the practical point
    84]. [BOUZ 86b). [BROD 87). We distinguish                 of view. However if this approach does not
    three categories of tools. They are generally              always succeed. it is because computer aids are
    characterized by the methodology and the models            programmed once and for all; hence it is difficult
    which are supponed.                                        to modify or to add design rules. Tools appear as
                                                               black boxes in which you must believe and accept
         The first category of tools consists of               their results, or you don't believe, then you
    . graphi~al tools which are generally based on             remake manually the design. But this remains an
      semantic data models lilce the Entity-Relationship       interesting approach if we combine it with the new
      Model [CHEN 76), or the Semantic Hierarchy

                                                           2
developments in knowledge engineering and                     information system changes? How to add new
deductive systems. That is what we have done                  attributes, new relationships and new constraints
with SECS!.                                                   without redesigning the whole database? How 10
                                                              integrate several data base schemas that have been
   The computer design tool we have built,                    designed independently?
combines the advantages of these three catogories
of tools and aims to do more by providing a                   (4) Database administration: how to develop
unified methodology which combines theoretical                and maintain a detalled and precise documentation
knowledge (expressed by algorithms and rules)                 on the complete life cycle of the database (from the
and practical know-how (expressed by heuristics               design phase through the operational phase) ?
and cooking recipes). The deductive approach                  What are the repercussions on the data and the
fulfils the lacks of the first two categories of 100ls,       already eltisting programs when the characteristics
and a modular knowledge base increases the                    of the software or hardware environment are
evolution of the design techniques and allows the             modified ?
CASE system to evolve better than the third
category of tools.                                               Some of these objectives have already been
                                                              reached (e.g. conceptual and logical design, some
                                                              schema restructuring and some documentation).
4. SEeSI: a New Generation                                    Others are in the prototyping phase (e.g. physical
                                                              design, view integration).
   of Tools for the Database
   Designers                                                     More than just a tool for the design of database
                                                              schemas, SECSI can also be used in many other
                                                              ways as for example, a validation tool of manual
This section summarizes the different objectives of
SECSI, highlights its different characteristics as an         design or as an educational tool in data modeling.
expert system and reports the different levels of             Combined with a DBMS, SECSI can be
expertise of its knowledge base.                              considered as a powerfull prototyping
                                                              environment of relational databases. SECSI is
                                                              already used to design buiseness applications as
4.1. Main Objectives of SECSI                                 well as technical applications like geographical
                                                              databases.
SECSI has been conceived for answering many
problems related 10 the development of database
applications. Generally, design 1001s are devoted             4.1. Main Characteristics of an
to a given specific task within the design process.           Expert System for Database Design
SECSI aims to cover all the life cycle of this
process, from the conceptual level to the physical            The general defmition of an expert system states
level [BOUZ 86a]. These objectives are declared               that it is a specialized software 1001 which solves a
hereafter as a set of questions for which the                 problem as well as a human expert can do [HAYE
desired system must give answers.                             84] . Practically, in our area, this definition may
                                                              be interpreted by the following four concrete
(l) Conceptual lIlodeling:      how to transform              characteristics:
incomplete specifications of a problem, expressed
in an ambiguous natural language, into a formal,                 The first characteristic is to constitute a
complete and consistent conceptual schema                     complete knowledge base                 including
 ? How objects can be classified as entities,                 theoretical algorithms and rules, and experimental
relationships, attributes or constraints? Can the             knowledge in the database design process.
human designer be liberated from these different
choices?                                                         The second characteristic is to provide an
                                                              interactive methodological environment
(2) Physical design : how to generate an                      which accepts incomplete specifications, provides
efficient data slOring and retrieving structure,              the same reasoning as a human expert, permits
starting from the conceptual schema of a                      backtracking 10 any design step, uses a question-
database? How to take into account the different              answering system and can infer over examples.
parameters that characterize: (i) the given
application and its cost requirements, (ii) the                  The third characteristic is to be an open
software and hardware environment which will be               system which can learn and integrate new
used for development and implementation?                      theoreiical and experimental rules. The system
                                                              must also trartsfer its expertise through its use (via
(3) Schema restructuring: how to make sure                    explanation of its design rules) and through
that the database structure will evolve as the                justification of its results.


                                                          3
   The founh characteristic is to provide end·              propenies are described by facts and rules which
user friendly interfaces by offering a                      constitute the detailed description of a given
workstation which reproduces the main features              application.
of the manual design (Le. a graphical interface to
design the first rough sketch of the database                  All the knowledge referred to above constitute
schema. a natural language to facilitate the                the knowledge base of the expen system SECS!.
communication of the specifications, and a
declarative language to specify high level
assertions that could have ambiguous
interpretation in the natural language or a complex
                                                            5. An Open System
representation in the graphical interface). To                     Architecture
facilitate the interaction between the system and the
user. other features like icons. menues and mouse           S ECSI architecture is characterised by its
must be used too.                                           modularity which permits, thanks to the atomicity
                                                            of its knowledge base. to add new design rules
4.2.     The Different Levels of                            and new various interfaces. This section describes
                                                            the components of this architecture.
         Expertise of a Case Tool
         in Database Design                                 5.1.    The knowledge base
The expenise of the system is divided into three            The knowledge base is composed of two pans: a
categories:                                                 rule base and a fact base.
(1) Theoretical knowledge:         this knowledge              The rule base is created and updated by the
coincides with the design concepts (models, rules)          expen designer. This base contains general rules
and the design methodology (ad vices. reasoning             such as normalization rules, mapping rules; but
principles). Theoretical knowledge is composed              also specific rules which can be system dependent
of algorithms. rules and heuristics:                        or even application dependent. General design
   A1~orithms : some parts of the design process            rules are grouped into the DESIGN module which
   are well isolated and formalized. and have               is used by the methodological regulator and the
   already been expressed by many efficient                 explanator. This part of the knowledge constitutes
   algorithms. such as normalization algorithms.            the system shell. hence automatically delivered in
   cost evaluation of transactions and access paths         the basic version of the system.
   optimization.
- .R.uk£: correspond to some known or admitted                 The fact base is designed by the database
   expenise as in normalization process                     administrator. It contains compiled specifications
   (Armstrong's inference rules). view integ-               describing a given application. This part is
   ration rules. mappings rules between different           created and updated by the DBA. It corresponds
   models. and consistency enforcement rules.               to the problem which is submitted to the system.
- Heuristics: may be particular interpretations of
   the real world. assumptions in some value
   distributions. or simplification of the                                                                    IUUlfATOI
                                                                          DOCUMlNT4110lf
   correlations between attributes or between
   constraints.
                                                                      INTlRlACU
                                                                                                                    I     .
(2) Specific domain knowledge: for each
application domain (e.g. insurance. banking.                          II'fATUlAL   I       ISOWLlDCl ....n
                                                                                                                    ,N E
                                                                                                                    E N
                                                                                                      RULIS
medicine. travels). there is some common                                                                            • G
terminology. managerial rules and skills which                        IDleu""T·1                                    E I
                                                                                                                    N N
                                                                                                                    CE
represent the specific know-how of the domain.
This know-how can be represented. as well as we
can fomalize it. either by general behavioural rules
or by general predefined structures. This
knowledge is stored in the system data dictionary
                                                                      ICRAPHIC·I
                                                                                                                    E

                                                                                                                              I
and reused. when appropriate. during the design
process.
                                                                                           Dun ImUI"Cl
                                                                                            Sl"I\"DAiD SQL
(3) Specific application knowledge: within
a specific application domain. there are some                              Fig,1: The SEeS! architecture
propenies which characterize each application
(e.g. reservation in a given travel agency). These


                                                        4
   To represent these two types of knowledge, we                    constraints? This leads to the use of a
utilize two different representation models: a                      theorem prover principle, based on a
semantic network to represent facts and production                  backward-chaining.
rules to represent behavioural constraints and to
specify the design rules. The semantic network is            (2)    Transforming specifications from one given
defined by a set of typed nodes (attributes, values,                form to another: for example, the successive
srtructured objects and possibly their instances)                   mapping of the natural language
and a set of typed arcs representing relationships                  specification to the relational model schema.
between nodes (aggregation, generalization,                         This leads to the use of an inference engine,
association and some constraint arcs) [BOUZ 84].                    based on a forward chaining.
To.enhance the semantics of this model, additional
constraints are defined: domains, roles, keys,                  Besides this basic program, the inference
intersection of classes, cardinalities and functional        engine provides two important modules: the
dependencies. The figure 2 gives an illustration of          methodology regulator and the results justificator.
the different concepts of this semantic data model.          The first module allows the user to backtrack to
                                                             any design step to redesign his database or to
                                                             modify his specifications. The second module
                                                             makes the system able to explain and justify its
                                                             results.

                                                             6.3.     The external interfaces
                                                             Describing the application in a comprehensible
                                                             manner is an important problem in data base
                                                             design. SECSI offers three types of interfaces: the
                                                             specification interfaces, the acquisition interfaces
                                                             and the interaction interfaces.

                                                             (1) The    specification interfaces consists of
                                                             three languages:
                                                             i. A natural language which accepts simple
                                                                 sentences, possibly composed of a conjunction
                                                                 of subjects,a verb and conjunction of
SIJIl'lber Muque r)Jle PO'/I'tl' Color                           complements. Its role is particuraly
          V~                             pl1.1)
                                                                 important for novice designers and for the easy

                                           '"
               fd         rd
                                           'I'uro<7.a
                                                                 communication of specifications. It also allows
               Fig.2: An example of a semantic network           a rapid handling of the system without leaming
                                                                 any formal language.
Each semantic relationship is represented by a               ii. A high level declarative language which helps
couple of binary arcs. Arcs a and p are called                   to specify what the natural language cannot
atomic aggregation arcs, arcs r and 0 are called                 easily express. This language permits also to
molecular aggregaiion arcs, and arcs g and s are                 describe any database schema specified into
called generalization-specialization arcs.                       any given data model
Cardinality constraints are expressed both over
alp arcs and rio arcs. Other constraints like
functional dependencies are portrayed by fd arcs
too.

6.2.           The inference engine
The inference engine is a program composed of
the basic mechanisms which permit to manage the
rules into the rule base and to apply them onto the
fact base. This inference engine functions by an
alternate use of backward-chaining and forward-
chaining. The combination of these two principles
is made necessary because each step of the used
methodology consists of:
(I) Proving an hypothesis: for example. is a
       constraint derivable from a set of other given               Fig,_3 An example of an applicatioD specification



                                                         5
iii. A graphical interface which enhances the user                                       (3) The interaction interfaces distinguishe
     friendly interaction. This interface is used                                        icons, menus, forms and documentation:
     either as an input facility or as a layout feature.
     As there is no common standard representation                                       i . The menus visualize the authorized operae tions
     of graphics, SEeSI provides graphics                                                    on a given active window. According to the
     generator which permits to each user to have                                            stages of design, only the permitted operations
     his own diagrams.                                                                       are visualized or accessible.

                                                                                         ii. Forms facilitate the capture of some cons-
                                                                                             traints (e.g. functional depedencies, cardi-
                                                                                             nalities), and permit to hihglight the default
                                                                                             options generated by the system.

                                                                                          "APPLICA1IONS SECSI llODElINO
                                                                                                                APflJCATI~~ NAlIE

                                                                                               iElmONSmp ACQI11SITION Of CIJIOIN.lLITY CONS1llAl~TS

                                                                                          DDJ'IERS('IUllCli,llJOO~A~)


                                                                                           fUNCTIONAL DEPE~llENCIES                     CARDINAlITIES
                                                                                                                           DElJVf]S
                                                                                                                                      VEJDC1L
                                                                                                                                    QIE.\T
                                                                                                                                                        ~.
                Fjg4: An example of a graphical interface

   To be more flexible, the designer of a data base
has the choice to describe his application in one or
more of these languages, and then combining them
in the same specification. An interactive parser                                         Fig.6: A menu and fonn example for cardinality acquisition
generates a fact base from the description, this
base being progressively enriched by the                                                 ii. The documentation is accessible at all levels
interactive acquisition module and transformed                                               whenever the user of the system requests it,
into a canonical semantic representation.                                                    during a session or independently. It consists
                                                                                             of a concise user's manual, a synthesis of the
(2) The acquisition ill/erface : the interactive                                             methodology implemented by the system, and
acquisition assistance helps in completing the                                               the defmition of the main notions of data bases
description of a problem through a question-                                                 to which the system refers.
answering system that automatically reminds the
user what he might have forgotten to specify or                                          6.4. The Expected Results
what he has ignored in his description. This
interactive acquisition process is based on                                              When the design process is terminated, the system
deduction rules, heuristics, and the analysis of                                         produces the following results:
examples fed in by the user.
                                                                                         (1) A set of basic relations in 4NF and the various
                                                                                         keys of these relations (primary keys and



  F~cr[ONAL
                 NOiMAWATIOH PBASI or lUI RELATICfl
  VIllIlli(NU\lB>J.II,UQUF,lYI YOill' WlIll~eul'e ootwstll the ded'oco.t IX &Mil ~peDdeD<:a,
    P..... • lllI1d you 1tI1 .. ",!bot NUMBER cIelemint< mE" DOl 1


       Fig.5: Interaction between cODlrajnts and examples


                                                                                     6
                                                            eliminate the possible inconsistencies using
                                                            predifmed inference rules or with the end-user's
                                                            help.

                                                               The logical design phase transforms the
                                                            conceptual schema into a fourth normal form
  t1lEA1E VIEW I'lliON                                      relational schema with its associated integrity
  AS                                                        constraints and views. It performs the interactive
  SELEcr_, plm:, lddress                                    acquisition of constraints and the choice of first
  fRO~CliU
  ~~IOS                                                     normal form relations. Constraints such as
  SELEcrnm, plm:, ,<\:bes$                                  intersection and union of classes, cardinalities of
  fROMSu~                                                   relationships and functional dependencies between
                                                            attributes are captured. The fust normal form
                                                            relations are constructed by suppressing
                                                            generalization hierarchies and removing
       Fig.? Example of results produced by SEeS!           multivalued attributes. Normalization is carried
                                                            out using dependencies between attributes.
   All the results can be obtained, as needed,
either in standard SQL, as far as this language can
do it, or in a more readable ad hoc language (i.e. a
declarative language). Fot those results that                    ~~~
cannot be generated in SQL (especially semantic
                                                                 ~.                       'Y                         ~
integrity constraints), the declarative language is
used, and it is of the responsibility of the database
administrator to program these results in the
                                                                    ... ''''In'''~        .., ••~n""..         J
                                                                  R,qwlrtlllUI u.". atqlllrtlllnl ... Ip." RtillllrllJlnl ... IJ."
                                                                                                                    ,.,,'n,,'"
corresponding language used for his application.


6.A Modular and Progressive                                                    ~t;:Jp     View In~itltlo.
  Design Approach
The design approach adopted by SECSI and                                                  CONCEPTUAL
portrayed in figure 8 is based on three abstraction
                                                                  -.. . ~.Ir-
                                                                  ~...w..                   SCHEMA                   I4lp~_1Iru




                                                                                                  ,
levels: the conceptual level, the logical level, and
the physical level. To each level there correspond
                                                                                            ~hoplllltl
                                                                                                     .lId
a specific model and a specific design process.                                             lIormllluUoa
The methodology proceeds by stepwise
refinement, going from informal description of a
given problem down to physical representation of                                               LOGICAL
                                                                  ~I_oohn                     SCIJEMA                ....oetJ0t6ctl..._ . .
this problem in terms of records and meso Staning                         CMaJ&iIu                                   \bt gll:-..oa




                                                                           OPu~J..CIU
from informal know ledge, three main phases
successively produce: a formal specification, a
conceptual schema, a logical relational schema and
a physical schema.

   The conceptual design phase generates from the
external description of an application a sound
conceptual schema stored as a semantic network
                                                                                                  •
                                                                                              pnYSICAL
                                                                                               SCHEMA


with its associated constraints. This generation is                        Fig,S; The deSlgn melbodotowy
performed while conversing with the end-user.
The different views of the application given by the            The physical design phase gives an optimized
end-user(s) are mixed into one description after            physical schema of the data base. This schema
elimination of redundancy and resolution of                 includes both a set of initial implemented relations
conflicts.     A verification step performs the             and a set of indexes and formats. The choice of
validation of the application description in order to       implemented relations and plausible attributes for
generate a consistent conceptual schema. In                 indexing depends on the most important or most
addition to the syntactic controls, this phase              frequent queries which will be performed on the
detects generalization          cycles, recursive           database. This choice needs some estimations
associations and objects which play several roles           depending on the DBMS used.
in the same relationship. The system tries to


                                                        7
        The design of a database is an iterative .process.       "supplier" and generating another object
    It implies numerous comings-and-goings between               ("supplier") described by two attributes ("name"
    the universe of discourse and the expert system.             and "address") and a relationship ("supplies")
    SEeSI provides this iteration at two levels :                between "product" and "supplier". In fact we got
                                                                 more information in the second sentence as we
    (i) by permitting the interuption of a working               know the minimum and the maximum number of
       session without losing the achieved task.                 products supllied by a given supplier (i.e.
       The recover of the session can be done                    cardinality constraints). But the second sentence
       without any redesign of the last schema.                  introduces an additional complexity, related to the
                                                                 usage of synonyms ("product" and "parts"), that
    (ii) by authorizing the interuption of a dialogue to         can be solved if a data dictionary is provided.
        modify an assertion or to return to a preceding
        question.                                                   Another problem is related to objects that play
                                                                 several roles in the same relationship. For
    Some resumptions are automatic at each of this               example, in the sentence: "A person could be
    levels without having to manually return to the              married to another person", we do not know who
    current stage of modeling.                                   is the wife and who is the husband. The
                                                                 interpretation must complete this sentence by
       The design of a data base is not a fully                  acquiring the different roles from the user and
    automatic process: a permanent interaction with              modifying the previous sentence as follows: "A
    the designer allows the comb.ining of algorithmic            person as a husband could be married to another
    tasks with human decisions.                                  person as a wife". Every body who reads this
                                                                 sentence can infer more than a relationship
                                                                 between a person and a person; he can deduce that
    7. How a Deductive                                           a person cannot be married with himself (because
                                                                 of the term "another" in the sentence). One can
       Approach Can Contribute                                   also deduce that there exist some persons who are
       to the Conceptual Design                                  not married.

    This section highlights the contribution of expert               Redundancy is a frequent problem in the
    system approach in the database design process.              specification phase. Some new sentences, altough
    We particularly focus on that parts where the                they are true, do not augment the semantics of the
    system has enough knowledge to infer from rules,             application, as the new described facts can be
    heuristics and examples.                                     deduced from the previous ones. For example, in
                                                                 the following description, the third sentence is
                                                                 redundant to the first two: "A person has a name
    7.1. The Interpretation and the                              and and age. An employee is a person. An
          Control of Specifications                              employee has a name and an age." Again, in the
                                                                 following example, there is a redundancy, but it is
    Besides the syntactical checks of sentences, one of          an underhand one: "Employees and secretaries are
    the hard problem, when compiling user                        persons. A secretary is an employee". Indeed
    specificiltions, is to decide wether a term in a given       the second sentence makes a part of the first one
    sentence must be considered as an attribute, an              redundant; as a secretary is an employee, it is not
    entity, a relationship or an integrity constraint.           necessary to say that he or she is a person, this
                                                                 fact can automatically be deduced.
(       Sentences are not only interpreted as
    independent units, but also as a whole consistent               As the previous paragraphs show, the
    specification whose interpretation is stronger than          interpretation of a given specification is not only a
    that of each sentence. So when a sentence is                 syntactic process, but a very high level semantic
    followed by another one, its interpretation could
    be modified because we get more information by
    reading two sentences than by reading only one.
    For example, from the following sentence: "A
    prodllct has a nllmber, a IInit-price, and its
                                                                 process based on expert knowledge:

                                                                   a lexicon of terms which correspond to the lISUal
                                                                   absttations (like aggregation and
                                                                   generalization), and to the different terms used
                                                                                                                         I
    sllpplier", we understand that there is an object              in the vocabulary of the application,
    named "product" and characterized by three
                        ll
    attributes: "number unit-price", "supplier But
                             , t1
                                                   ll
                                                        •          a set of semantic rules which permit to
    if we add a new sentence like: "Each prodllct                  distinguish between atomic objects (attributes)
    supplier, described by his name and address,                   and molecular      objects    (entities and
    supplies one to ten parts", we modify the                      relationships),
    previous interpretation by removing the attribute


                                                             8
- a set of inference rules which capture the                 IfCard(p(0,A1 ))=[1.1 j and A1->A2
  integrity constraints involved by the different            Then Card(p(O,A2»=[1.1].
  sentences.                                                 Finally. as it is well-known. one can derive. using
                                                             Armstrong's axioms. new functional dependencies
  a set of reinterpretation rules which are able to          from a given set of dependencies. For example:
  modify the interpretation of an objet to another           If Al->A2 and A2->A3 Then A1->A3.
  object with respect to a given set of related
  sentences.                                                    This process can be done as far as necessary to
                                                             generate the maximum facts. All generated facts
  a set of redundancy checking rules which                   are so much gained questions for the user. This
  avoid the assertions that can be deduced from              deduction process is based on two assumptions:
  other given facts.
                                                             (i) the number of combinations necessary to get
7.2. The Acquisition of Constraints                              cardinalities is lower that that of getting
                                                                 functional dependencies,
One the most problem related to the constraints
acquisition is the combinatory explosion (e.g. the           (ii) the acquisition of cardinalities is more natural
acquisition of functional dependencies in the                   than the acquisition of functional dependencies.
relational model). In this case. we cannot envision
to ask the user all the possible questions. So we               However. all functional dependencies are not
must use various means in order: (i) to limit the            implied by cardinalities. This derivation process
different combinations to consider and (ii) to               can then be regarded as a heuristic to reduce the
avoid all questions for which an answer can be               combinatory explosion in the search of functional
produced by using a set of inference rules.                  dependencies. For all other dependencies which
                                                             are not implied by cardinalities. one must use
   As stated above, the problem of combinatory               Armstrong's inference rules or. in the last.
explosion arises for all constraints involving               questions to the user.
attributes (e.g. functional dependencies.
cardinalities of attributes, keys). To illustrate this          Before requiring questions to get dependencies.
problem. we will consider the case of the                    one can go far. when possible, in the usage of
acquisition of functional dependencies. Suppose              heuristics. For example. we can make the
we have an object type O(Al .... ,An) where for              assumption that. in most applications. there is no
each instance, we allow the attributes to be                 need to search for functional dependencies with
mono valued or multivalued. This can be                      more than four or five attributes in their left
represented by the following rliagram:                       hands ide. This can contribute to considerably
                                                             reduce the combinatory explosion.

                                                                Fianally. one can also use examples to reduce
                                                             the combinatory explosion of functional
                                                             dependencies. More precisely. examples could be
                                                             used to generate some non valide functional
                                                             dependencies; hence a set of unnecessary
                                                             questions to ask to the designer. For example,
                                                             from the following small relational extension.
                                         An

                                                                 NAME             AGE          ADDRESS
   As one can see. cardinalities specify in one
hand the monovalued or multivalued attributes (p
                                                                Dupond              24            Pa::h
cardinalities). and in the other hand wether. for a             Durand              27            Lyon
given attribute value. we can associate one or                  Durand              27            Dijon
many instances of the object (a cardinalities).                 Martin              24            Paris
There is a particular correlation between
cardinalities and functional dependencies. Indeed.
from a set of attributes with their cardinalities. we        one can generate the following non valide
can deduce some functional dependencies. For                 dependencies:
example:                                                        NAME -1-> ADDRESS
If Card(a(A1 ,0»=[1.1 j and Card(p(0,A2))=[1.1j                 AGE ·I->ADDRESS
Then A1->A2.                                                    (AGE, ADDRESS) ·I->NOM
In the same way. from a combination of                          (NAME,AGE) -1-> ADDRESS
cardinalities and functional dependencies. one can           We can notice that we cannot say anything about
derive new cardinalities and son on. For example:            the NAME and AGE.

                                                         9
       To summarize, the problems related to                                 To be more relational. we must remove
    constraint acquisition could be solved by using                       molecular aggregation arcs (rio) and replace them
    different techniques as : (i) interaction rules                       by references and referential constraints. as
    between constraints, (ii) heuristics, (iii) examples                  portrayed in the following schema:
    and finally. when necessary, (iv) questions for the
    user.

    7.3.Transformation of the Semantic
        Model to the Relational Model
                                                                                                           p8'\
                                                                                                         I),~                Sa

    One of the main tasks of the system SECSI is to
    transform a conceptual schema expressed in a
    semantic model into a logical schema expressed in
                                                                                                   +
    a relational model. As it is known. the problem in
    this case is to not loose the semantics between the
    two levels. For this objective. the system
    provides a set of transformation rules which
    conserves the semantics of the conceptual level at
    the logical level. by generating new objects,                         In fact. this transformation depends on ·the
    semantic integrity constraints and views (queries).                   cardinality values. To satisfy the first normal form
                                                                          defInition of relations. the cardinality of one of the
       One simple transformation rule is to represent                     two a)'cs (either r or 0) must be equal to [1.1) or
    any molecular objet with monovalued attributes                        [0,1). References designate the foreign keys of
    (cardinalities of p arcs equal [1,1) or [0.1]) of the                 the related objects.
    semantic model by fIrst normal form relation in the
    relational model.                                                        If the two cardinalities are different of [1.1) or
                                                                          [0,1). we must use another transformation which
                                                                          implies the defInition of an inteimediate object in
                                                                          such a way that one of its cardinalities equals
                           + j PfRSO~(Nm.SlIJ1llJlll.
                        1,11     \
                                                        SurmJIl2, !gel
                                                                          [1.1). That permits to come back to the previous
                                                                          transformation. This case is portrayed by the
                                                                          following schema.
                    ~



       When some attributes are multivalued attributes
    (cardinalities of p arcs equal [IoN) or [O,N)) they
    must be transformed into a molecular object related
    to the fust one. to satisfy the atomicity of values
    which characterizes the fIrst normal form relations.
    Depending on the attributes wether they are related
    by functional dependencies or not, this
    transformation may generate one or several
    molecular objects.
l                                                                             One of the semantic concepts which is not



                    PII.N,-'F
                                                                          supported by the relational model is the
                                                                          generalization hierarchy. Thanks to the inheritance
                                                                          property. this concept can be represented by basic
                                                                          relations (implemented relations) and virtual
                                                                          relations (calculated relations or views). For
                                                                                                                                   I
                                     NlJDe
                                                                          example. in the left handside of the following
                                                                          schema. one can replace the CLIENT object by a
                                                                          virtual relation calculated from the union of
                                                                          AGENT and PRIV ATE]ERS. The suppression
                                                                          of this generic class is immediately followed by
                                                                          the inheritance of its properties (related objects) to
                                                                          its sub-classes (schema of the right handside).




                                                                         10
                                                                         no semantic information is lost during each
                                                                         transformation,
                                                                         no duplicate relation schemas are generated,

                                                                         the number and the complexity of the generated
                                                                         integrity constraints would not be too high.


                                                                        After the mapping process, he normalization
                                                                     process is carried out, as in usual, by generating
                                                                     for each relation its minimal cover of functional
                                                                     dependencies.

                                                                     7.4. The Justification of the Results
                                  l1.1FXf ~ AGf:\1 UPRIVAll:}Eltl    Justifications and explanations are especially
                                                                     emphasized int the expert system area. Such aspect
   But this transformation is not the unique one                     is devoted in one hand to increase the user's
allowed. Indeed, instead of removing the generic                     knowledge in database design, and in other hand
class, one can remove the sub-classes by replacing                   to enhance the credibility of the results obtained by
them with a specific attribute (say "role") which                    the system. Explanations and justifications are
captures the role played by a given client in the                    lillie different. In the fust case, the system has to
specialization hierarchy. The sub-classes AGENT                      explain the concepts, the methodology and the
and PRIVATE]ERS should be calculated by                              design rules which constitute the know ledge base.
restriction of the CLIENT class using the new                        In the second case, the system has to justify
attribute role. This case is illustrated by the                      different representation choices for a given
following schema transformation:                                     database application. SECSI supports these two
                                                                     aspects at different levels.

                                   ,t6                                  At the first level, the system provides
                                                                     explanations for every ununderstood concept or
                                   P{             Y                  question during the interactive design process. If

                                  ~                 ,
                                                   Ide
                                                                     a given concept is not understood by the user (e.g.
                                                                     a cardinality of a functional dependency), SECSI
                                                                     elaborates a text composed by a defmition of the
                                                                     concept and an illustrative example). If a given
                        Dom(1011)'{t1t"I,IIIOI}                      question asked by the system is not understood
                        PRIVATE PERS~IPERSOSllol",·t1IIOI·)          because of its complexity (i.e. contains concepts
                        AGE~I'IPDlSON I Rot,,"IIOI')                 which are not understood) or of its fuzzy form, the
                                                                     system decomposes the given question into easy
                                                                     sub-questions for which the user has only to
    In a given schema, the two preceding                             answer "yes" or "no". This gives the system the
transformations are not both possible. Their                         ability to be used by both expert users and naive
application depends on wether the sub-classes                        users. The documentation about the system use
have specific properties or not. If they have, the                   and the syntactic form of the different languages is
first transfonnation is better, otherwise the second                 also integrated at this level.
one is beller. In practice, this strategy is not so
simple: we can also move up some specific                               At the second level, the system justifies why an
attributes if we introduce a specific integrity                      object of the application is represented as a
constraint the check the null values on this                         normalized relation or as a virtual relation, why a
attribute. To decide for each case which                             given fact is represented by an integrity-constraint
transformation to apply, the expert system may                       whereas the user waits for a relationship, but also
use heuristics based on the number of sub-classes,                   why a given fact does not explicitly appear in the
the number of specific properties of each class and                  results, or why some artificial objects are in the
the complexity of the possible integrity constraints                 results whereas the user description did not
to be generated.                                                     contain them. To answer to a given question, the
                                                                     system elaborates a synthesis of the different
   In general, the mapping process from the                          design rules applied to the concerned object, from
conceptual schema to the logical relational schema                   the analysis of the external description to the
is based on a specific strategy which is built in                    normalized relational schema. This synthesis is
such a way that:

                                                                    11
    based on the different states of the knowledge          design choices and to explain reasoning
    base, which are saved all along the design steps.       alternatives is one of the important feature of
                                                            expert systems; it makes them attractive for
        Explanation and justification is a very complex     complex problems.
    process which needs to remember a large amount
    of data an design rules. Its feasability is                More than just a tool for the design of database
    demonstrated in the first prototype of SECSI            schemas, SECSI can also be used in many other
    [BOUZ 86c], but its current industrial application      ways, as for example, a validation tool of manual
    is very limited.                                        design or as an educational tool in data modeling.
                                                            Combined with a DBMS, SECSI can be
    7.5,     The Incremental Design                         considered as a powerfull prototyping
                                                            environment of relational databases. SECSI is
    As often claimed, the database design task is an        already used to design buiseness applications and
    iterative and long process which cannot be done         technical applications like geographical databases.
    definitely in a short time. Many refinements are
    necessary during a long period of time (several            Three main versions are dedicated to the design
(   weeks or several months depending on the                of databases: Version 1 concerns the production
    complexity of the application). The contribution        of a normalized relational schema from the given
    of an expert system to this problem is to provide       specifications in natural or declarative language.
    some methodological and some recovering                 Version 2 is dedicated to the optimization of
    features which pennit the user to backtrack to any      structures and to the generation of access paths.
    design step and to interrupt his design with the        Version 3 is an extension of the conceptual
(                                                           modeling to the integration of views or to an
    possibility to recover his session a few days later,
    without redesigning his application.                    incremental design of the data base. Each version
                                                            evolves horizontally through a permanent research
       During each recovering session, the user would       in improving the interfaces, the reasoning
    modify his first specification by adding new facts      explanation and the documentation, offering
    or modifying old ones. Hence, two different             among others the graphic edition and layout
    problems arise:                                         possibilities adapted to each type of model used.
    - how to make sure that the specification is            Currently, only the version I has reached the
    consistent,                                             industrial level and then commercialized. The
    - how to integrate these new facts without              other version are in the prototyping level.
    reconsidering with the user all the pevious design
    (especially the set of previous asked questions).          With future development of deductive databases
                                                            and knowledge bases, this approach is more
        To reach this double objective, SECSI stores in     adequate to integrate new concepts and new design
    an extended fact base all the deduced facts from its    rules. We think that only powerful expert systems
    rule base and all the captured fact from the user's     will efficiently handle the complexity introduced
    answers. When a given session is recovered with         by these new research developments.
    some possible specification updates, the system
    generates a new fact base and procceeds through
    its following design steps. Whenever the system         References
    needs to ask a question to the user, the extended
    fact base is used first to derive a possible answer.    [BAT! 85] BATIN! C. and CERI S. "Database Design:
    Then only questions concerning new facts and                  Methodologies. tools and environments" Ranel
    modified facts are effectively asked to the user.             sesion, ACM SIGMOD 1985.



    8. Conclusion:
                                                            [BEER 79) BEERI C. and BERNSTEIN P.A.
                                                                   "Computational problems related to the design of
                                                                  normal Conn relation sct':m1S" ACNf Transactions
                                                                  On Database Systems, marcH 1979.
                                                            [BERN 76J BERNSTEIN Ph. "Synthesizing Third Normal
                                                                                                                      I
    In this paper, we have described an expert system             Fonn Relations from Functional Dependencies"
    approach for database design. As an answer to the              ACM TODS. Voll,N'4, 1976.
    various modeling problems, SECSI relies on the          [BORG 85J BORGIDA A. and WILLIAMSON K.
    most elaborated concepts of databases and the                  "Accomoding Exeptions in DB and Refining the
    most recent techniques of artificial intelligence.             schema by learning from them." VLDB Conf,
                                                                   Stockholm august 85.
    Compared to the existing tools, this approach
                                                            [BOUZ 83J BOUZEGHOUB M. and GARDARIN G. "The
    seems more suitable for data base design in the               design of an expert system for database design-"
    sense that it takes advantage of both theoretical             InU. Worlcshop on New Applications of Databases,
    development and practical experience. Even if                 Cambridge (UK). sept. 83. Published in New
    practically limited, the ability to justify several


                                                           12
        Applications of Databases. Academic Press.            [HAMM 81] HAMMER N. and McLEOD D. "Data Base
      Gardarin & Gelenbe eds. 1984.                                 Description with SDM: A Semantic Data Model"
[BOUZ 84) BOUZEGHOUB M. "MORSE: A Functional                       (ACM TODS V6, N3, Sep 81).
      Query Language and its Semantic Data Model"             [HAYE 84 ] HAYES-ROTH F. "The knowledge based
      IN RIA RR270 and Proceed of 84 Trends and                      Expert Systems: A Tutorial" Computing Revue,
      Application conf on Databases, IEEE-NBS                       Vol 17, N'9, 1984.
      Gaithersburg (USA), 1984.                               [HULL 87J HULL R. & KING R. "Semantic Database
[BOUZ 85] BOUZEGHOUB M., GARDARIN G.,                               Modeling: Survey, Applications and Research
      METAlS E. "Database Design Tools: an Expert                   Issues" ACM Computing Surveys, Vol 19, N'3,
      System Approach" VLDB Conf, Stockholm                         1987.
      august 85.                                              [KENT 79] KENT W. "Limitations of Record-Based
[BOUZ 86a] BOUZEGHOUB M. "SECS!: Un Syst~me                          Wonnation Models." ACM Trans.on Database
        Expert en Conception de Syst~mes d'lnfonnations"           Systems 4,1,1979.
        These de Doctorat de I'Universite Pierre et Marie     [MYLO 80] MYLOPOULOS 1. BERNSTEIN P.A.
     Curie (Paris VI), mars 1986.                                  WONG H.K.T." A language facility for designing
[BOUZ 86b] BOUZEGHOUB M., COMYN I. &                                 database intensive applications" ACM trans. On
     RICHARD D. "Conception de Bases de Donntes:                     Database Systems valIS, nb2, 1980.
        Etat de l'art sur la mod6lisation conceptuelle,       [NILS 82] NILSSON N.J. "Principales of Artificial
        l'int6gration de vues et la conception physique"             Intelligence", Springer_Verlag Berlin Heidelberg
     Rapport MASI-Universitt Paris VI NO 180, 1986.                  New York, 1982.
[BOUZ 86c] BOUZEGHOUB M. & METAlS E.                          [OODB 86] First Workshop on Object Oriented Database
        "L'cxplication du raisonnement et la justification           Systems, IEEE Computer Society Press, 1986.
      des resultats dans Ie syst~me expert SECSl", 2iem       [PECK 88] PECKHAM 1. & MARY ANSKI F. "Semantic
      Colloque International en Intelligence Artificielle            Data Models" ACM Computing Surveys, Vol 20,
      de Marseille, 1986).                                           N'3, 1988.
[BROD 84J BRODIE M., MYLOPOULOS 1., SCHMIDT                   [SMIT 77J SMITH I.M. and SMITH D.C.P. "Data Bases
      Y. "On Conceptual Modelling: Perspectives from                 Abstractions Aggregation and Generalisation ..
        Artificial   Intelligence,   Data    Bases    and          ACM trans. On Database Systems, june 77.
      Programming languages. Springer-Verlag, NY              [TUCH 85] TUCHERMAN 1.., FURTADO A. and
      1984.                                                        CASANOVA M. "A Tool for Modular D"atabase
[BROD 86] BRODIE M. & MYLOPOULOS 1. (editors)                      Design." VLDB Conf, Stockholm 1985.
        "On Knowledge Base Management Systems"                [ULLM 82) ULLMAN J.D. "Prineiples of Database
      Springer Verlag, 1986.                                         Systems" computer science press 1982.
[BROD 87J BRODIE M. "Automating Database Design               [ZAN! 81] ZANIOLO C., MELKANOFF M.M: "On the
      and Development" A Tutorial of the SIGMOD                      design of relational data base schemata", ACM
      Conf., San Francisco 1987.                                     trans. On Database Systems, V6, NI; march 1981.
[BROW 83] BROWN and STOTT-PARKER "LAURA: A
      formal Database Model and her Logical Design
      Methodology." Proceed. VLDB Conf, Florence
      1983.
[BUBE 82) GUSTAFSSON M., KARLSSON T. &
      BUBENKO 1. "A Declarative Approach to
        Conceptuallnfonnation Modeling" in Infonnation
       Systems Design Methodologies Olle, Verijn-Stuart
       editors, North Holland /Pub!. Co, 1982-
[CERI 83] CERI S. (editor) "Methodology and Tools for
       Database Design" North Holland Pub!. Co, 1983.
[CHEN 76J CHEN P.P. "The Entity Relationship Model -
       Toward a Unified View of Data" ACM trans. On
       Database Systems VI, NI, March 1976.
[CODD 70] CODD E.F. "A Relational Model of Data for
       Large Shared Data Banks", Comm ACM,VoIl3,
       N06,1970.
[CODD 791 CODD E.F. "Extending The Database
       Relational Model to capture more Meaning." ACM
       trans. On Database Systems, 4, Dec 79.
[DBEN 84) Database Engineering Revue, "Special Issue on
       Database Design Aids" Vol7, N'4, 1984.
[FAG! 77] FAGIN R. "Multivalued Dependencies and a
       New Normal Form for Relational Databases"
       ACM TODS, Vo12, N'3, Sept 1977.
[GALL 84J GALLAlRE H., MINKER 1., NICOLAS I.M.
       "Logic and databases: a deductive approach" ACM
       Computing Surveys vol 16, NO 2, Iuin 84.
 [GARD 89] GARDARIN G; & VALDURIEZ P.
        "Relational Databases and Knowledge Bases"
        Addison Wesley Pub!. Co. 1989.


                                                             13