=Paper= {{Paper |id=Vol-19/paper-6 |storemode=property |title=Analysis and Design of Data Warehouses |pdfUrl=https://ceur-ws.org/Vol-19/paper5.pdf |volume=Vol-19 |dblpUrl=https://dblp.org/rec/conf/dmdw/Schouten99 }} ==Analysis and Design of Data Warehouses== https://ceur-ws.org/Vol-19/paper5.pdf
                                          Analysis and design of data warehouses

                                                                           Han Schouten
                                                                     Information Systems Dept.
                                                            Technical University - Delft, The Netherlands
                                                                   han.schouten@is.its.tudelft.nl




                                           Abstract                                    1 Introduction
       With the large-scale introduction of the data
       warehouse concept, a new phenomenon has ap-                                     At least two different types of computer applications can
       peared in the field of information systems devel-                               be distinguished in the information management of a
       opment. Facts in a data warehouse – as opposed                                  company: on-line transaction processing (OLTP) and on-
       to those in an operational database – mainly rep-                               line analytical processing (OLAP). The first supports the
       resent immutable, aggregated or otherwise de-                                   primary processes of the company with ordinary transac-
       rived, historical information. The aggregation                                  tion processing systems. The latter concerns management
       level and specific layout of management infor-                                  information about and control over primary processes by
       mation reports often cannot be specified on be-                                 means of management information systems. [Inmon93]
       forehand. Therefore, a data warehouse must be                                   gives several valid reasons for the physical separation of
       designed in such a way, that it provides optimal                                the databases and the applications in these two areas.
       support for aggregation on the fly and for navi-
       gation through aggregation hierarchies, that it al-                             A database filled with facts derived and aggregated from
       lows easy access to time series and that it enables                             an ordinary operational database for the sole purpose of
       reporting in any desired layout.                                                policy making is called a data warehouse. The facts in a
                                                                                       data warehouse may represent key performance indicators
       This publication describes the outline of two
                                                                                       to be used in a balanced score card approach to evaluate
       complementary methods for the analysis of data
       warehouse relations; one simple and the other ad-                               the operation of a business. There are sound methods for
       vanced. The simple method exploits the knowl-                                   the analysis and design of ordinary transaction processing
       edge contained in an ordinary relational schema.                                systems. A comparable method for the development of
       The advanced method is based on the analysis of                                 management information systems remains to be invented.
       derivation rules. Subsequently, the design of data                              Inmon deals with many phenomena related to data ware-
       warehouses based on these methods is investi-                                   house design, but leaves the ‘how’ of it completely un-
       gated. Special attention has been given to the ac-                              touched. This article is an attempt to devise a proper way
       tuality of data warehouses that contain historical                              of thinking and working to achieve this goal.
       information, to the transitivity of derivations, to
       the navigation through aggregation hierarchies                                  When developing a transaction processing system, nor-
       via so-called drill paths and to the maintenance of                             mally a distinction is made between the analysis of the
       various aggregation levels within a single data                                 ideas of the people in and around an organisation and the
       warehouse relation.                                                             design of the way in which the operation of these ideas is
                                                                                       supported with information technology. This distinction
_______________
                                                                                       minimises the complexity of information analysis and
The copyright of this paper belongs to the paper’s authors. Permission to copy         guarantees generally applicable and reusable definitions.
without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage.
                                                                                       The information analysis that always precedes the design
                                                                                       of information systems results in definitions for elemen-
Proceedings of the International Workshop on Design and                                tary concepts, such as facts, constraints, derivations,
Management of Data Warehouses (DMDW'99)                                                norms and prescriptions for desirable human behaviour.
Heidelberg, Germany, 14. - 15. 6. 1999
                                                                                       These concepts represent the whole communicable expe-
                                                                                       rience of people in and around an organisation. These
(S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.)                                concepts form the very basis of any conceivable applica-
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/                  tion in the form of an information system. The definition




Han Schouten                                                                                                                               5-1
of these concepts is totally independent of some specific       sary and sufficient for the analysis and formulation of
application. The designer of an information system deter-       derivation rules.
mines the eventual form in which the facts and the guid-
                                                                The next chapter provides two theories and two methods
ance for the behaviour of human workers will be pre-
                                                                for the analysis of data warehouse relations. The first is
sented. The prescriptive qualities of a method for the
                                                                our workhorse that produces the bulk of our data ware-
analysis and the design of information systems guarantee
                                                                house definitions. The second is our fancy horse and con-
reproducibility, verifiability and accountability.
                                                                cerns the analysis of derivation rules in conceptual terms,
Management information – as opposed to operational in-          i.e. independent of the shape in which the eventual deri-
formation – is always aggregated and, therefore, derived        vation will be presented. Armed with the result of a fully-
information. Updating management information is rarely          fledged information analysis, a data warehouse can be
necessary. Redundantly storing facts is, therefore, admis-      designed. The design consists in the structure of the data
sible and sometimes – for better performance – even nec-        warehouse. We will devote special attention to evolving
essary. Redundancy often concerns the registration of           objects in data warehouses, to the transitivity of deriva-
management information at various aggregation levels.           tion, to drill paths and to aggregation levels. In the chapter
Well-specified information requirements will often trans-       called ‘Discussion’ various potentials and impracticalities
late into moderate levels of redundancy. On the other           of the approach presented in this article will be consid-
hand, the automated support for policy development in its       ered.
broadest sense often results in considerable, yet necessary
levels of redundancy and extraordinarily large databases.       2 The analysis of derivations
In this sense, the structure of a data warehouse more or
less depends of its intended usage.                             2.1 Concepts
The information in a data warehouse can be presented in a       Natural language oriented information analysis consists in
countless variety of different reports. Facts of various        a set of prescriptions for the decomposition of natural
types can be plotted one against the other. When such           language sentences into phrases and the classification and
facts occur within some aggregation hierarchy, policy           qualification of these phrases. Analysing sentences such
analysts can pass through this hierarchy from the highest       as:
to the lowest level (drill-down). At a given aggregation
level, the analyst can investigate each aggregation class
individually. The data warehouse design should, therefore,            (1)     the person with identification number 12038
optimally facilitate drill-down.                                              is resident of The Netherlands
                                                                      (2)     the person with identification number 12039
The distinction between analysis and design equally ap-                       is resident of Belgium
plies to the design of management information systems.
During analysis, more attention than with ordinary infor-
mation systems will have to be devoted to derivation            produces the following fully classified and qualified re-
rules. In a data warehouse, each relation represents a col-     sult:
lection of derived facts grouped in a particular way. The
analysis consists in the detection of derivable facts and the
applicable derivations. The design consists in grouping             ST1       Nationality
                                                                   SFT1        is resident of
derivable facts into data warehouse relations. The theory
                                                                                                  
of grouping has been documented in [Bakema94]. In the                 (1)             12038           The Netherlands
present article, the way of grouping derived facts into da-           (2)             12039               Belgium
tawarehouse relations is necessarily somewhat less ortho-
dox. Analysis and design according to the prescriptions             ST2       Person
described in this article yield an optimally structured data       SFT2       
warehouse. What we consider optimal, largely depends of                                                  identifies a person
the intended usage: standard report or broad analysis.               (3)               12038
                                                                     (4)               12039
Success in the analysis of management information sys-             OFT2       the person with 
in charge. A precise and complete prescription for infor-
mation analysis is given in [Nijssen94]. Full treatment of          ST3       Country
                                                                   SFT2       
this subject exceeds the scope of this article. Apart from                                          identifies a country
detailing the notions of concept, identity and meaning, we           (5)            The Netherlands
will not dwell on these matters. These notions are neces-            (6)                Belgium
                                                                   OFT3       




Han Schouten                                                                                                            5-2
   NFT1
   NT1        identification
              Identification number <…>                        CT2       Person
                                                                ST2      Person
   NT2        Country name                                     SFT2      
   NFT2a      the country name <…>                                                              identifies a person
   NFT2b      <…>                                                (3/3)              12038
                                                                 (4/4)              12039
                                                               OFT2      the person with
The prescription that yields this result is documented in                
[Nijssen94] and [Schouten94]. Moreover, [Nijssen94]
                                                                ST3      Man
provides prescriptions for recognition and modelling of        SFT3      
various types of constraints and cases of generalisation,                          identifies a person that is a man
specialisation and synonymy. Although the method may             (7/4)              12039
help to recognise and analyse prescriptions for desired        OFT3      the man with
human behaviour, it does not provide any prescription for                
it. In this article, modelling derivation rules will receive    ST4      Woman
full attention. Only if all prescriptions are followed me-     SFT4      
ticulously, the analysis result will be reproducible and                        identifies a person that is a woman
may survive the severest scrutiny.                               (8/3)              12038
                                                               OFT4      the woman with
Apparently, a sentence of the type ‘Person’ can be ex-                   
pressed in two different ways: as a sentence formulation        ST5      Person
according to ‘SFT2’ and as an object formulation ac-           SFT5      
cording to ‘OFT2’. The same observation applies to sen-                                      identifies a person
tences of the type ‘Country’ and the formulations ‘SFT3’         (9/4)        1482.63.152
and ‘OFT3’. Many people wonder if objects are facts and        OFT5      the person with
the other way around. Indeed, a sentence and the reference               
to an object embedded in a sentence are just different
manifestations of a single concept in someone’s mind.          CT3       Country

A human being is capable of communicating part of what          ST6      Country
he is aware of. The unit of information in the human con-      SFT6      
science is called a concept. A concept is indivisible, has                                   identifies a country
                                                                (10/5)       The Netherlands
an identity of its own and has at least one meaning. By
                                                                (11/6)         Belgium
necessity, information analysis restricts itself to communi-   OFT6      
cable concepts and consists in the recognition of the iden-
tity, all meanings and all manifestations of concepts that     CT4       Point in time
are worth considering in a given context. Among the ra-
                                                                ST7      Now
tional and communicable concepts actually we only find
                                                               SFT7