=Paper= {{Paper |id=Vol-39/paper-5 |storemode=property |title=MAC: Conceptual data modeling for OLAP |pdfUrl=https://ceur-ws.org/Vol-39/paper5.pdf |volume=Vol-39 |authors=A. Tsois,N. Karayiannidis,T. Sellis |dblpUrl=https://dblp.org/rec/conf/dmdw/TsoisKS01 }} ==MAC: Conceptual data modeling for OLAP== https://ceur-ws.org/Vol-39/paper5.pdf
                                   MAC: Conceptual Data Modeling for OLAP
                               Aris Tsois                                                                  Nikos Karayannidis
                        Knowledge and Database                                                          Knowledge and Database
                       Systems Laboratory, NTUA                                                        Systems Laboratory, NTUA
                        atsois@dblab.ece.ntua.gr                                                        nikos@dblab.ece.ntua.gr

                                                                      Timos Sellis
                                                      Knowledge and Database Systems Laboratory
                                                    Department of Electrical and Computer Engineering
                                                     National Technical University of Athens (NTUA)
                                                             Zografou 15773, Athens, Greece
                                                                 timos@dblab.ece.ntua.gr


                                                                                           The typical data flow path involves the gathering of data
                                                                                           from various sources into data warehouse systems and
                                             Abstract                                      then the usage of those data in the multidimensional
                                                                                           analysis process through the use of OLAP applications.
In this paper we address the issue of conceptual modeling                                  Multidimensional analysis mainly involves the
of data used in multidimensional analysis. We view the                                     computation of aggregated information using a large
problem from the end-user point of view and we describe                                    volume of detailed data. The information is analysed
a set of requirements for the conceptual modeling of real-                                 based on its detailed or derived properties (dimensions)
world OLAP scenarios. Based on those requirements we                                       using an almost static business model (hierarchies). The
then define a new conceptual model that intends to                                         reader is referred to [ChDa97] [Inmo96] [Kimb97]
capture the static properties of the involved information.                                 [Olap97] for an overview of Data Warehousing and
In its definition we use a minimal set of well-understood                                  OLAP. In the following we will assume the reader to be
OLAP concepts like dimensions, levels, hierarchies,                                        familiar with the terminology used in those areas.
measures and cubes. The central concept of the model is
the Multidimensional Aggregation Cube (MAC), which                                         A fundamental issue faced by vendors of OLAP
gives a broad and flexible definition to the notion of a                                   applications as well as by researchers in the OLAP
multidimensional cube. We evaluate our model against                                       domain is the modeling of data. The well-studied
other existing multidimensional models and show that                                       conceptual and logical models used in other database
MAC offers a unique combination of modeling skills. Our                                    areas, like the E/R model or the relational model, do not
main contribution is the definition of the basic concepts of                               seam to be sufficient for the OLAP case
our model; although the set of requirements and the                                        ([Kimb96][TBC99][S++98][Kimb97]). Vendors have
evaluation of all related models against those                                             adopted various models, while standardization bodies and
requirements represent an additional result.                                               researchers have developed and studied additional
                                                                                           models. All those models share some common concepts
                                                                                           like measures or hierarchies but there is still no formally
1         Introduction                                                                     defined and widely accepted (logical or conceptual) data
                                                                                           model. As proved by the history of the relational model a
In the last years On-Line Analytical Processing (OLAP)                                     common data model is the key for the collaboration and
[Codd93] has become a major research area in the                                           the rapid progress in an area.
database community [ChDa97]. The OLAP research is
tightly coupled with the research in data warehouses,                                      In this paper we address the problem of modeling real-
which are considered to be the information sources based                                   world OLAP scenarios at the conceptual level. The
on which On-Line Analytical Processing is performed.                                       current common practice is to use the well-known E/R
                                                                                           model [BCN92] and then to annotate the schema with
    The copyright of this paper belongs to the paper’s authors. Permission to copy         any additional OLAP specific information. Still, various
    without fee all or part of this material is granted provided that the copies are not   authors argue that the E/R model is not appropriate for
    made or distributed for direct commercial advantage.                                   OLAP scenarios since concepts like dimensions,
    Proceedings of the International Workshop on Design and                                hierarchies and cubes can only be partially represented.
    Management of Data Warehouses (DMDW'2001)                                              As a result, two publications ([TBC99] and [S++98])
    Interlaken, Switzerland, June 4, 2001                                                  already proposed extensions to the E/R model for the
    (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)                              multidimensional paradigm. Their approach is mainly
    http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/
                                                                                           suitable for the ODS (operational data store) part of data



A. Tsois, N. Karayannidis, T. Sellis                                                                                                              5-1
warehouses as they concentrate on the representation of      into the data warehouse from the OLTP system. For each
the source-detailed data.                                    sales transaction the OLTP system records the following
                                                             information:
In this paper we consider a slightly different approach,          • The date of the transaction.
where the information used in multidimensional analysis           • The cashier ID where the transaction took place.
is the primary target of our modeling concepts. The               • The ID of the products being sold.
information used in such an analysis process is mainly            • The customer ID.
aggregated data at various aggregation levels, or                 • The sales price for each product being sold.
combination of such levels. Furthermore, the dimensions,
the particular aggregation levels as well as the various     We assume that all the above information is somehow
hierarchies defined on dimensions represent information      stored in the data warehouse. We are not going to talk
used during the analysis.                                    about the design process of the data warehouse, neither
                                                             about how data is loaded from the OLTP system since our
In order to define a useful conceptual model we first        model does not address those issues. The MAC model,
investigate a set of example queries and derive a list of    which we propose, is mainly suitable for the users of the
modeling requirements. Based on those requirements we        data warehouse, the persons that analyze the information
then define the concepts of our model and their semantics.   through the use of an OLAP application.
The central concept of our model is the Multidimensional
Aggregation Cube (MAC), which is equivalent to an n-         As described by a plethora of OLAP papers [Mendel], the
way relationship relating measure values to a set of         multidimensional analysis is mainly based on drill-down,
dimension values. A careful definition of dimension          roll-up, slice and dice operations that are performed on a
values allows a single MAC to represent measure values       multidimensional view of data. Measures values are
of arbitrary aggregation levels. This is an essential        selected and aggregated using various predefined
difference with respect to the other conceptual models and   dimensions, dimension levels and hierarchies. The
can be used to simplify the schema of the various OLAP       dimension levels, the aggregation paths defined by
scenarios. An additional novelty of our model is the         hierarchies, the dimensions and the measures are the main
explicit modeling of analysis paths, a feature quite         concepts used in the analysis. For our example scenario
important for OLAP applications.                             assume that the analysis is performed on the sales price
                                                             (price of sold items) using the hierarchies defined in
Generally      speaking,    the    concepts     used   in    Figure 1.
multidimensional analysis are mapped directly to
corresponding concepts of the MAC model. As a result,              $QDO\VLV 3DWKV                                                         'LPHQVLRQV

the proposed MAC model allows OLAP scenarios to be                                   5HVLGHQFH     5HVLGHQFH     5HVLGHQFH
                                                              3 &XVWRPHU
modeled in a natural and straightforward way.                                          $UHD           &LW\        5HJLRQ

Furthermore, its abilities to model complex dimensions                                                                                          &OLHQW

and hierarchies and the broad definition of cubes makes it    3 &XVWRPHU            3URIHVVLRQ

suitable for highly complicated OLAP applications.
                                                              3     'DWH              0RQWK         _s*1.1
there is a functional dependency among them. In our             Year.1999   Store_Area._x     ALL         ALL       _s
example, cell_A refers to sales measured per Product and
Cashier while cell_C refers to sales measured per Brand                          Table 2: The query Q4
and Cashier. Those cells, although defined at different
levels of granularity, can be part of the same cube. This is    An additional advantage of our approach is that the most
due to the definition of the dimension domain, which            typical OLAP operations: drill-down, roll-up, slice and
states that all members of all participating levels are valid   dice are translated to simple selection queries on the cube.
dimension values.                                               For example, a drill-down on the results of Q4 to the
                                                                Product level can be expressed by simply replacing Brand
We believe that the above property of our cubes is crucial      with Product at the Item coordinate of Table 2.
for the compact and intuitive representation of
multidimensional data. As explained in the section 2, the       Nevertheless, there is an important argument against this
OLAP users usually handle data defined over various             modeling approach. The cube can represent cells with
levels of granularity. Queries may impose selection             measures that are functionally dependent but it cannot
conditions on various dimension levels and may require          guaranty their consistency. The measure of cell_C could




A. Tsois, N. Karayannidis, T. Sellis                                                                                     5-8
have been 50 in Table 1 without violating in any way the      During the last few years a plethora of multidimensional
definition of the cube C1. Still, from a semantically point   data models for data warehouses and OLAP have been
of view this value would be inconsistent with the values      proposed. A comparison of some of them can be found in
of cell_A and cell_B. This inability to guaranty              [VaSe99] and [SBH99]. We are currently aware of 12
consistency comes from total absence of the involved          models that have been published in research papers. Most
aggregation functions. Each measure of a cube is              of them are logical data models and only few ([TBC99]
semantically related to some aggregation. In the above        [S++98]) can be considered as purely conceptual. Each of
examples the measure Sum of sales is obviously related        those models has taken a somehow different modeling
to the SUM aggregation function. Still, the cube definition   approach ranging from a simple global table to
does not include this relationship making impossible any      sophisticated object classes.
consistency checking.
                                                              In order to demonstrate that our model is not ‘yet another’
We have intentionally chosen not to include the               multidimensional model we evaluated all 12 published
aggregation functions in the MAC model for a number of        models against the requirements described in section 2.
reasons. We decided not to define operations, not to          This may not be fair for the purely logical models since
include aggregation functions and not to cover any other      the requirements represent conceptual modeling needs.
functional aspects of data in the current model because       Still, the evaluation is done only to demonstrate that none
those aspects are orthogonal to the defined concepts. The     of the models published so far has the expressive power
model can be extended to include aggregation functions,       of MAC. In fact the evaluation shows that one
consistency-checking algorithms and operations without        requirement is not satisfied by any of the models and even
changing any of the existing MAC concepts. Furthermore,       for the remaining requirements there is no model
a separate functional model can cover those needs. It         satisfying all of them. Since our model can satisfy all the
seams that a separate functional model is more suitable       requirements of the evaluation we argue that our proposal
because each application domain requires its own specific     is an improvement to the existing status.
aggregation functions, operations and consistency
semantics.                                                    The requirements of our evaluation are presented in the
                                                              following list. Each requirement states what the model
For example in some applications it might be acceptable       should be capable of representing within a schema.
to view and analyze data that includes inconsistent parts     1. Levels within dimension (even in the form of simple
(maybe due to missing, incomplete or wrong                         attributes).
information). A separate functional model can be tailored     2. Grouping/classification relationships among levels.
to cope with operations on such data. By forcing              3. Many-to-many type of grouping/classification
consistency at the data modeling level we would make               relationships.
our model inappropriate for those applications.               4. N-way grouping/classification relationships that
                                                                   relate n dimension levels.
4    Related Work                                             5. Grouping/classification relationships that do not
                                                                   require total participation of the involved levels.
Modeling multidimensional data is not an OLAP specific        6. Analysis paths.
issue. In the database community, several research areas      7. Multiple measures as part of one concept.
like    statistical  databases,  scientific   databases,      8. Measures defined at any granularity level – for each
geographical databases and temporal databases deal with            involved dimension.
multidimensional data. Still, each of these areas has         9. Measure values defined over various granularity
particular modeling needs and has developed specialized            levels as part of one concept.
multidimensional data models. The area closest to data        10. Measure values characterized, for some of its
warehouses and OLAP is the statistical database area               dimensions, by more than one dimension level
[Shos97] where several multidimensional models have                members.
been proposed [OOM85], [RR91]. In fact those models
where proposed long before the appearance of the term         Note that the aggregation level of a measure value is the
“OLAP” [Codd93].                                              lowest dimension level that can be used to characterize
                                                              this value. Also, an analysis path is a lattice of
In the data warehouse and OLAP area the first                 grouping/classification relationships defined on a set of
multidimensional data models where developed by               levels. This lattice prevents the user from performing a
product vendors as the research in the OLAP domain has        meaningless (according to the schema designer) drill-
followed the evolution of industrial products. Vendors as     down or roll-up operation to an arbitrary -outside the
still using and developing their own data models. Also,       lattice- level of the dimension.
various standardization bodies have defined their own
models [Meta97] [Olap97] [TPC99]. Due to space
limitations we are not going to discuss any of the
previously referenced models but refer the reader to
[VaSe99] for an overview and comparison of those
models.



A. Tsois, N. Karayannidis, T. Sellis                                                                                 5-9
             1    2    3   4    5      6   7   8   9    10     [AGS97]) do so by providing the appropriate
[AGS97]                  9 9                                   transformation operations. So, this requirement does not
[CaTo98]     9 9           9   9                               mean that dimensions and measures are represented in the
[DaTh97]     9           9 9                                   same manner by the model. We believe that our model
[GoRi98]     9 9     9   9 9   9                               can easily support this requirement through the definition
[GyLa97]     9           9 9 9 9                               of the proper transformation operations (initially called
[Lehn98]     9           9 9   9                               Push and Pull by [AGS97]).
[LiWa96]     9             9 9 9
[PeJe99]     9 9 9 9 9     9 9 9                               5    Conclusions
[S++98]      9 9 9   9   9                                     In this paper we addressed the problem of conceptual
[TBC99]      9 9 9 9 9   9                                     modeling of data used in multidimensional analysis. We
[Truj99]     9           9 9                                   presented a set of modeling requirements through the use
[Vass98]     9 9           9                                   of examples and with those requirements in mind we
MAC          9 9 9 9 9 9 9 9 9 9                               defined a new conceptual data model, named MAC. The
                                                               proposed model uses concepts familiar to OLAP users,
   Table 3: Evaluation of multidimensional models              like dimensions, levels, paths, measures and cubes. Those
                                                               concepts are properly defined in order to allow modeling
The result of our evaluation is shown in Table 3. Note that    of complicated real-world scenarios. Our evaluation and
some of the models ([GyLa97], [Truj99], [AGS97])               comparison to previously published models showed that
represent relationships among levels using user-defined        MAC offers a unique combination of modeling skills. Our
functions, which are then used in operations. Also, other      model is the first user-centric conceptual model to define
models ([LiWa96], [Lehn98], [DaTh97]) leave the                cubes as multi-granularity relationships making both
relationships to be defined by the particular data instances   schemas and queries much more simple and intuitive. The
and provide no schema definition for them. In both cases       model defines dimension levels, drilling relationships,
we considered that the requirements 2,3,4,5 involving          dimension paths and dimensions as first-class and
grouping/classification relationships as part of the schema    standalone concepts, making it possible to share those
are not met.                                                   concepts among multiple cubes. Furthermore, the
                                                               complexity of drilling relationships and the usage of
The requirement not met by any of the models is the            analysis paths in the definition of dimensions are
concept of an analysis path. We believe that this              additional novelties of our model taking a step beyond the
information is an important structural part of the             classical multiple hierarchies. Finally, note that the
dimension design and it should be represented at the           definition of dimension domains implicitly represents a
conceptual level.                                              straightforward method for semantic query optimization
                                                               at both the schema and the instance level.
Although our model seams to be able to model a broader
range of OLAP scenario than other proposed models,             Future work includes the definition of MAC as an
there are a few requirements mentioned in several papers       extension to the E/R model and the research of a suitable
([Codd93], [PeJe99], [TBC99] [GyLa97]), which are not          logical model on which concepts of our model can be
satisfied by MAC. In our opinion, the most important of        mapped. We also plan to define a functional model that
such requirements is the support for correct aggregation       will include aggregation functions, derived measures, and
of data. As described in [LeSh97] the measures cannot          operations and will define the summarizability [LeSh97]
always be consistently aggregated by an arbitrary              of measures as well as other consistency rules.
aggregation function. In order to provide support for
correct aggregations the model must include additional         Acknowledgements
information         regarding         measures       and       This work has been partially funded by the European’s
grouping/classification relationships. Our model does not      Union Information Society Technologies Programme
include such additional information since we believe that      (IST) under project EDITH (IST-1999-20722).
this kind of information, as well as information about
aggregation functions and derived measures, can be             References
described by an independent functional model which will
supplement MAC.                                                [AGS97] R. Agrawal, A. Gupta, S. Sarawagi: Modeling
                                                                  Multidimensional Databases. Proc. of the ICDE 1997.
A second important requirement stated by various papers
([GyLa97], [PeJe99], [AGS97]) is the need for symmetric        [BCN92] C. Batini, S. Ceri, S. Navathe: Conceptual
treatment of dimensions and measures. It is important to          Database Design. Benjamin/Cummings, 1992.
note that what the authors finally mean by symmetric
treatment is the ability to transform a measure into a         [CaTo98] L. Cabibbo, R. Torlone: A Logical Approach to
dimension and the other way around. All models claiming            Multidimensional Databases. Proc. of the EDBT
to support this requirement ([GyLa97], [PeJe99],                   1998.




A. Tsois, N. Karayannidis, T. Sellis                                                                                5-10
[ChDa97] S. Chaudhuri, U. Dayal: An overview of Data       [PeJe99] T. B. Pedersen, C. S. Jensen: Multidimensional
   Warehousing and OLAP technology. ACM SIGMOD                 Data Modeling of Complex Data. Proc. of the ICDE
   Record, 26(1), March 1997.                                  1999.

[Codd93] E. F. Codd: Providing OLAP to user-analysts:      [RR91]   M. Rafanelli, F.L. Ricci: A functional model
    An IT mandate. E.F. Codd and Associates, 1993.            for macro-databases. SIGMOD Record, 20(1), March
                                                              1991.
[DaTh97] A. Datta, H. Thomas: A conceptual Model and
   an algebra for On-Line Analytical Processing in Data    [SBH99] C. Sapia, M. Blaschka, G. Höfling: An
   Warehouse. Proc. of the WITS 1997.                         Overview of Multidimensional Data Models for
                                                              OLAP. Technical Report 1999.
[G++96] J. Gray et al.: Data Cube: A Relational               http://www.forwiss.tu-muenchen.de/
   Aggregation Operator Generalizing Group-By, Cross-
   Tab and Sub-Totals. Proc. of the ICDE 1996.            [S++98] C. Sapia, M. Blaschka, G. Höfling, B. Dinter:
                                                              Extending the E/R model for the Multidimensional
[GoRi98] M. Golfarelli, S. Rizzi: A Methodological            Paradigm. Proc. of the DWDM 1998.
    Framework for Data Warehouse Design. Proc. of the
    DOLAP 1998                                            [Shos97] A. Shoshani: OLAP and statistical databases:
                                                              Similarities and differences. Proc. of the PODS 1997.
[GyLa97] M. Gyssens, L.V.S. Lakshmanan: A
    Foundation for Multi-Dimensional Databases. Proc. [TBC99] N. Tryfona, F. Busborg, J. G. B. Chistiansen:
    of the VLDB 1997.                                         starER: A Conceptual Model for Data Warehouse
                                                              Design. Proc. of the DOLAP 1999.
[Inmo96] W.H. Inmon: Building the Data Warehouse.
    John Wiley & Sons, 1996.                              [TPC99] TPC: TPC Benchmark H and TPC Benchmark
                                                              R. Transaction Processing Council. June 1999.
[Kimb96] R. Kimball: The Data Warehouse Toolkit. John         http://www.tpc.org/
    Wiley & Sons, 1996.
                                                          [Truj99] J. Trujillo: The GOLD model: An Object
[Kimb97] R. Kimball: A Dimensional Modeling                   Oriented     multidimensional      data     model  for
    Manifesto. DBMS, August 1997.                             multidimensional databases. Proc. of the ECOOP
                                                              1999.
[Lehn98] W. Lehner: Modeling Large Scale OLAP
    Scenarios. Proc. of the EDBT 1998                     [VaSe99] P. Vassiliadis, T. Sellis: A Survey of Logical
                                                              Models for OLAP Databases. SIGMOD Record 28(4),
[LeSh97] H. Lenz, A. Shoshani: Summarizability in             Dec. 1999.
    OLAP and Statistical Databases. In Proc. of the
    SSDBM 1997.                                           [Vass98] P. Vassiliadis: Modeling Multidimensional
                                                              Databases, cube and cube operations. Proc. of the
[LiWa96] C. Li, X. S. Wang: A Data Model for                  SSDBM 1998.
    Supporting On-Line Analytical Processing. In Proc. of
    the CIKM 1996.

[Mendel] A. O. Mendelzon: Data warehousing and
   OLAP: a research-oriented bibliography .
    http://www.cs.toronto.edu/~mendel/dwbib.html.

[Meta91] Metadata Coalition: Meta Data Interchange
   Specification. (MDIS Version 1.1), August 1997.

[Olap97] OLAP Council: OLAP and OLAP Server
    Definitions.                                    1997
    http://www.olapcouncil.org/reasearch/glossary.htm

[OOM85] G. Ozsoyoglu, M. Ozsoyoglu, F. Mata: A
   Language and a Physical Organization Technique for
   Summary Tables. Proc. of the SIGMOD 1985.




A. Tsois, N. Karayannidis, T. Sellis                                                                          5-11