=Paper=
{{Paper
|id=Vol-39/paper-8
|storemode=property
|title=Meta Cube-X: An XML Metadata Foundation for Interoperability Search among Web Data Warehouses
|pdfUrl=https://ceur-ws.org/Vol-39/paper8.pdf
|volume=Vol-39
|authors=N. Thanh Binh,A M. Tjoa,O. Mangisengi
|dblpUrl=https://dblp.org/rec/conf/dmdw/BinhTM01
}}
==Meta Cube-X: An XML Metadata Foundation for Interoperability Search among Web Data Warehouses==
MetaCube-X: An XML Metadata Foundation for Interoperability Search among Web Warehouses

Nguyen Thanh Binh, A Min Tjoa
Institute of Software Technology, Vienna University of Technology
Favoritenstrasse 9-11/188, A-1040 Vienna, Austria
{binh,tjoa}@ifs.tuwien.ac.at

Oscar Mangisengi
Dept. of Computer Science, National University of Singapore
S16 Level 5, 3 Drive 2, Singapore 117543
oscar@comp.nus.edu.sg
===Abstract===

OLAP (Online Analytical Processing) applications place very special requirements on the underlying multidimensional data, which differ significantly from those of other application areas (e.g. the existence of highly structured dimensions). In addition, providing access and search among multiple, heterogeneous, distributed and autonomous data warehouses, especially web warehouses, has become one of the leading issues in data warehouse research and industry. This paper proposes MetaCube-X to provide interoperability search among Web data warehouses.

===1 Introduction===

The concept of On-Line Analytical Processing (OLAP), first introduced by [Cod93] to enable business decision makers to work with data warehouses, supports dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data. OLAP tools are frequently used as front ends in data warehouse environments. They allow the interactive analysis of multidimensional data. Independent of the different possible architectures for data storage and query processing, they all present the data to the user in a multidimensional data model, and queries are formulated using the multidimensional paradigm. The research community in different application areas has proposed several formal multidimensional metadata models and corresponding query languages [Agr95], [Bla98], [Cab98], [Cha97], [Eck00], [Gra96], [Gys97], [Leh98], [Li96], [Man99], [Ngu00], [Ola97], [Vas98], [Wan97]. However, each approach presents its own view of multidimensional analysis requirements, terminology and formalism. Consequently, no commonly accepted formal multidimensional data model has been established. Such a model is necessary to serve as a foundation for standardization and future research. This has been our main motivation to focus on a new multidimensional data model that is suitable for OLAP applications, since these applications place very special requirements on the underlying multidimensional data that differ significantly from those of other application areas (e.g. the existence of highly structured dimensions). In this context, the concepts of MetaCube have been introduced in [Ngu00].

On the other hand, the World Wide Web is a distributed global information resource that contains a large amount of information placed on the web independently by different organizations. Therefore, related information may appear across different web sites. Furthermore, Web warehousing is a novel and very active research area that combines two rapidly developing technologies, data warehousing and Web technology, as depicted in figure 1 [Mat99], and provides a suitable approach to systematically discover and acquire strategic information from the Web. This information may be identified, cataloged, managed and then accessed by the end users [Mat99] via search engines or some Web information management system.

[Figure 1 diagram: Web warehousing shown as the hybrid of data warehousing, which contributes data management and the warehousing approach, and the Web, which contributes Web technology and text and multimedia management.]

Figure 1: The hybrid of Web warehousing systems.

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.

Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2001), Interlaken, Switzerland, June 4, 2001 (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/
N.T. Binh, A M. Tjoa, O. Mangisengi 8-1
To provide the user with a powerful and friendly query mechanism for accessing information on the web, the critical problem is to find an effective way to build web data models. The key objective of our approach is to design and implement a web warehousing system based on the MetaCube-X protocol given in Figure 2, which provides access and search capability among multiple, heterogeneous, distributed and autonomous web warehouses. MetaCube-X is an XML (eXtensible Markup Language) instance of the MetaCube concept [Ngu00] for supporting data warehouse federation. As a result, MetaCube-X provides a neutral syntax for interoperability among different Web warehousing systems. In this concept, we define a global MetaCube-X stored in a server and local MetaCube-Xs stored in local Web warehouses.

The remainder of this paper is organized as follows. In section 2, we discuss related work. Then in section 3, we introduce MetaCube-X, from the conceptual data model to the XML implementation. The paper concludes with section 4, which presents our current and future work.

===2 Related works===

Our work is related to research within the area of metadata for multidimensional databases, federated database systems, and mediation between multiple information systems, especially distributed data warehousing systems.

The concept of multidimensionality (or n-dimensionality) of these datasets, and in particular of aggregate data, as well as the concepts of dimension (often called category attribute, descriptive variable, character, etc.) and of measure (often called summary attribute, quantitative data, variable, etc.) have already been discussed [Agr95], [Bla98], [Cab98], [Cha97], [Eck00], [Gra96], [Gys97], [Leh98], [Li96], [Man99], [Ngu00], [Ola97], [Vas98], [Wan97]. Recently, many authors have proposed multidimensional data models and query languages. Gray et al. in [Gra96] proposed the data cube operator as an extension to SQL, which generalized the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. In [Li96] the authors formalized a multidimensional data model for OLAP and developed an algebra query language called Grouping Algebra. The relative multidimensional cube algebra was proposed in order to facilitate data derivation. Gyssens et al. in [Gys97] presented a tabular database model and discussed a tabular algebra as a language for querying and restructuring tabular data. Lehner in [Leh98] discussed the design problem that arises when OLAP scenarios become very large and proposed a nested multidimensional data model useful during the schema design and multidimensional data analysis phases. In this context, we proposed a multidimensional data model, namely MetaCube, in [Ngu00], the concept of which is a generalization of former multidimensional data models, i.e. relational and multidimensional OLAP models. First, the MetaCube model is able to represent and capture natural hierarchical relationships among members within a dimension as well as the relationships between dimension members and measure data values. Then, dimensions and data cubes with their operators are formally introduced. Each MetaCube is associated with a set of groups, each of which contains a subset of the MetaCube domain, which is a poset of data cells. Furthermore, MetaCube operators (e.g. jumping, rollingUp and drillingDown) are defined in a very elegant manner.

[GMo99] presents distributed and parallel computing issues in data warehousing. [Alb98a], [Alb98b], [Bau97], [Hüm00], [Leh98] present the prototypical distributed OLAP system developed in the context of the CUBE-STAR project. [Hüm00] presents distributed data warehousing based on the Common Object Request Broker Architecture (CORBA).

A variety of approaches for interoperability have been proposed, aiming at different levels of integration in relation to federated database management systems [She90]. According to [Gar99], data federations will be very important, and XML will support communication between databases and the integration of data over the Internet. The concept of a mediator was introduced by [Wie92].

In this paper we propose MetaCube-X, an XML instance of the MetaCube concepts, to provide a framework for supporting data warehouse federation.

===3 The Concept of MetaCube-X===

====3.1 MetaCube-X Protocol====

Figure 2 shows the architecture of MetaCube-X, which provides interoperability search among web data warehouses. The architecture of a MetaCube-X system consists of clients, the server protocol with its MetaCube-X repository, local MetaCube-Xs, and local data warehouses. The MetaCube-X protocol provides services and manages access to the local data warehouses (DWHs) through the local MetaCube-Xs and the global MetaCube-X. A local MetaCube-X is metadata that describes the multidimensional data model of one local data warehouse, and it is stored in that local data warehouse. The global MetaCube-X is global metadata that integrates the information of the local MetaCube-Xs from the local data warehouses, and it is stored in the server. Both the local MetaCube-Xs and the global MetaCube-X are represented as XML documents to support searching the local data warehouses.
[Figure 2 diagram. Labels recovered from the figure: Client; Web Data Warehouse Queries; XML MetaCube-X Services; MetaCube-X Server; MetaCube-X Repository; locatorDB; three XML MetaCube-X documents; Data Warehouse 1 to Data Warehouse n.]

Figure 2: MetaCube-X architecture

====3.2 MetaCube Conceptual Data Model====

In [Ngu00], a conceptual multidimensional data model that facilitates a precise, rigorous conceptualization for OLAP has been introduced. First, our approach has a strong relation to mathematics, applying the mathematical concepts of partial order and partially ordered set (poset). This mathematical soundness provides a foundation to handle natural hierarchical relationships among data elements along dimensions with many levels of complexity in their structures. Second, the multidimensional data model organizes data in the form of MetaCubes. Instead of containing a set of data cells, each MetaCube is associated with a set of groups, each of which contains a subset of the data cell set. Furthermore, MetaCube operators (e.g. jumping, rollingUp and drillingDown) are defined in a very elegant manner. Formally, the multidimensional data model is constructed based on a set of dimensions <math>D = \{D_1, \ldots, D_x\},\ x \in N</math>, a set of measures <math>M = \{M_1, \ldots, M_y\},\ y \in N</math>, and a set of MetaCubes <math>C = \{C_1, \ldots, C_z\},\ z \in N</math>, each of which is associated with a set of groups <math>Groups(C_i) = \{G_1, \ldots, G_p\},\ p, i \in N,\ 1 \le i \le z</math>.

=====3.2.1 The Concepts of Dimension=====

First, hierarchical relationships among dimension members have been introduced by means of one hierarchical domain per dimension [Ngu00]. A hierarchical domain is a poset of dimension elements, organized in a hierarchy of levels that correspond to different levels of granularity. It also allows us to consider a dimension schema as a poset of levels. In this concept, a dimension hierarchy is a path along the dimension schema, beginning at the root level and ending at a leaf level. Moreover, the definitions of two dimension operators, namely <math>O_{ancestor}</math> and <math>O_{descendant}</math>, make it possible to navigate along a dimension structure. As a consequence, dimensions with structures of any complexity can be captured with our data model.

=====3.2.2 The Concepts of Measures=====

The concepts of measures, which are the objects of analysis in the context of the multidimensional data model, have also been introduced in [Ngu00]. First, a measure schema is a tuple <math>MSchema(M) = \langle Fname, O \rangle</math>. If <math>O</math> is "NONE", the measure stands for a fact; otherwise it stands for an aggregation.

=====3.2.3 The Concepts of MetaCubes=====

In [Ngu00], a MetaCube schema is defined by a triple of a MetaCube name, an x-tuple of dimension schemas, and a y-tuple of measure schemas. Each data cell is an intersection among a set of dimension members and measure data values, each of which belongs to one dimension or one measure. Furthermore, the data cells within a MetaCube domain are grouped into a set of associated granular groups, each of which expresses a mapping from the domains of an x-tuple of dimension levels (independent variables) to the y numerical domains of a y-tuple of numeric measures (dependent variables). A MetaCube is thus constructed based on a set of dimensions, consists of a MetaCube schema, and is associated with a set of groups.

[Figure 3 diagram: a three-dimensional cube. Labels recovered from the figure: Store axis (Mexico, USA); Product axis (Alcoholic, Dairy, Beverage, Baked Food, Meat, Seafood); Time axis (1 to 6); sample cell values 10, 50, 20, 12, 15, 10.]

Figure 3: Sales MetaCube is constructed from three dimensions: Store, Product and Time and one measure: TotalSale.
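As an illustration of the hierarchical domain and the two navigation operators, the following is a minimal sketch, not the formal definition from [Ngu00]. It stores the father/child relation of one dimension and derives ancestors and descendants as the transitive closure of that relation. The class name `Dimension`, its methods, the level names, and the members `Store1` to `Store3` are assumptions made for this example; only `USA` and `Mexico` appear in Figure 3.

```python
from collections import defaultdict


class Dimension:
    """One dimension whose members form a hierarchy (a partial order)."""

    def __init__(self, name):
        self.name = name
        self.level_of = {}                 # member -> level name
        self.parents = defaultdict(set)    # member -> direct fathers
        self.children = defaultdict(set)   # member -> direct children

    def add_member(self, member, level, parent=None):
        self.level_of[member] = level
        if parent is not None:
            self.parents[member].add(parent)
            self.children[parent].add(member)

    def _closure(self, member, edges):
        # Follow the father (or child) relation transitively.
        seen, stack = set(), [member]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, member):
        """All members above `member` in the hierarchy."""
        return self._closure(member, self.parents)

    def descendants(self, member, level=None):
        """All members below `member`, optionally restricted to one level."""
        found = self._closure(member, self.children)
        if level is not None:
            found = {m for m in found if self.level_of[m] == level}
        return found


# Hypothetical Store dimension: root level "all", then "country", then "store".
store = Dimension("Store")
store.add_member("All", "all")
store.add_member("USA", "country", parent="All")
store.add_member("Mexico", "country", parent="All")
store.add_member("Store1", "store", parent="USA")
store.add_member("Store2", "store", parent="USA")
store.add_member("Store3", "store", parent="Mexico")
```

In this sketch a dimension hierarchy is simply the chain of levels from `all` down to `store`; a dimension with several alternative hierarchies would give some members more than one father, which the set-valued `parents` map already allows.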
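The relation between measure schemas, groups and a roll-up can be sketched as follows. This is an illustrative reading of the concepts above, not the rollingUp operator as formally defined in [Ngu00]: a group is modelled as a mapping from dimension members to a measure value, and rolling up replaces each store by its father member and aggregates. The function `rolling_up`, the aggregation names in `AGGREGATIONS`, and all cell values are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureSchema:
    fname: str      # Fname in MSchema(M) = <Fname, O>
    operator: str   # O: "NONE" for a fact, otherwise an aggregation name

    def is_fact(self):
        return self.operator == "NONE"


# Assumed set of aggregation operators; the paper does not enumerate them.
AGGREGATIONS = {"SUM": sum, "MAX": max, "MIN": min}


def rolling_up(group, parent_of, schema):
    """Map a finer group onto a coarser one along the Store dimension.

    group:     dict mapping (store, product) -> measure value
    parent_of: dict mapping each store to its father member
    schema:    MeasureSchema of the coarser group (must be an aggregation)
    """
    buckets = {}
    for (store, product), value in group.items():
        buckets.setdefault((parent_of[store], product), []).append(value)
    aggregate = AGGREGATIONS[schema.operator]
    return {cell: aggregate(values) for cell, values in buckets.items()}


# Hypothetical base group at the (store, product) granularity.
base_group = {
    ("Store1", "Dairy"): 50,
    ("Store2", "Dairy"): 20,
    ("Store3", "Dairy"): 10,
    ("Store1", "Meat"): 15,
}
country_of = {"Store1": "USA", "Store2": "USA", "Store3": "Mexico"}

total_sale = MeasureSchema("TotalSale", "NONE")        # a fact
total_by_country = MeasureSchema("TotalSale", "SUM")   # an aggregation
by_country = rolling_up(base_group, country_of, total_by_country)
```

Each result of `rolling_up` is itself a group at a coarser granularity, which matches the idea that one MetaCube is associated with a set of groups rather than a single set of cells.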
[Figure 4 diagram: UML class diagram. Classes recovered from the figure: NestedElement, MDElement, Cell, Groupby, GSchema, MeasureValue, IntegerValue, floatValue, DimensionElement, Level, Hierarchy, DimensionSchema, Dimension, MeasureSchema, Cube. Attributes recovered from the figure: Description, Gname, Lname, Hname, Dname, Fname, AggFunction, Cname, BasicGroupby. NestedElement carries the self-associations "Has Child" and "Has Father" (roles Child and Father, multiplicity 0..*); the remaining associations are labelled "belongs to" and "refers to", mostly with multiplicity 1..*.]

Figure 4: The MetaCube-X model with UML
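The schema-level classes named in Figure 4 can be mirrored in code roughly as follows. This is a sketch under assumptions: the attribute names (`Lname`, `Hname`, `Dname`, `Fname`, `AggFunction`, `Gname`, `Cname`, `BasicGroupby`) come from the figure, but the way the classes are nested here is one plausible reading of the "belongs to" and "refers to" associations, not a transcription of the diagram. The example cube and its level names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Level:
    lname: str


@dataclass
class Hierarchy:
    hname: str
    levels: List[Level] = field(default_factory=list)  # assumed order: root to leaf


@dataclass
class Dimension:
    dname: str
    hierarchies: List[Hierarchy] = field(default_factory=list)


@dataclass
class MeasureSchema:
    fname: str
    agg_function: str = "NONE"


@dataclass
class Groupby:
    gname: str
    levels: List[Level] = field(default_factory=list)
    measures: List[MeasureSchema] = field(default_factory=list)


@dataclass
class Cube:
    cname: str
    basic_groupby: Groupby
    dimensions: List[Dimension] = field(default_factory=list)
    measures: List[MeasureSchema] = field(default_factory=list)


def simple_dimension(dname, level_names):
    """Build a dimension with a single hierarchy over the given levels."""
    levels = [Level(n) for n in level_names]
    return Dimension(dname, [Hierarchy(dname + "Hierarchy", levels)])


# Hypothetical Sales cube with the three dimensions named in Figure 3.
dims = [
    simple_dimension("Store", ["Country", "Store"]),
    simple_dimension("Product", ["Category", "Product"]),
    simple_dimension("Time", ["Year", "Month"]),
]
total_sale = MeasureSchema("TotalSale")
leaf_levels = [d.hierarchies[0].levels[-1] for d in dims]
sales = Cube("Sales", Groupby("Base", leaf_levels, [total_sale]), dims, [total_sale])
```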
====3.3 Modeling MetaCube-X with UML====

The common model, or MetaCube-X, is used for expressing all schema objects available in the different local data warehouses. The MetaCube-X(s) in a data warehouse federation allow handling the design, integration, and maintenance of the heterogeneous schemas of the local data warehouses. The model serves for describing each local schema, including dimensions, dimension hierarchies, dimension levels, cubes, and measures, and it should be possible to describe any schema represented by any multidimensional data model, such as the star schema model, the snowflake model, and the like.

UML is used to model dimensions, measures and data cubes in the context of the MetaCube data model (figure 4). We introduce a class, namely NestedElement, that provides a framework for defining other classes, i.e. DimensionElement, Level, GSchema, Groupby. In addition, the classes DimensionSchema, Hierarchy, Dimension, MSchema, MValue, Groupby and Cube are defined in order to represent dimension schema, dimension hierarchy, dimension, measure schema, measure values, groupby, and cube schema. The model will be implemented as an XML schema based on the Meta Data Interchange Specification (MDIS) [Met97] and the Open Information Model (OIM) [Met99a] of the Meta Data Coalition (MDC).

====3.4 Implementation with XML====

MetaCube-X is an XML instance of the MetaCube concept for supporting interoperability of different multidimensional data models. It covers heterogeneity problems, such as syntactical, data model, semantic, schematic, and structural heterogeneities.

XML is used to represent the MetaCube concept because it can model data to any level of complexity, check data for structural correctness, define new tags as needed (for example for a new dimension), and show hierarchical information corresponding to dimension hierarchies. These capabilities are all required for data warehouse schemas and OLAP applications. In addition, XML makes extension easy, offers promise for applying data management technology to documents, provides a neutral syntax for interoperability among different systems, and is very useful for exchanging data.
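To make the points about new tags and hierarchical information concrete, the following sketch builds a small XML description of one local schema with Python's standard `xml.etree.ElementTree` module and parses it back. The element and attribute names (`MetaCubeX`, `Cube`, `Dimension`, `Hierarchy`, `Level`, `Measure`, `scope`) are assumptions for illustration, not the schema defined by the authors; the round trip only checks well-formedness, not validity against a DTD or XML schema.

```python
import xml.etree.ElementTree as ET


def build_local_metacube_x(cube_name, dimensions, measures):
    """Build an XML tree describing one local warehouse schema.

    dimensions: dict mapping dimension name -> list of level names (root to leaf)
    measures:   dict mapping measure name -> (aggregation function, value type)
    """
    root = ET.Element("MetaCubeX", {"scope": "local"})
    cube = ET.SubElement(root, "Cube", {"Cname": cube_name})
    for dname, levels in dimensions.items():
        dim = ET.SubElement(cube, "Dimension", {"Dname": dname})
        hierarchy = ET.SubElement(dim, "Hierarchy", {"Hname": dname + "Hierarchy"})
        for lname in levels:
            # Nesting order of Level elements carries the dimension hierarchy.
            ET.SubElement(hierarchy, "Level", {"Lname": lname})
    for fname, (agg, value_type) in measures.items():
        measure = ET.SubElement(cube, "Measure", {"Fname": fname, "AggFunction": agg})
        measure.text = value_type
    return root


# Hypothetical Sales schema; level names are invented for the example.
doc = build_local_metacube_x(
    "Sales",
    {
        "Store": ["Country", "Store"],
        "Product": ["Category", "Product"],
        "Time": ["Year", "Month"],
    },
    {"TotalSale": ("SUM", "Number")},
)
xml_text = ET.tostring(doc, encoding="unicode")
reparsed = ET.fromstring(xml_text)  # raises ParseError if not well-formed
```

Adding a further dimension to a local warehouse only adds one more `Dimension` element, which is the kind of extensibility the paragraph above refers to.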
=====3.4.1 Mediation=====

Mediation resolves problems of semantic interoperation. It recognizes the autonomy and diversity of data warehouses. Therefore, in this architecture we need one mediator for each local data warehouse. A mediator is an independent module located in each local data warehouse; it supports flexible application interfaces, reusability and shareability, and is kept simple to increase maintainability.

In this concept, each local data warehouse has a local MetaCube-X and a local mediator. The mediator receives the sub-query from the server managed by the MetaCube-X protocol.

=====3.4.2 Schema Integration=====

To support interoperability in the MetaCube-X protocol, the local MetaCube-Xs must be integrated into the global MetaCube-X. The global MetaCube-X provides global views for clients. In addition, because the local MetaCube-Xs are integrated into the global MetaCube-X, we need mapping information. The following discusses issues concerning the local MetaCube-X(s), the global MetaCube-X, and the mapping information.

Local MetaCube-X

The concept of MetaCube-X is to provide a common multidimensional data model for Web warehouses in terms of XML documents. A local MetaCube-X is stored in a local Web warehouse and provides the schema of that local Web warehouse. With reference to the MetaCube design depicted in UML in figure 4, a local MetaCube-X is represented as an XML document that describes the multidimensional data model of its data warehouse, such as cube, dimension, dimension schema, hierarchy, and measures. An example of the MetaCube-X of a local Web warehouse is given in Figure 5.

[Figure 5 listing: XML document of a local MetaCube-X. The element markup was lost in extraction; only the text values "Number" (eight occurrences) and "String" (one occurrence) survive.]

Figure 5: An Example of local MetaCube-X

Global MetaCube-X

The global MetaCube-X is the integration of the local MetaCube-Xs. It provides the logic to reconcile differences and to drive Web warehousing systems conforming to the global schema. The global MetaCube-X is the metadata used for query processing. When a user posts a query, the MetaCube-X service receives the query, parses it, checks and compares it with the global MetaCube-X, and distributes it to the selected local Web warehouses. Therefore, the global MetaCube-X must be able to represent the heterogeneity of the local data warehouse schemas, including dimensions and measures. In addition, we use XML to simplify the integration of the local MetaCube-X(s) from the local Web warehouses into the global MetaCube-X. An example of a global MetaCube-X is given in Figure 6.
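The query path described above (receive a query, check it against the global MetaCube-X, distribute it to the selected local Web warehouses) and the role of mapping information in translating global names into local ones can be sketched as follows. This is a simplified illustration, not the authors' implementation: the global metadata is reduced to a dictionary of dimension and measure names per warehouse, and the function names, the warehouse identifiers `dwh1` and `dwh2`, and the local name `Region` are all assumptions. The placeholder dimension names `Dname1` to `Dname4` follow the values that survive in Figures 6 and 7.

```python
def select_warehouses(query_dims, query_measures, global_meta):
    """Return the warehouses whose advertised schema covers the query."""
    selected = []
    for warehouse, schema in global_meta.items():
        covers_dims = set(query_dims) <= schema["dimensions"]
        covers_measures = set(query_measures) <= schema["measures"]
        if covers_dims and covers_measures:
            selected.append(warehouse)
    return selected


def translate(query_dims, query_measures, mapping):
    """Rewrite global names into one warehouse's local names."""
    return {
        "dimensions": [mapping.get(d, d) for d in query_dims],
        "measures": [mapping.get(m, m) for m in query_measures],
    }


def distribute(query_dims, query_measures, global_meta, mappings):
    """Check a global query and produce one sub-query per selected warehouse."""
    known = set().union(*(s["dimensions"] for s in global_meta.values()))
    unknown = set(query_dims) - known
    if unknown:
        # Reject dimensions that no warehouse advertises in the global metadata.
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {
        w: translate(query_dims, query_measures, mappings.get(w, {}))
        for w in select_warehouses(query_dims, query_measures, global_meta)
    }


# Hypothetical global metadata and mapping information.
global_meta = {
    "dwh1": {"dimensions": {"Dname1", "Dname2", "Dname3"}, "measures": {"TotalSale"}},
    "dwh2": {"dimensions": {"Dname1", "Dname2", "Dname4"}, "measures": {"TotalSale"}},
}
mappings = {"dwh2": {"Dname1": "Region"}}  # dwh2 calls Dname1 "Region" locally

sub_queries = distribute(["Dname1", "Dname2"], ["TotalSale"], global_meta, mappings)
```

In a full system each entry of `sub_queries` would be handed to the mediator of the corresponding local warehouse; here the dictionary simply makes the selection and translation steps visible.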
[Figure 6 listing: XML document of a global MetaCube-X. The element markup was lost in extraction; the surviving text values are "Number" (six occurrences), "String" followed by Dname1, Dname2, Dname3, and "String" followed by Dname1, Dname2, Dname4.]

Figure 6: An Example of global MetaCube-X

Mapping Information

Mapping information describes the mapping between the local MetaCube-X(s) and the global MetaCube-X once they are integrated. It is responsible for supporting the translation of global queries into local queries during query processing. When a user posts a query, the mapping information is parsed by the search service of the MetaCube-X protocol and compared with the global MetaCube-X. An example of mapping information is given in Figure 7.

[Figure 7 listing: XML document of mapping information. The element markup was lost in extraction; the surviving text values are Dname1, Dname2, Dname3 in one group and Dname1, Dname2, Dname4 in another.]

Figure 7: An example of mapping information

===4. Conclusion and future works===

In this paper we have presented the concept of MetaCube-X for supporting data warehouse federation. MetaCube-X is an XML instance of the MetaCube, extending the MetaCube concepts introduced in [Ngu00] as a conceptual multidimensional data model that facilitates a precise, rigorous conceptualization for OLAP. The MetaCube-X metadata, based on an object-oriented model, is semantically rich and supports interoperability among different data warehouse systems.

We focus on metadata for data warehouse federation, especially for Web warehousing systems. Thus, we address query processing for Web warehouses by exploring the use of XML and the MetaCube-X protocol. They are designed and implemented for federated queries as well as for data exchange, retrieving the results from local Web warehousing islands and offering them to federated users. Currently, we are implementing incremental prototypes to demonstrate the feasibility of our approach to data warehouse federation.

===Acknowledgements===
This work is partly supported by the ASEAN European Union Academic Network (ASEA-Uninet), Project EZA 894/98.

===References===

[Agr95] R. Agrawal, A. Gupta, A. Sarawagi. Modeling Multidimensional Databases. IBM Research Report, IBM Almaden Research Center, September 1995.

[Alb98a] J. Albrecht, H. Guenzel, W. Lehner. An Architecture for Distributed OLAP. Conference Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, USA, July 13-16, 1998.

[Alb98b] J. Albrecht, W. Lehner. On-Line Analytical Processing in Distributed Data Warehouses. International Databases Engineering and Applications Symposium (IDEAS), Cardiff, Wales, U.K, July 8-10, 1998.

[And00] R. Anderson, M. Birbeck, M. Kay, S. Livingstone, B. Loesgen, D. Martin, S. Mohr, N. Ozu, B. Peat, J. Pinnock, P. Stark, K. William. Professional XML. Wrox Press Ltd., January 2000.

[Bau97] A. Bauer, W. Lehner. The Cube-Query-Language (CQL) for Multidimensional Statistical and Scientific Database Systems. Proceedings of the 5th International Conference on Database Systems for Advanced Applications (DASFAA), Melbourne, Australia, April 1-4, 1997.

[Bla98] M. Blaschka, C. Sapia, G. Höfling, B. Dinter. Finding your way through multidimensional data models. In 9th Intl. DEXA Workshop, Vienna, Austria, August 1998.

[Cab98] L. Cabibbo, R. Torlone. A Logical Approach to Multidimensional Databases. EDBT 1998.

[Cha97] S. Chaudhuri, U. Dayal. An Overview of Data Warehousing and OLAP Technology. SIGMOD Record Volume 26, Number 1, September 1997.

[Cod93] E.F. Codd, S.B. Codd, C.T. Salley. Providing OLAP (On-Line Analytical Processing) to User Analysts: An IT Mandate. White Paper, Arbor Software Corporation, 1993.

[Eck00] W.W. Eckerson. Data Warehousing in the 21st Century. The Data Warehousing Institute, 2000. http://www.dw-institute.com/

[Gar99] L. Garber, M. Stonebraker. On the Importance of Data Integration. IT Professional, Vol. 1, No. 3, pp. 80, 77-79, May, June 1999.

[GMo99] H. Garcia-Molina, W. Labio, J.L. Wiener, Y. Zhuge. Distributed and Parallel Computing Issues in Data Warehousing. In Proceedings of ACM Principles of Distributed Computing Conference, 1999. Invited Talk.

[Gra96] J. Gray, A. Bosworth, A. Layman, H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tabs, and Sub-Totals. Proceedings of ICDE '96, New Orleans, February 1996.

[Gys97] M. Gyssens, L.V.S. Lakshmanan. A foundation for multi-dimensional databases. Proc. VLDB'97.

[Hüm00] W. Hümmer, J. Albrecht, H. Günzel. Distributed Data Warehousing Based on CORBA. IASTED International Conference on Applied Informatics (AI'2000), Innsbruck, Austria, February 2000.

[Leh98] W. Lehner. Modeling Large Scale OLAP Scenarios. 6th International Conference on Extending Database Technology (EDBT'98), Valencia, Spain, 23-27 March 1998.

[Li96] C. Li, X.S. Wang. A Data Model for Supporting On-Line Analytical Processing. CIKM 1996.

[Man99] O. Mangisengi, A M. Tjoa, R.R. Wagner. Multidimensional Modelling Approaches for OLAP. Proceedings of the Ninth International Database Conference "Heterogeneous and Internet Databases" 1999, ISBN 962-937-046-8. Ed. J. Fong, Hong Kong, 1999.

[Mat99] R. Mattison. Web Warehousing and Knowledge Management. McGraw-Hill, 1999.

[Met97] Meta Data Coalition. Metadata Interchange Specification (MDIS) Version 1.1, August 1997.

[Met99a] Meta Data Coalition. Open Information Model. Version 1.1, August 1999. http://www.mdcinfo.com/

[Ngu00] T.B. Nguyen, A M. Tjoa, R.R. Wagner. Conceptual Multidimensional Data Model Based on MetaCube. In Proc. of First Biennial International Conference on Advances in Information Systems (ADVIS'2000), Izmir, Turkey, October 2000. Lecture Notes in Computer Science (LNCS), Springer, 2000.

[Ola97] OLAP Council. OLAP AND OLAP Server Definitions. 1997. http://www.olapcouncil.org/research/glossaryly.htm
[Ros99] A. Rosenthal, L. Seligman, R. Costello. XML, Databases, and Interoperability. The MITRE Corporation. Federal Database Colloquium, AFCEA, San Diego, 1999.

[She90] A.P. Sheth, J.A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, Vol. 22, No. 3, September 1990.

[Vas97] V. Vassalos, Y. Papakonstantinou. Describing and Using Query Capabilities of Heterogeneous Sources. Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.

[Vas98] P. Vassiliadis. Modeling Multidimensional Databases, Cubes and Cube operations. In Proc. 10th Scientific and Statistical Database Management Conference (SSDBM '98), Capri, Italy, June 1998.

[Wan97] M. Wang, B. Iyer. Efficient roll-up and drill-down analysis in relational database. In 1997 SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.

[Wid95] J. Widom. Research Problems in Data Warehousing. Proceedings of the 4th International Conference on Information and Knowledge Management (CIKM), November 1995.

[Wie92] G. Wiederhold. Mediators in the Architecture of Future Information Systems. The IEEE Computer Magazine, March 1992.