MetaCube-X: An XML Metadata Foundation for Interoperability Search among Web Warehouses

Nguyen Thanh Binh, A Min Tjoa
Institute of Software Technology, Vienna University of Technology
Favoritenstrasse 9-11/188, A-1040 Vienna, Austria
{binh,tjoa}@ifs.tuwien.ac.at

Oscar Mangisengi
Dept. of Computer Science, National University of Singapore
S16 Level 5, 3 Drive 2, Singapore 117543
oscar@comp.nus.edu.sg

Abstract

OLAP (On-Line Analytical Processing) applications place very special requirements on the underlying multidimensional data, which differ significantly from those of other application areas (e.g. the existence of highly structured dimensions). In addition, providing access and search among multiple, heterogeneous, distributed and autonomous data warehouses, especially Web warehouses, has become one of the leading issues in data warehouse research and industry. This paper proposes MetaCube-X to provide interoperability search among Web data warehouses.

1 Introduction

The concept of On-Line Analytical Processing (OLAP), first introduced by [Cod93] to enable business decision makers to work with data warehouses, supports dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data. OLAP tools are frequently used as front ends in data warehouse environments. They allow the interactive analysis of multidimensional data. Independent of the possible architectures for data storage and query processing, they all present the data to the user in a multidimensional data model, and queries are formulated using the multidimensional paradigm. The research community has proposed several formal multidimensional metadata models and corresponding query languages for different application areas [Agr95], [Bla98], [Cab98], [Cha97], [Eck00], [Gra96], [Gys97], [Leh98], [Li96], [Man99], [Ngu00], [Ola97], [Vas98], [Wan97]. However, each approach presents its own view of multidimensional analysis requirements, terminology and formalism. Consequently, no commonly accepted formal multidimensional data model has been established. Such a model is necessary to serve as a foundation for standardization and future research. This has been our main motivation to focus on a new multidimensional data model suitable for OLAP applications, since these applications place very special requirements on the underlying multidimensional data that differ significantly from those of other application areas (e.g. the existence of highly structured dimensions). In this context, the concepts of MetaCube have been introduced in [Ngu00].

On the other hand, the World Wide Web is a distributed global information resource that contains a large amount of information placed on the Web independently by different organizations. Therefore, related information may appear across different Web sites. Furthermore, Web warehousing is a novel and very active research area that combines two rapidly developing technologies, data warehousing and Web technology, as depicted in Figure 1 [Mat99], and provides a suitable approach to systematically discover and acquire strategic information from the Web. This information may be identified, cataloged, managed and then accessed by end users [Mat99] via search engines or some Web information management system.

[Figure 1 diagram: Web warehousing combines what data warehousing contributes (data management, warehousing approach) with what the Web contributes (Web technology, text and multimedia management).]
Figure 1: The hybrid of Web warehousing systems.

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.
Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2001), Interlaken, Switzerland, June 4, 2001 (D. Theodoratos, J. Hammer, M. Jeusfeld, M. Staudt, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-39/
N.T.
Binh, A M. Tjoa, O. Mangisengi 8-1

To provide the user with a powerful and friendly query mechanism for accessing information on the Web, the critical problem is to find an effective way to build Web data models. The key objective of our approach is to design and implement a Web warehousing system based on the MetaCube-X protocol shown in Figure 2, which provides access and search capability among multiple, heterogeneous, distributed and autonomous Web warehouses. MetaCube-X is an XML (eXtensible Markup Language) instance of the MetaCube concept [Ngu00] for supporting data warehouse federation. As a result, MetaCube-X provides a neutral syntax for interoperability among different Web warehousing systems. In this concept, we define a global MetaCube-X stored in a server and local MetaCube-Xs stored in local Web warehouses.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 introduces MetaCube-X, from the conceptual data model to the XML implementation. The paper concludes with Section 4, which presents our current and future work.

2 Related works

Our work is related to research on metadata for multidimensional databases, federated database systems, and mediation between multiple information systems, especially distributed data warehousing systems.

The concept of multidimensionality (or n-dimensionality) of datasets, and in particular of aggregate data, as well as the concepts of dimension (often called category attribute, descriptive variable, character, etc.) and of measure (often called summary attribute, quantitative data, variable, etc.) have already been discussed [Agr95], [Bla98], [Cab98], [Cha97], [Eck00], [Gra96], [Gys97], [Leh98], [Li96], [Man99], [Ngu00], [Ola97], [Vas98], [Wan97]. Recently, many authors have proposed multidimensional data models and query languages. Gray et al. [Gra96] proposed the data cube operator as an extension to SQL, which generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. In [Li96] the authors formalized a multidimensional data model for OLAP and developed an algebraic query language called Grouping Algebra. The relative multidimensional cube algebra is proposed in order to facilitate data derivation. Gyssens et al. [Gys97] presented a tabular database model and discussed a tabular algebra as a language for querying and restructuring tabular data. Lehner [Leh98] discussed the design problem that arises when OLAP scenarios become very large and proposed a nested multidimensional data model useful during the schema design and multidimensional data analysis phases. In this context, we proposed a multidimensional data model, namely MetaCube, in [Ngu00], the concept of which is a generalization of former multidimensional data models, i.e. relational and multidimensional OLAP models. First, the MetaCube model is able to represent and capture natural hierarchical relationships among members within a dimension as well as the relationships between dimension members and measure data values. Dimensions and data cubes with their operators are then formally introduced. Each MetaCube is associated with a set of groups, each of which contains a subset of the MetaCube domain, which is a poset of data cells. Furthermore, MetaCube operators (e.g. jumping, rollingUp and drillingDown) are defined in a very elegant manner.

[GMo99] presents distributed and parallel computing issues in data warehousing. [Alb98a], [Alb98b], [Bau97], [Hüm00], [Leh98] present the prototypical distributed OLAP system developed in the context of the CUBE-STAR project. [Hüm00] presents distributed data warehousing based on the Common Object Request Broker Architecture (CORBA).

A variety of approaches to interoperability have been proposed, aiming at different levels of integration in federated database management systems [She90]. According to [Gar99], data federations will be very important, and XML will support communicating databases and integrating data over the Internet. The concept of the mediator was introduced by [Wie92].

In this paper we propose MetaCube-X, an XML instance of the MetaCube concepts, to provide a framework for supporting data warehouse federation.

3 The Concept of MetaCube-X

3.1 MetaCube-X Protocol

Figure 2 shows the architecture of MetaCube-X, which provides interoperability search among Web data warehouses. The architecture of MetaCube-X systems consists of clients, the server protocol, i.e. the MetaCube-X repository, local MetaCube-Xs, and local data warehouses. Thus, the MetaCube-X protocol provides services and manages access to the local DWHs corresponding to the local MetaCube-Xs and to the global MetaCube-X. A local MetaCube-X is metadata that describes the multidimensional data model of one local data warehouse, and it is stored in that local data warehouse. The global MetaCube-X is global metadata that integrates the information of the local MetaCube-Xs from the local data warehouses, and it is stored in the server. Both the local MetaCube-Xs and the global MetaCube-X are represented as XML documents to support search in the local data warehouses.

[Figure 2 diagram: a client sends Web data warehouse queries to the MetaCube-X server (MetaCube-X services, MetaCube-X repository, locatorDB), which is linked through XML MetaCube-X documents to data warehouses 1 to n.]
Figure 2: MetaCube-X architecture

3.2 MetaCube Conceptual Data Model

In [Ngu00], a conceptual multidimensional data model that facilitates a precise, rigorous conceptualization for OLAP has been introduced. First, our approach is strongly related to mathematics through the use of mathematical concepts, i.e. partial order and partially ordered set (poset). This mathematical soundness provides a foundation for handling natural hierarchical relationships among data elements along dimensions with many levels of complexity in their structures. Second, the multidimensional data model organizes data in the form of MetaCubes. Instead of containing a set of data cells, each MetaCube is associated with a set of groups, each of which contains a subset of the data cell set. Furthermore, MetaCube operators (e.g. jumping, rollingUp and drillingDown) are defined in a very elegant manner. Formally, the multidimensional data model is constructed based on a set of dimensions $D = \{D_1, \ldots, D_x\}$, $x \in \mathbb{N}$, a set of measures $M = \{M_1, \ldots, M_y\}$, $y \in \mathbb{N}$, and a set of MetaCubes $C = \{C_1, \ldots, C_z\}$, $z \in \mathbb{N}$, each of which is associated with a set of groups $Groups(C_i) = \{G_1, \ldots, G_p\}$, $p, i \in \mathbb{N}$, $1 \le i \le z$.

3.2.1 The Concepts of Dimension

First, hierarchical relationships among dimension members have been introduced by means of one hierarchical domain per dimension [Ngu00]. A hierarchical domain is a poset of dimension elements, organized in a hierarchy of levels corresponding to different levels of granularity. It also allows us to consider a dimension schema as a poset of levels. In this concept, a dimension hierarchy is a path along the dimension schema, beginning at the root level and ending at a leaf level. Moreover, the definitions of two dimension operators, namely $O_{ancestor}$ and $O_{descendant}$, provide the ability to navigate along a dimension structure. As a consequence, dimensions of any structural complexity can be captured with our data model.

3.2.2 The Concepts of Measures

The concepts of measures, which are the objects of analysis in the context of the multidimensional data model, have also been introduced in [Ngu00]. First, a measure schema is a tuple $MSchema(M) = \langle Fname, O \rangle$. If $O$ is "NONE", the measure stands for a fact; otherwise it stands for an aggregation.

3.2.3 The Concepts of MetaCubes

In [Ngu00], a MetaCube schema is defined by a triple of a MetaCube name, an x-tuple of dimension schemas, and a y-tuple of measure schemas. Each data cell is an intersection among a set of dimension members and measure data values, each of which belongs to one dimension or one measure. Furthermore, data cells within a MetaCube domain are grouped into a set of associated granular groups, each of which expresses a mapping from the domains of an x-tuple of dimension levels (independent variables) to the y numerical domains of a y-tuple of numeric measures (dependent variables). A MetaCube is thus constructed based on a set of dimensions, consists of a MetaCube schema, and is associated with a set of groups.

[Figure 3 diagram: a cube with axes Store (e.g. USA, Mexico), Product (Alcoholic, Dairy, Beverage, Baked Food, Meat, Seafood) and Time (1 to 6), with sample cell values.]
Figure 3: Sales MetaCube is constructed from three dimensions: Store, Product and Time and one measure: TotalSale.

3.3 Modeling MetaCube-X with UML

MetaCube-X is a common model used for expressing all schema objects available in the different local data warehouses. The MetaCube-X(s) in a data warehouse federation make it possible to handle the design, integration, and maintenance of the heterogeneous schemas of the local data warehouses. MetaCube-X serves to describe each local schema, including dimensions, dimension hierarchies, dimension levels, cubes, and measures, and it should be possible to describe any schema represented by any multidimensional data model, such as the star schema model, the snowflake model, and the like.

UML is used to model dimensions, measures and data cubes in the context of the MetaCube data model (Figure 4). We introduce a class, namely NestedElement, that provides a framework for defining other classes, i.e. DimensionElement, Level, GSchema, Groupby. In addition, the classes DimensionSchema, Hierarchy, Dimension, MSchema, MValue, Groupby and Cube are defined in order to represent dimension schema, dimension hierarchy, dimension, measure schema, measure values, groupby, and cube schema. The model will be implemented as an XML schema based on the Meta Data Interchange Specification (MDIS) [Met97] and the Open Information Model (OIM) [Met99a] of the Meta Data Coalition (MDC).

[Figure 4 diagram: UML class diagram, of which only labels survive. Classes: NestedElement, MDElement, Cell, Groupby, GSchema, MeasureValue, IntegerValue, floatValue, DimensionElement, Level, Hierarchy, DimensionSchema, Dimension, MeasureSchema, Cube. Attributes: Description, Gname, Lname, Hname, Fname, AggFunction, Dname, Cname, BasicGroupby. Associations: Has Child, Has Father, belongs to, refers to.]
Figure 4: The MetaCube-X model with UML

3.4 Implementation with XML

MetaCube-X is an XML instance of the MetaCube concept for supporting interoperability of different multidimensional data models. It covers heterogeneity problems, such as syntactical, data model, semantic, schematic, and structural heterogeneities. XML is used to represent the MetaCube concept because it can model data to any level of complexity, check data for structural correctness, define new tags as needed for a new dimension, and show hierarchical information corresponding to dimension hierarchies. These capabilities are all required for data warehouse schemas and OLAP applications.
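As an illustration of how XML can carry the hierarchical information of dimension hierarchies, the following minimal Python sketch builds a small schema document with the standard library. It is not the authors' XML layout: the tag and attribute names (Cube, Cname, Dimension, Dname, Hierarchy, Hname, Level, Lname, MeasureSchema, Fname, AggFunction) are borrowed from the labels in Figure 4, while the nesting, the helper functions `build_local_metacube` and `hierarchy_path`, the level names and the "SUM" aggregate are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET


def build_local_metacube(cube_name, dimensions, measures):
    """Build an XML tree describing one cube schema.

    dimensions: {dimension name: {hierarchy name: [levels, root level first]}}
    measures:   list of (Fname, AggFunction) pairs; "NONE" would mark a fact.
    All tag and attribute names are hypothetical, borrowed from Figure 4.
    """
    cube = ET.Element("Cube", {"Cname": cube_name})
    for dname, hierarchies in dimensions.items():
        dim = ET.SubElement(cube, "Dimension", {"Dname": dname})
        for hname, levels in hierarchies.items():
            parent = ET.SubElement(dim, "Hierarchy", {"Hname": hname})
            # Nest each level inside the previous one so that the XML tree
            # mirrors the path from the root level down to the leaf level.
            for lname in levels:
                parent = ET.SubElement(parent, "Level", {"Lname": lname})
    for fname, agg in measures:
        ET.SubElement(cube, "MeasureSchema", {"Fname": fname, "AggFunction": agg})
    return cube


def hierarchy_path(cube, dname, hname):
    """Return the level names of one hierarchy, root level first."""
    hierarchy = cube.find(
        f"./Dimension[@Dname='{dname}']/Hierarchy[@Hname='{hname}']"
    )
    return [level.get("Lname") for level in hierarchy.iter("Level")]


# Dimensions and measure follow the Sales MetaCube of Figure 3;
# the hierarchy and level names are illustrative only.
sales = build_local_metacube(
    "Sales",
    {
        "Store": {"Geography": ["Country", "State", "City"]},
        "Product": {"Category": ["Family", "Item"]},
        "Time": {"Calendar": ["Year", "Month", "Day"]},
    },
    [("TotalSale", "SUM")],
)
print(ET.tostring(sales, encoding="unicode"))
```

Nesting each Level element inside its parent level is one possible design choice: it lets a generic XML parser recover a dimension hierarchy as a root-to-leaf path without any schema-specific logic, and a new dimension only requires a new Dimension element.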
In addition, XML is easy to extend, offers promise for applying data management technology to documents, provides a neutral syntax for interoperability among different systems, and is very useful for exchanging data.

3.4.1 Mediation

Mediation resolves problems of semantic interoperation. It recognizes the autonomy and diversity of data warehouses. Therefore, in this architecture we need one mediator for each local data warehouse. A mediator is an independent module located in each local data warehouse; it supports flexible application interfaces, reusability and shareability, and is simple, which increases maintainability.

In this concept, each local data warehouse has a local MetaCube-X and a local mediator. The mediator receives the sub-query from the server managed by the MetaCube-X protocol.

3.4.2 Schema Integration

To support interoperability in the MetaCube-X protocol, the local MetaCube-Xs must be integrated into the global MetaCube-X. The global MetaCube-X provides global views for clients. In addition, because the local MetaCube-Xs are integrated into the global MetaCube-X, we need mapping information. The following section discusses issues concerning the local MetaCube-X(s), the global MetaCube-X, and the mapping information.

Local MetaCube-X

The concept of MetaCube-X is to provide a common multidimensional data model for Web warehouses in terms of XML documents. A local MetaCube-X is stored in a local Web warehouse and provides the schema of that local Web warehouse. With reference to the MetaCube design depicted in UML in Figure 4, a local MetaCube-X is represented as an XML document that supports the multidimensional data model, i.e. cube, dimension, dimension schema, hierarchy, and measures, for each data warehouse. An example of the MetaCube-X of a local Web warehouse is given in Figure 5.

[Figure 5 listing: XML markup lost in extraction; only the type values Number and String survive.]
Figure 5: An Example of local MetaCube-X

Global MetaCube-X

The global MetaCube-X is the integration of the local MetaCube-Xs. It provides the logic to reconcile differences and to drive Web warehousing systems conforming to the global schema. The global MetaCube-X is metadata for query processing. When a user posts a query, the MetaCube-X service receives the query, parses and checks it, compares it with the global MetaCube-X, and distributes it to selected local Web warehouses. Therefore, the global MetaCube-X must be able to represent the heterogeneity of the local data warehouse schemas, including dimensions and measures. In addition, to simplify the integration of the local MetaCube-X(s) from local Web warehouses into the global MetaCube-X, we use XML. An example of a global MetaCube-X is given in Figure 6.

Mapping Information

Mapping information describes the mapping between the local MetaCube-X(s) and the global MetaCube-X when they are integrated. It supports the translation of global queries into local queries during query processing. When a user posts a query, the mapping information is parsed by the search service of the MetaCube-X protocol and compared with the global MetaCube-X. An example of mapping information is given in Figure 7.

[Figure 7 listing: XML markup lost in extraction; the surviving values are two dimension name lists, Dname1, Dname2, Dname3 and Dname1, Dname2, Dname4.]
Figure 7: An example of mapping information

4 Conclusion and future works

In this paper we have presented the concept of MetaCube-X for supporting data warehouse federation. MetaCube-X is an XML instance of the MetaCube, extending the MetaCube concepts introduced in [Ngu00] as a conceptual multidimensional data model that facilitates a precise, rigorous conceptualization for OLAP.
The MetaCube-X metadata, based on an object-oriented model, is semantically rich for interoperability among different data warehouse systems.

We focus on metadata for data warehouse federation, especially Web warehousing systems. Thus, we address query processing for Web warehouses by exploring the use of XML and the MetaCube-X protocol. They are designed and implemented for federated queries as well as data exchange, retrieving the results from local Web warehousing islands and offering them to federated users. Currently, we are implementing incremental prototypes demonstrating the feasibility of our approach to data warehouse federation.

[Figure 6 listing: XML markup lost in extraction; the surviving values are the types Number and String and two dimension name lists, Dname1, Dname2, Dname3 and Dname1, Dname2, Dname4.]
Figure 6: An Example of global MetaCube-X

Acknowledgements

This work is partly supported by the ASEAN European Union Academic Network (ASEA-Uninet), Project EZA 894/98.

References

[Agr95] R. Agrawal, A. Gupta, S. Sarawagi. Modeling Multidimensional Databases. IBM Research Report, IBM Almaden Research Center, September 1995.

[Alb98a] J. Albrecht, H. Guenzel, W. Lehner. An Architecture for Distributed OLAP. Conference Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, USA, July 13-16, 1998.

[Alb98b] J. Albrecht, W. Lehner. On-Line Analytical Processing in Distributed Data Warehouses. International Databases Engineering and Applications Symposium (IDEAS), Cardiff, Wales, U.K, July 8-10, 1998.

[And00] R. Anderson, M. Birbeck, M. Kay, S. Livingstone, B. Loesgen, D. Martin, S. Mohr, N. Ozu, B. Peat, J. Pinnock, P. Stark, K. William. Professional XML. Wrox Press Ltd., January 2000.

[Bau97] A. Bauer, W. Lehner. The Cube-Query-Language (CQL) for Multidimensional Statistical and Scientific Database Systems. Proceedings of the 5th International Conference on Database Systems for Advanced Applications (DASFAA), Melbourne, Australia, April 1-4, 1997.

[Bla98] M. Blaschka, C. Sapia, G. Höfling, B. Dinter. Finding your way through multidimensional data models. In 9th Intl. DEXA Workshop, Vienna, Austria, August 1998.

[Cab98] L. Cabibbo, R. Torlone. A Logical Approach to Multidimensional Databases. EDBT 1998.

[Cha97] S. Chaudhuri, U. Dayal. An Overview of Data Warehousing and OLAP Technology. SIGMOD Record Volume 26, Number 1, September 1997.

[Cod93] E.F. Codd, S.B. Codd, C.T. Salley. Providing OLAP (On-Line Analytical Processing) to User Analysts: An IT Mandate. White Paper, Arbor Software Corporation, 1993.

[Eck00] W.W. Eckerson. Data Warehousing in the 21st Century. The Data Warehousing Institute, 2000. http://www.dw-institute.com/

[Gar99] L. Garber, M. Stonebraker. On the Importance of Data Integration. IT Professional, Vol. 1, No. 3, pp. 80, 77-79, May, June 1999.

[GMo99] H. Garcia-Molina, W. Labio, J.L. Wiener, Y. Zhuge. Distributed and Parallel Computing Issues in Data Warehousing. In Proceedings of ACM Principles of Distributed Computing Conference, 1999. Invited Talk.

[Gra96] J. Gray, A. Bosworth, A. Layman, H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tabs, and Sub-Totals. Proceedings of ICDE '96, New Orleans, February 1996.

[Gys97] M. Gyssens, L.V.S. Lakshmanan. A foundation for multi-dimensional databases. Proc. VLDB'97.

[Hüm00] W. Hümmer, J. Albrecht, H. Günzel. Distributed Data Warehousing Based on CORBA. IASTED International Conference on Applied Informatics (AI'2000), Innsbruck, Austria, February 2000.

[Leh98] W. Lehner. Modeling Large Scale OLAP Scenarios. 6th International Conference on Extending Database Technology (EDBT'98), Valencia, Spain, 23-27, March 1998.

[Li96] C. Li, X.S. Wang. A Data Model for Supporting On-Line Analytical Processing. CIKM 1996.

[Man99] O. Mangisengi, A M. Tjoa, R.R. Wagner. Multidimensional Modelling Approaches for OLAP. Proceedings of the Ninth International Database Conference "Heterogeneous and Internet Databases" 1999, ISBN 962-937-046-8. Ed. J. Fong, Hong Kong, 1999.

[Mat99] R. Mattison. Web Warehousing and Knowledge Management. McGraw-Hill, 1999.

[Met97] Meta Data Coalition. Metadata Interchange Specification (MDIS) Version 1.1, August 1997.

[Met99a] Meta Data Coalition. Open Information Model. Version 1.1, August 1999. http://www.mdcinfo.com/.

[Ngu00] T.B. Nguyen, A M. Tjoa, R.R. Wagner. Conceptual Multidimensional Data Model Based on MetaCube. In Proc. of First Biennial International Conference on Advances in Information Systems (ADVIS'2000), Izmir, TURKEY, October 2000. Lecture Notes in Computer Science (LNCS), Springer, 2000.

[Ola97] OLAP Council. OLAP AND OLAP Server Definitions. 1997. http://www.olapcouncil.org/research/glossaryly.htm

[Ros99] A. Rosenthal, L. Seligman, R. Costello. XML, Databases, and Interoperability. The MITRE Corporation. Federal Database Colloquium, AFCEA, San Diego, 1999.

[She90] A.P. Sheth, J.A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, Vol. 22, No. 3, September 1990.

[Vas97] V. Vassalos, Y. Papakonstantinou. Describing and Using Query Capabilities of Heterogeneous Sources. Proceedings of the 23rd VLDB Conference, Athens, Greece, 1997.

[Vas98] P. Vassiliadis. Modeling Multidimensional Databases, Cubes and Cube operations. In Proc. 10th Scientific and Statistical Database Management Conference (SSDBM '98), Capri, Italy, June 1998.

[Wan97] M. Wang, B. Iyer. Efficient roll-up and drill-down analysis in relational database. In 1997 SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997.

[Wid95] J. Widom. Research Problems in Data Warehousing. Proceedings of the 4th International Conference on Information and Knowledge Management (CIKM), November 1995.

[Wie92] G. Wiederhold. Mediators in the Architecture of Future Information Systems. The IEEE Computer Magazine, March 1992.