Metadata for Object-Relational Data Warehouse Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa Institute of Software Technology (E188) Vienna University of Technology Favoritenstrasse 9-11/188, A-1040 Vienna, Austria ({thanh, oscar, tjoa}@ifs.tuwien.ac.at) Abstract [Ston95], [Ston97], [OHUS96], [KrBN99] is preferred to get over those ones. For developing data warehouse (DW) and On- In these systems, metadata plays an important role and Line Analytical Processing (OLAP) systems, the provides the foundation for all actions in all stages. It can dominant relational database reaches its be considered as glue sticking together all individual parts limitations. On the way of the development, of these systems. object-relational (O-R) database is preferred to In this paper, we propose our O-R data warehouse get over those ones. This paper introduces architecture with new metadata layer and describe the metadata for data warehouse system on O-R design and implementation new kind of metadata to bridge database and specifies new kind of metadata for gap between object-oriented environment and relational mapping from object-oriented environment to database. relational environment. We also present the The paper is constructed as follow. Section 2 discusses storage structure for repository this new kind of the related works, which cover an overview on data metadata in O-R database. warehouse modeling and metadata for data warehouse. The next section shortly reviews O-R databases and its query. Keywords : Metadata, OLAP, Data warehouse, Object- An O-R data warehouse is presented in section 4. Section 5 relational database. discusses the metadata for the O-R DW. The last section comes with the conclusion. 1 Introduction 2 Related Works The data stored in DW and OLAP systems is collected, integrated and centralized from various operational data There has been a substantial amount of work on the store systems. For analysis purpose of the enterprise, the general topic of data warehouse and OLAP. For the sake of data are usually stored in multidimensional structures relevance and brevity, we discuss generally here only the [TrPa98], [ReBS97], [Kimb98]. These structures are works that propose metadata for the data warehouse and suitable for analysis purposes since they represent in an data warehouse modeling. intuitive way the factual data according to the Orr in [Orr96] introduces data warehouse architecture characteristics that are considered relevant to the analysis. with 8 layers including a metadata layer. These layers For developing the DW and OLAP systems, the represent the overall structure of data, communication, dominant relational database reaches its limitations processing and presentation that exists for end user [GoLK99]. On the way of the development, O-R database computing within the enterprise. Gupta proposed opinion that “the data warehouse model needs to be extensible and structured such that the data from different applications can The copyright of this paper belongs to the paper’s authors. Permission to copy be added as a business” [Gupt97]. Different approaches to without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage. develop a data warehouse were suggested in [Fire97b]. Proceedings of the International Workshop on Design and These approaches show us various data warehouse models. Management of Data Warehouses (DMDW'2000) Furthermore, Wu and Buchmann proposed logical and Stockholm, Sweden, June 5-6, 2000 physical data warehouse architectures in [WuBu97]. The (M. Jeusfeld, H. Shu, M. Staudt, G. Vossen, eds.) logical architecture is independent from application and front-end tools. The physical architectures are a mapping http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-28/ Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-1 of the logical architecture to multidimensional database interface components ensure a transparent access from management system (MDBMS) and relational DBMS application outside the system to data storage in its (RDBMS). databases. Kimball et al. proposed data warehouses with a “bus The Object-Relational engine is object-based architecture” based on “conformed dimension” and environment and bridges the object environment and “standard fact” definitions. This is a practical, flexible relational database. It not only manages the native SQL architecture for data warehouse systems. Furthermore, they data types (such as integer, number, date, char) but also proposed a centralized metadata using for the both front object data types, which are user-defined or system- room and back room [KRRT98]. predefined object types. Like any classes in an object- Architecture for distributed OLAP is also investigated oriented programming language, these object types include in ongoing CubeStar project. In this project, also dynamic ‘attributes’ holding the data and ‘methods’ manipulating metadata is distributed in the system [AlGL98]. their behaviors. Consequently, the object-relational query Different extended relational concepts to model language trends to support user-defined functions and metadata for data warehousing are introduced in operators. Up to now, the SQL3 [Kulk94], [FDCM+99] is [MaTW99]. The differences of the models show a huge preferred to become a standard for object-relational query advantage of the extended relational model. language but it is still not powerful enough to play this role. 3 Object-Relational Database For example, given the object-relational schema and a typical object-relational query [Ston97]: Nowadays, leading DBMS vendors have committed to O-R DBMS, e.g., Oracle with Oracle8i or Informix issuing Create EMP-OR (name=C12, age=int, salary=int, Informix-Universal server. Nearly all of them support the dept=C12, location=point, picture=image); Java programming language, which provides an object environment to users of their system. Select name Form EMP-OR Where beard (picture) > 0.7 and Age > 60 and Location in circle (“10,10”, 5); Comparing with traditional relation, the two new additional fields that hold data in two new data types are “geographic point” and “image”. In the query, Object-Relational Interface “beard(picture)” and “in” are user-defined operators. 4 O-R Data Warehouse In this section we propose O-R DW architecture, given Object-Relational Engine in figure 2, based on logical architecture proposed in [WuBu97]. The differences of these architectures are “the object-orientation” approach and the new metadata layer. With the object-oriented approach, most layers of this architecture -but the “Data Store” layer- consist of many Relational Database objects of various object types, which perform underlying functions of each component. In this architecture, the data flow is similar to other data Object-Relational Database warehouse architectures [KRRT98], [Fire97a], [Orr96], [WuBu97] where data is collected from diverse operational database systems, summarized, aggregated and integrated Figure 1: Object-Relational Database in a data warehouse, and used as read-only data to supports complex analysis. In general, we can regard O-RDBMS architecture as This architecture consists of the components described shown in figure 1. Like any other systems, O-R interfaces as follows: obtain requirement data and deliver the corresponding 1. Application interface layer: object data from the O-RDBMS to the applications. These Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-2 In the application interface layer, the objects of this updating, maintaining processes of the data component hide complex data processes from the data warehouse. That means, new user services can be warehouse users. The objects of this component are added in this layer to support new user requirements if classified into various groups serving different needed. services. Based on their functionalities, each service 2. Data Acquisition: responds to corresponding requests of third party applications, data analyzers, or other users of the data The Data Acquisition component can be considered warehouse system. as a tool that constructs the data engine of the data For the usage of the data warehouse, the main warehouse. The data acquisition objects will extract, functions of the object types of this layer are to transform and transfer data from different legacy receive users’ queries, preprocess these queries and operational data stores (ODS) to the data warehouse then send final request to the Data Warehouse O-R database. Management component. Afterward, they obtain the The functions of this component are divided into queries results from the deeper layer. In this suitable sub-function levels that are performed by architecture, a query is not directly executed at this pattern object types, e.g., this component has various layer. classes, such as: ExtractingService, Transforming- For administrating the operations of the data Service, LoadingService, etc. warehouse, the objects of this layer will provide functions to manage user services, control the Applications DWBrowser QueryData service service QueryData StandardReporting service service Service Management (1) executeQuery(orQuery):QueryResult * Application Interface Layer Hierarchy Dimension Metadata Extracting DataAcquisition PurgingService Service Process QueryAgains Data tORDBMS Sources (1) Extract(fromDatasource):Data * PurgingService Cleansing Extracting Service Service Data Acquisition Data Warehouse Management O-R Database Figure 2: The O-RDW physical Architecture Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-3 3. Data Warehouse Management: warehouse system. In section 5.1, we summarize some metadata classification in the data warehouse system, As a component of the data management layer, this proposing new kinds of metadata that exist only in O-R component directly accesses data of the data environment. A short description in section 5.2 discusses warehouse from the O-R database. It provides the multidimensional star schema in relational database. services, which bridge the application interface layer The star schema is used as a frame describing new kind of and the O-R database. metadata in the next two sections, 5.3 and 5.4, which give In this component, different methods can be applied the realization way to design and implement the new kind to access data stored in the O-R database. of metadata, store their attributes in O-R data warehouse. Furthermore, the database access methods can be updated or added to improve the performance of the 5.1 Metadata Classifications data warehouse. The division of the data management layer into two There are many kinds of metadata in a data warehouse individual components allows us to clearly system [KRRT98], [Kimb98]. Instead of listing them, we distinguish between read-only data processing in data prefer to generally summarize existing metadata warehouse and data input processing. The functions classifications in various points of views. of this comp onent are mainly to read available data, In [CoBA99], metadata is classified based on the and to create new materialized views based on this datawarehouse architecture layers as follow: data. • Metadata associated with data loading and transformation. It describes the source data and any 4. Metadata: changes that were made to the data. With regard to metadata in an object-oriented way, • Metadata associated with data management. It defines we define the behaviors for metadata objects the data store in the data warehouse. Every object in depended on its roles. For instance, metadata can the database needs to be described including the data itself count its accessed frequency, make statistics of in each table, index, and view, and any associated query usages, and so on. That means that many constraints. This information is held in the DBMS questions about the warehouse operations can be system catalog; however, there are additional easily answered by directly querying metadata, e.g., requirements for the purposes of the warehouse. how many reports were created in a day? How often • Metadata used by the query manager to generate an is one kind of data used? appropriate query. The query manager generates This metadata layer will be discussed in more detail additional metadata about the queries that are run, later, in section 5, “O-R Data Warehouse Metadata”. which can be used to generate a history on all the queries and a query profile for each user, group of 5. Data store: users, or the data warehouse. The data stored in O-R DW differs primarily from DW in relational environment and object-oriented data The other classification divides metadata into technical warehouse. Depending on the requirements and data metadata, business metadata and information navigator types, O-R DW designers can decide to model it as a metadata [Que97]: “cube”, like MOLAP (Multidimensional OLAP), or as • Technical metadata primarily supports technical staff object hierarchy, like O3LAP (Object-Oriented that must implement and deploy the data warehouse. OLAP). For instance, in O-R DW, simple data can be The information contained within the technical modeled in multidimensional structures looking like directory is compatible with this kind of audience and what have done in relational database systems contains the term and definition of metadata, exactly [KRRT98], [WuBu97], [Fire97b]. Otherwise, complex as they appear in operational databases. data, user-define data can be modeled in object • The business metadata primarily supports business end hierarchical structures as suggested for OODBMS users who do not have a technical background, and [BeMa93]. Furthermore, the objects of any layers, cannot use the technical metadata to determine what particularly metadata objects, can be modeled in the O- information is stored inside the data warehouse. R database. • The information navigator metadata is a facility that allows users to browse through both the business 5 O-R Data Warehouse Metadata metadata and the data inside the data warehouse. “Metadata is data about data”, this definition is too Moreover, the metadata can be considered as two general to give someone the concept of metadata in a data classes, namely static and dynamic. Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-4 • Static metadata: This kind of metadata is used to approach to realize this kind of metadata for O-R DW. document or browse in this system. E.g., metadata of Based on the facility with supporting of O-R database a dimension. The content of this metadata is fixed in vendors to Java programming language, we also code all the data warehouse. our examples in this language. • Dynamic metadata: vice versa to static metadata, Generally, the Metadata class is the base of any other dynamic metadata is metadata that can be generated metadata subclasses. It includes essential attributes and and maintained in run time. For instance, metadata of methods of the metadata subclass. Given the Metadata a new frequent access query. class in Java language as follow: Similarly to any data warehouse in relational, public class Metadata { multidimensional or object-oriented databases, the O-R // Base attribute and method of metadata data warehouse also has these kinds of metadata. Referring String Description; to the O-R database section, the Object-Relational Engine int AccessTime=0; is object-based environment; in the meanwhile, the data is … stored in relational database. Therefore, a new kind of private void increaseAccessTime () {…}; metadata that takes care of the mapping between object public int getAccessTime () {…}; environment and relational database must be held in this public String getDescription (){…}; system. …. } 5.2 A Star Schema in Relational Database In a metadata object, meta-information of this metadata object can be created and maintained in the object itself, ProductDimension e.g., the AccessTime attribute in metadata class. TimeDimension Product_ID GroceryStoreFact Starting from the atom item of a relation, a column, we Produce_Family Time_ID define Column class. An object of this class will be on Product_ID Day Product_Category behalf of an attribute in a relation. More details of this Store_ID Month Product_Name Time_ID Quarter class can be found in Column class document in JBuilder Quantity_Sold Week software. Dollar_Revenue Year StoreDimension Customer_Count Store_ID public class Column extends Metadata { Store_Name // Describing a column of a relation Store_City String columnName; Store_Country String dataType; …. Figure 3: An example of the star schema } The name “dimensional modeling” is considered as a In a relation table, there is no distinction from the order way to make database simple and understandable, of an attribute. However, in data warehouse and OLAP particularly for business information analysts. This systems, the presentation of data in hierarchy is needed for modeling includes fact and dimensions, which usually analytical processing. Therefore we need a mechanism to describe in a star schema. In relational database, every describe the data structure. In our approach, the metadata dimension or fact is stored in table. For example, to Hierarchy class realizes this function. It holds a link list of describe a star schema in figure 3 we have 4 tables: attributes, see figure 4, in a predefined order, which quite ProductDimension, StoreDimension, TimeDimension and depends on the point of view on the structures of a GroceryStoreFact with their corresponding attribute dimension. columns. Furthermore, the schema metadata that represents the dimension structures must be stored public class Hierarchy extends Metadata { somewhere in a table. // Describing a hierarchy of a dimension 5.3 Metadata for Star Schema in O-R DW // link list of Column object Vector ListOfLevels; Beside many kinds of metadata, in O-R environment, public void insertColumnAt(int at, Column col) {…}; we propose a new kind of metadata that maps and bridge …. gap between object environment and relational } environment. In the limit of this paper, we suggest an Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-5 Now, in turn of FactTable class, it is defined to hold two lists of attributes. They are a dimension list being as a Produce_Family Produce_Category Produce_Name column column column list of Dimension objects and a fact list being as a list of Column objects. public class FactTable extends Metadata { Level 1 Level 2 Level 3 String factTableName; Vector listOfDimension; Vector listOfFact; Figure 4:A hierarchy of ProductDimension object … } TimeDimension Based on the definition of these classes, a star schema of a fact table can be formed in object schema. Given in figure 6, we have the object schema of the Year GroceryStoreFact fact table. The highest level is GroceryStoreFact object, which associates to three dimensions, TimeDimension, StoreDimension and Quarter ProductDimension. Each dimension has its own Hierarchy Week object(s). Moreover, dynamic metadata for O-R data warehouse Month can be also created and managed, for instance, to manage some frequent accessed queries. A metadata object Day mapping the query-to-query result is defined as follow: public class QueryResult extends Metadata { Figure 5: Hierarchies of TimeDimension String queryString; String tableName; With a multi-hierarchy dimension, e.g. multi-hierarchy … TimeDimension (figure 5), it requires a dimension object public boolean matchQuery( String qString) {…}; of Dimension class holding more than one Hierarchy … objects. In Java language, we can realize this requirement } by using a link list of objects. Let define the Dimension class as follow: 5.4 Metadata Storage in Relational Database public class Dimension extends Metadata { For storage, status of metadata objects are also stored // Describing a dimension and managed in the relational database. Although, some String dimensionName; database vendors support to work with O-O programming String dimensionTableName; languages, e.g., Java, storing codes of object methods in // link list of attributes of the dimension relational database usually require a complex process to Vector dimensionAttributeList; load or restore these codes. In our approach, only attributes // link list of Hierarchy object of these objects are stored. The following tables (from Vector listOfHierarchies ; table 1 to table 9) describe the storage repository. … States of Metadata objects are stored in the Metadata public Hierarchy getHierarchyAt(int at) {…}; table. At defining, all objects of subclasses of Metadata public Column getAttribute(String attName) {…}; class are Metadata objects, i.e., beside their additional … attributes; they also include all attributes as a Metadata } object. The values of Metadata object attributes are stored in table 1. In a table stored attributes of a sub-class objects, The one-to-one mapping from attributes of the there is a column, named M_id, used to store id of the dimension relation to the dimension attribute list is held in corresponding Metadata super-objects of these objects. dimensionAttributeList attribute, and the listOfHierarchies attribute is used to store list of Hierarchy objects of the dimension. Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-6 GroceryStoreFact Object TimeDimension Object StoreDimension ProductDimension Object Object Time Hierarchy1 Time Store Hierarchy Produce Hierarchy Object Hierarchy2 Object Object Object Year Level 1 Level 1 Level 1 Store_Country Level 1 Product_Famely Quarter Level 2 Level 2 Level 2 Store_City Level 2 Product_Category Month Level 3 Level 3 Level 3 Level 3 Store_Name Product_Name Week Level 4 Day Figure 6: Object schema of the fact table 8 P_Family 0 Table 1: Storage attributes of Metadata objects … M_id Description AccessTime Table 2: Storage attributes of Column objects 1 Year 0 2 Quarter 0 C_id Name Datatype … M_id 3 Month 0 1 Year Char … 1 4 Day 0 2 Quarter Number … 2 5 Hierarchy_level1 0 3 Month Char … 3 6 Hierarchy_level2 0 4 Day Number … 4 7 P_Name 0 5 ProductName Char … 7 Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-7 6 ProductFamily Char … 8 Table 7: FactMetadata table 7 StoreName Char … 17 8 StoreCity Char … 18 F_id M_id … 1 12 2 13 Table 3 and 4, together hold attributes of all hierarchy … objects. The Hierarchy table manages to store a link list of attributes ordered in hierarchy objects. The Table 8: Factlist table Hierarchy_Metadata table presents the relation of Hierarchy object with its Metadata super class object. F_id C_id 1 20 Table 3: Storage of attributes of Hierarchy object 1 10 1 22 H_id C_id H_id_next … 1 1 2 2 2 3 Table 9: Fact_Dimensionlist table 3 3 4 4 4 Null F_id D_id 5 6 6 1 1 … 1 2 1 3 Table 4: Hierarchy_Metadata table … H_id M_id 6 Conclusions 1 5 5 14 In this paper, we propose to realize the metadata that 9 19 shows a mapping between object environment and … relational environment in metadata layer of an O-R data warehouse. Various metadata classes are defined and The two next tables, 5 and 6, are used to manage the discussed their roles in the O-R data warehouse. The attributes of all dimensions metadata objects. metadata layer and the object-oriented approach together allow us to obtain many powerful characteristics for Table 5: Dimension table building an O-R data warehouse. Comparing to metadata of relational or multidimensional data warehouse systems, this metadata D_id Name … M_id layer plays an active role in maintaining the data in data 1 Time … 9 warehouse. With this mapping metadata, an O-R data 2 Store … 10 warehouse can be really designed and implemented 3 Product … 15 comparing to the object-oriented data warehouse … [BuSH98]. Table 6: Dimension_Hierarchy table References: D_id H_id 1 1 [AlGL98] J. Albrecht, H. Guenyel, W. Lehner, An 1 9 Architecture for Distributed OLAP, 2 7 International Conference on Parallel 3 5 and Distributed Processing Techniques … and Applications, 1998 (PDPTA'98) [BeMa93] E. Bertino, L. Martino, Object-Oriented The last three tables, 7, 8 and 9, store the attributes of Database Systems: Concepts and the FactTable metadata objects. Architectures, Addison-Wesley Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-8 Publishing Com., 1993. [MaTW99] O. Mangisengi, A M. Tjoa, R. R. [BuSH98] J. W. Buzydlowski, Il-Yeol Song, Wagner, Metadata for Data Warehouses Lewis Hassell, A Framework for Using Extended Relational Models Object-Oriented On-Line Analytic Proc. of third IEEE Computer Society Processing, DOLAP 98 workshop, Metadata Conference, April 1999. 1998. [OHUS96] M. A. Olson, W. M. Hong, M. Ubell, [CoBA99] C Connolly, C. Begg, and A. Strachan, M. Stonebraker, Query Processing in a Database Systems, A practical Parallel Object-Relational Database Approach to Design, Implementation, System, Data Engineering Bulletin and Management, 2nd edition, Addision- 19(4): 3-10 (1996) Wesley, 1999. [Orr96] Ken Orr, Data Warehousing [FDCM+99] Y. Fuh, S. Deßloch, W. Chen, N. M. Technology, The Ken Orr Institute, A Mattos, B. T. Tran, et al: white paper, 1996. Implementation of SQL3 Structured [Que97] Que, The Official Client/Server Types with Inheritance and Value Computing Guide to Data Substitutability. VLDB 1999. Warehousing, Que Books, 1997. [Fire97a] J. M. Firestone, Object-Oriented Data [ReBS97] Red Brick Systems Inc., “Star Schema Warehousing, Executive Information Processing for Complex Queries”, Systems, Inc., white paper No. five, white paper 1997. August 1997. [Ston95] M. Stonebraker, Object-Relational [Fire97b] J. M. Firestone, Data warehouses and DBMS- The Next Great Ware, Morgan Data Marts: A Dynamic View, Kaufman Publishers, San Francisco, Executive Information Systems, Inc., CA, 1995. White paper No. three, March 1997. [Ston97] M. Stonebraker, Architectural Options [GoLK99] V. Gopalkrishnan, Q. Li, K. Karlapalen, for Object-Relational DBMSs, Informix Star/Snow-Flake Schema Driven White Paper, 1997. Object-Relational Data Warehouse [TrPa98] J. Trujillo, M. Palomar, An Object Design and Query Processing Oriented Approach to Multi- Strategies, Data Warehousing and dimensional Database Conceptual Knowledge Discovering First inter. Modeling (OOMD), DOLAP 98 Conf., DaWak’99. workshop, 1998. [Grim98] S. Grimes, Modeling Object/ Relational [WuBu97] M. Wu, A. P. Buchmann, Research Databases, DBMS, 1998. Issues in Data Warehousing, BTW [Gupt97] R. Gupta, An Introduction to Data 1997: 61-82. Warehousing, white paper, august 1997, http://www.system-services.com. [HuTj00] T. N. Huynh, A M. Tjoa, Architecture for Object-Relational Data Warehouse, accepted paper of the ICS2000 in Beijing. [Kimb98] R. Kimball, Meta Meta Data Data, DBMS magazine, March 1998. [KrBN99] V. Krishnamurthy, S. Banerjee, A. Nori, Bringing Object-Relational Technology to The Mainstream, SIGMOD Conference 1999: 513-514 [KRRT98] R. Kimball, L. Reeves, W. Thornthwaite, M. Ross, The Data Warehouse Lifecycle Toolkit, John Wiley & Sons, Inc., 1998. [Kulk94] K. G. Kulkarni: Object-Oriented Extensions in SQL3: A Status Report. SIGMOD Conference 1994: 478 Thanh N. Huynh, Oscar Mangisengi, A Min Tjoa 3-9