Repository Support for Data Warehouse Evolution

Christoph Quix
Informatik V, RWTH Aachen, Germany
quix@informatik.rwth-aachen.de

Abstract

Data warehouses are complex systems consisting of many components which store highly-aggregated data for decision support. Due to the role of data warehouses in the daily business work of an enterprise, the requirements for their design and implementation are dynamic and subjective. Therefore, data warehouse design is a continuous process which has to reflect the changing environment of a data warehouse, i.e. the data warehouse must evolve in reaction to the enterprise's evolution. Based on existing meta models for the architecture and quality of a data warehouse, we propose in this paper a data warehouse process model to capture the dynamics of a data warehouse. The evolution of a data warehouse is represented as a special process, and the evolution operators are linked to the corresponding architecture components and quality factors they affect. We show the application of our model to schema evolution in data warehouses and its consequences for data warehouse views. The models have been implemented in the metadata repository ConceptBase, which can be used to analyze the result of evolution operations and to monitor the quality of a data warehouse.

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.
Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'99), Heidelberg, Germany, 14.-15.6.1999 (S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/

1 Introduction

Data warehouses are complex systems consisting of many components which store highly-aggregated data for decision support. Most requirements stem from managers and analysts using the data warehouse system to support them in decision making in the daily business work of an enterprise. The nature of the work of managers and analysts implies that their requirements are often changing and do not reach a final state, i.e. their requirements are dynamic and subjective. They do not only demand faster response times to their queries (which may be achieved by ordering new and faster hardware), they also want more information, e.g. access to data which are currently not present in the data warehouse, or a higher quality of their data, e.g. query results with fewer incorrect values.

Therefore, a data warehouse cannot be designed in one step; usually it evolves over many years. A common methodology to construct data warehouses is to start with some local data marts (e.g., one data mart for each department). The knowledge acquired during this phase can be used to construct in parallel a global enterprise schema for the data warehouse. Data marts are usually easier to implement than an enterprise-wide data warehouse, and after a relatively short time analysts can work with the system. The requirements of the analysts will grow in time, and after some time they want to make queries across several data marts of the departments. At this point, the enterprise-wide data warehouse comes into play: it can either be a virtual/distributed data warehouse, i.e. there is a common interface to the data warehouse but the queries are delegated to the data marts, or a materialized data warehouse, which has loaded the data from the data marts and other information sources.

In data warehouses, changes may happen or be required in many different situations. The data warehouse is usually separated from the OLTP systems, and the OLTP systems are important for the daily business of the enterprise. Therefore, the data warehouse must be adapted to any changes which occur in the underlying data sources, e.g. changes of the schemata, changes of the physical location of a data source, or a change of the time window for the extraction of source data. Beside these changes on the source level, the clients on the client level (analysts) often change their requirements, as already mentioned above. Furthermore, new versions of software components may also require a change in the data warehouse.

Data quality is also important in traditional online transaction processing (OLTP) systems. In the research on these systems, techniques were developed to ensure a certain data quality level. For example, most relational databases today support referential integrity constraints and the SQL standard, so that data can be easily queried from the database and the semantics of the result is well understood. However, online analytical processing (OLAP) has refocused the attention on data quality for several reasons. First, in data warehouses the data are loaded from many different sources, and problems with the format, encoding or interpretation of data are often encountered. Furthermore, data quality is always relative, since the quality of data depends on how the data are suited for a particular use. In OLTP systems, the intended use is known before the system is designed and implemented and usually does not change over time. In contrast, the use of OLAP systems is not as static as in OLTP systems and may even be unknown at design time of the warehouse. The information demand of managers and analysts changes very often, and if new information is required, it must be delivered in a short time to be useful [TB98].

In the European DWQ project (Foundations of Data Warehouse Quality) [JV97], we have developed an architecture and quality model for data warehouses [JJQV99]. This model allows the representation of the data warehouse in three different perspectives:

- the conceptual perspective, which represents an overall business perspective on the information resources and analysis tasks of an enterprise,
- the logical perspective, which describes the schemata used in the sources, the data warehouse, and the data marts, and
- the physical perspective, which shows where the data are physically stored (host, disk, etc.).

Each perspective has three different levels: the source, enterprise and client level. A central role in this model plays the enterprise model, which should be a conceptual representation of the data which are available in the enterprise. In [JJQV99], we also presented a preliminary approach of linking quality information to the architecture model. This approach was extended and more formally presented in [JQJ98] and is shortly summarized in section 2.

Our models represent only a "snapshot" of a data warehouse system without taking into account anything of the dynamics in a data warehouse environment. In this paper, we want to describe how our repository approach developed in [JJQV99] and [JQJ98] can be extended to deal also with the dynamics of a data warehouse. On the one hand, this includes a process model which represents the usual data warehouse processes like data loading or update propagation. On the other hand, the process model is specialized to deal also with evolution processes, i.e. processes which evolve the data warehouse, like the materialization of a new view or the addition of a new source.

The advantage of our proposed approach is that all relevant metadata of a data warehouse (architecture, quality, process and evolution information) are stored in a central repository. The different types of information are interrelated, and therefore provide a semantically rich representation of the data warehouse. The query facilities of our metadata repository ConceptBase [JGJ+95] enable data warehouse users to analyze the data warehouse and to find deficiencies in architecture, quality, processes or the evolution of the data warehouse.

This paper is structured as follows. In section 2, we first recall the principles of the architecture and quality models shown in [JJQV99] and [JQJ98] before we present the data warehouse process model. Section 3 specializes the process model to the case of data warehouse evolution. In section 4, we present related work which addresses evolution in data warehouses, in particular schema evolution. Finally, we provide a summary and conclusions and give an outlook to future work.

2 A Meta Model for Data Warehouse Architecture, Quality and Processes

This section summarizes the nature of metadata used in the DWQ framework and gives an overview of the DWQ quality model. In section 2.3, the framework is extended by a process model for data warehouses.

2.1 Data Warehouse Architecture

In the DWQ project we have advocated the need for enriched metadata facilities for the exploitation of the knowledge collected in a data warehouse. In [JJQV99], it is shown that the data warehouse metadata should track both architecture components and quality factors.

The proposed categorization of the DW metadata is based on a 3x3 framework, depicted in figure 1: we identified three perspectives (conceptual, logical and physical) and three levels (source, data warehouse, client). We made the observation that the conceptual perspective, which represents the real world of an enterprise, is missing in most data warehousing projects, with the risk of incorrectly representing or interpreting the information found in the data warehouse.

[Figure 1: The Data Warehouse Architecture Meta Model]

The proposed metamodel (i.e. the topmost layer in figure 1) provides a notation for data warehouse generic entities, such as schema or agent, including the business perspective. Each box shown in figure 1 is decomposed into more detailed data warehouse objects in the metamodel of [JJQV99]. This metamodel is instantiated with the metadata of the data warehouse (i.e. the second layer in figure 1), e.g. relational schema definitions or the description of the conceptual data warehouse model. The lowest layer in figure 1 represents the real world where the actual data reside: in this level the metadata are instantiated with data instances, e.g. the tuples of a relation or the objects of the real world which are represented by the entities of the conceptual model.

2.2 Quality Meta Model

Each object in the three levels and perspectives of the architectural framework can be subject to quality measurement. Since quality management plays an important role in data warehouses, we have incorporated it into our meta-modeling approach. Thus, the quality model is part of the metadata repository, and quality information is explicitly linked with architectural objects. This way, stakeholders can represent their quality goals explicitly in the metadata repository, while, at the same time, the relationship between the measurable architecture objects and the quality values is retained.

The DWQ quality metamodel [JQJ98] is based on the Goal-Question-Metric approach (GQM) of [OB92], originally developed for software quality management. In GQM, the high-level user requirements are modeled as goals. Quality metrics are values which express some measured property of the object. The relationship between goals and metrics is established through quality questions. The main differences in our approach reside in the following points: (i) a clear distinction between subjective quality goals requested by stakeholders and objective quality factors attached to data warehouse objects, (ii) quality goal resolution is based on the evaluation of the composing quality factors, each corresponding to a given quality question, (iii) quality questions are implemented and executed as quality queries on the semantically rich metadata repository.

Figure 2 shows the DWQ Quality Model. (The different colors in this and the following figures refer to the abstraction level of the object: meta-meta classes are white, meta-classes are light-gray, simple classes are dark-gray, and data objects are black.) The class "ObjectType" refers to any meta-object of the DWQ framework depicted in the first layer of figure 1. A quality goal is an abstract requirement, defined on an object type, and documented by a purpose and the stakeholder interested in it. A quality goal roughly expresses natural language requirements like "improve the availability of source s1 until the end of the month in the viewpoint of the DW administrator". Quality dimensions (e.g. "availability") are used to classify quality goals and factors into different categories. Furthermore, quality dimensions are used as a vocabulary to define quality factors and goals; yet each stakeholder might have a different vocabulary and different preferences in the quality dimensions. Moreover, a quality goal is operationally defined by a set of questions to which quality factor values are provided as possible answers. As a result of the goal evaluation process, a set of improvements (e.g. design decisions) can be proposed, in order to achieve the expected quality [VBQ99]. A quality factor represents an actual measurement of a quality value, i.e. it relates quality values to measurable objects. A quality factor is a special property or characteristic of the related object with respect to a quality dimension. It also represents the expected range of the quality value, which may be any subset of a quality domain. Dependencies between quality factors are also stored in the repository. Finally, the method of measurement is attached to the quality factor through a measuring agent.

[Figure 2: DWQ Quality Meta Model [JQJ98]]

The quality meta-model is not instantiated directly with concrete quality factors and goals; it is instantiated with patterns for quality factors and goals. The use of this intermediate instantiation level enables data warehouse stakeholders to define templates of quality goals and factors. For example, suppose that the analysis phase of a data warehouse project has detected that the availability of the source database is critical to ensure that the daily online transaction processing is not affected by the loading process of the data warehouse. A source administrator might later instantiate this template of a quality goal with the expected availability of his specific source database. Thus, the programmers of the data warehouse loading programs know the time window of the update process.

Based on the meta-model for data warehouse architectures, we have developed a set of quality factor templates which can be used as an initial set for data warehouse quality management. The exhaustive list of these templates can be found in [QJJ+98]. In [VBQ99], we have shown a methodology for the application of the architecture and quality model. The methodology is an adaptation of the Total Quality Management approach [BBBB95] and consists of the following steps:

- design of object types, quality factors and goals,
- evaluation of the quality factors,
- analysis of the quality goals and factors and their possible improvements, and
- re-evaluation of a quality goal due to the evolution of the data warehouse.

The basic idea of [VBQ99] is to add (analytical) functions to the quality model which formalize the dependencies between the quality factors. Their inverse functions are used to find possibilities for the improvement of data warehouse quality.

2.3 A Quality-Oriented Data Warehouse Process Model

As described in the previous section, it is important that all relevant aspects of a data warehouse are represented in the repository. Yet the described architecture and quality model does not represent the workflow which is necessary to build and run a data warehouse, e.g. to integrate a data source or to refresh the data warehouse incrementally. Therefore, we have added a data warehouse process model to our meta modeling framework. Our goal is to have a simple process model which captures the most important issues of data warehouses rather than building a huge construction which is difficult to understand and not very useful due to its complexity.

Figure 3 shows the meta model for data warehouse processes. A data warehouse process is composed of several processes or process steps which may be further decomposed. Process steps and the processes themselves are executed in a specific order which is described by the "next" relation between processes. A process works on an object type, e.g. data loading works on a source data store and a data warehouse data store. The process itself must be executed by some object type, usually an agent which is represented in the physical perspective of the architecture model. The result of a process is some value of a domain; the execution of further processes may depend on this value. For example, the data loading process returns as a result a boolean value representing the completion value of the process, i.e. whether it was successful or not. Further process steps like data cleaning are only executed if the previous loading process was successful. The process is linked to a stakeholder who controls or has initiated the process. Moreover, the result of a process is the data which are produced as an outcome of the process, e.g. the tuples of a relation.

[Figure 3: A process model for data warehouses]

Processes affect a quality factor of an object type, e.g. the availability of a data source or the accuracy of a data store. It might be useful to store also the expected effect on the quality factor, i.e. whether the process improves or decreases the quality factor. However, the achieved effect on the quality factor can only be determined by a new measurement of this factor. A query on the metadata repository can then search for the processes which have improved the quality of a certain object.

The processes can be subject to quality measurement, too. Yet, the quality of a process is usually determined by the quality of its output. Therefore, we do not go into detail with process quality, but quality factors can be attached to processes, too.

As an example for a data warehouse process we have partially modeled the data warehouse loading process in figure 4. The loading process is composed of several steps, one of which in our example is data cleaning. The data cleaning process step works on a data store, where the data which have to be cleaned reside. It is executed by some data cleaning agent. It affects among others the quality factors accuracy and availability, in the sense that accuracy is hopefully improved and availability is decreased because of locks due to read-write operations on the data store. The data cleaning process may also store some results of its execution in the metadata repository, for example, a boolean value to represent the successful completion of the process and the number of changed tuples in the data store. As already mentioned in section 2.2, the first instantiation level provides only a pattern for data warehouse processes, and not the "real" processes. The data of a "real" process are stored as an instance of this pattern (see below).

[Figure 4: An example for a data warehouse process pattern]

The information stored in the repository may be used to find deficiencies in the data warehouse. To show the usefulness of this information we use the following query. It returns all data cleaning processes which have decreased the availability of a data store according to the stored measurements. The significance of the query is that it can show that the implementation of a data cleaning process has become inefficient.

    GenericQueryClass DecreasedAvailability isA DWCleaningProcess with
        parameter
            ds : DataStore
        constraint
            c : $ exists qf1,qf2/DataStoreAvailability
                  t1,t2,t3/TransactionTime v1,v2/Integer
                  (qf1 onObject ds) and (qf2 onObject ds) and
                  (this worksOn ds) and (this executedOn t3) and
                  (qf1 when t1) and (qf2 when t2) and
                  (t1 < t3) and (t3 < t2) and
                  (qf1 hasResult v1) and (qf2 hasResult v2) and
                  (v1 > v2) $
    end

The query has a data store as parameter, i.e. the query will return only cleaning processes which are related to the specified data store. The query returns the processes which have worked on the specified data store and which were executed between the measurements of the quality factors qf1 and qf2, where the measured value of the newer quality factor is lower than the value of the older quality factor. The query can be formulated in a more generic way to deal with all types of data warehouse processes, but for reasons of simplicity and understandability, we have shown this more special variant.

Finally, figure 5 shows the trace of a process at the instance level. The process pattern for DW Loading has been instantiated with a real process, which has been executed on the specified date "April 15, 1999". An instantiation of the links to the quality factors is not necessary, because the information that "data cleaning" affects the accuracy and the availability of a data store is already recorded in the process pattern shown in figure 4.

[Figure 5: Trace of a data warehouse process]

3 Data Warehouse Evolution

This section presents a framework for data warehouse evolution. It is based on the process model for data warehouses presented in the previous section. We will first discuss what types of evolution may occur in a data warehouse. Finally, we will present the application of our framework to the evolution of data warehouse views.

3.1 Evolution in a Data Warehouse Environment

A data warehouse is a very complex system whose components evolve frequently and independently of each other. Users can create new views or update old ones. Some sources may disappear while others are added. The enterprise model can evolve with the enterprise objectives and strategies. The technical environment changes with the evolution of products and updates. Design choices at the implementation level can also evolve to achieve user requirements and administration requirements.

The data stores can produce changes due to reasons of schema evolution in the logical and conceptual perspective, changes of the physical properties of the source (e.g. location, performance etc.), insertions or deletions of data stores, and other reasons particular to their nature (e.g. in the sources, the time window for the extraction or the data entry process can change). The software components can be upgraded, completed, debugged, etc. The propagation agents of all types (loaders/refreshers/wrappers/mediators/source integrators) can obtain new schedules, new algorithms, rules, physical properties, etc. Needless to say that the user requirements continuously change, too. New requirements arise, while old ones may become obsolete, new users can be added, priorities and expected/acceptable values change through time, etc. Moreover, the business rules of an organization are never the same, due to changes in the real world.

As a result of evolution and errors, our goals, components, and quality factors are never to be fully trusted. Each time we reuse previous results we must always consider cases like: lack of measurement of several objects, errors in the measurement procedure (e.g. through an agent which is not appropriate), outdated information of the repository with respect to the data warehouse, etc.
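The repository check behind the DecreasedAvailability query of section 2.3 can also be paraphrased in ordinary program code. The following Python sketch is illustrative only — the class and attribute names are assumptions, not part of the DWQ models — but it implements the same rule: report a cleaning process if it was executed between two availability measurements of a data store and the newer measurement is lower than the older one.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    obj: str      # measured object, e.g. the name of a data store
    when: int     # transaction time of the measurement
    value: int    # measured availability

@dataclass
class CleaningProcess:
    name: str
    works_on: str    # the data store the process works on
    executed_on: int # transaction time of the execution

def decreased_availability(ds, processes, measurements):
    """Return names of cleaning processes executed between two availability
    measurements of data store `ds` where the newer value is lower."""
    ms = sorted((m for m in measurements if m.obj == ds), key=lambda m: m.when)
    result = []
    for p in processes:
        if p.works_on != ds:
            continue
        for older, newer in zip(ms, ms[1:]):
            if older.when < p.executed_on < newer.when and newer.value < older.value:
                result.append(p.name)
                break
    return result
```

For example, a cleaning process executed at time 15 between an availability measurement of 98 (time 10) and one of 90 (time 20) would be reported, mirroring the scenario described for the ConceptBase query.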
Quix 4-5 next composed DW affects Add DW Of Completeness Relation composed hasResult Process forPerson Stake- Of Domain next Type holderType DataStore Evaluate isA Availability View Materialize works next View affects On Process DataStore Store View StepType Minimality Extent worksOn ... QualityFactor onObject ObjectType next Source DW isA Type DW Store View DataStore DataStore Evolution Interpretability Meta Data ProcessType isA Figure 7: DW Evolution: Materialization of a view Evolution OperatorType warehouse views. Our goal is to provide a framework for the quality-oriented evolution of a data warehouse and the Figure 6: DW process model specialized for DW Evolution existing techniques are integrated into our framework to make use of the semantically rich meta database. 3.2 A Meta Model for Data Warehouse Evolution One application of our framework is the monitoring of A way to control the evolution in data warehouses is to pro- data warehouse quality under the evolving environment of vide complementary meta-data which tracks the history of a data warehouse. As described in section 2.2 quality mea- changes and provides a set of consistency rules to enforce surements should be repeated periodically to monitor how when a quality factor has to be re-evaluated. To do so, it the quality of the data warehouse evolves. In addition to is necessary to link quality factors to evolution operators the architecture and quality model, the meta model for data which affect them. The idea behind this is to enrich the warehouse evolution keeps track of the (evolution) pro- meta-data repository in order to ease the impact analysis of cesses which have changed the configuration of the data each evolution operator and its consequences on the quality warehouse or have changed the data of the warehouse. factor measures. With this information, it is possible to trace the evolution of Our meta model for data warehouse evolution is a spe- the data warehouse. 
If a quality problems occurs, the meta- cialization of the data warehouse process model (see figure data repository can be used to find the (evolution) process 6). An evolution process is composed of evolution opera- which has caused the quality problem. In the rest of this tors, but also of “normal” data warehouse processes. For section, we will use the example for the evolution of data example, the materialization of new data warehouse is an warehouse views to show the usefulness of our approach. evolution process of the data warehouse (cf. figure 7). This In [CNR99] a taxonomy for schema evolution operators process includes the schema evolution operations such as in object-oriented databases is given. We have adapted this “Add a new relation to the data warehouse schema” as well taxonomy to relational databases, which are often used in as the loading, extraction and writing process to evaluate data warehouses. Table 1 summarizes the evolution oper- the view and store its extent. ators for base relations and views, and relates them to the The example shown in figure 7 is also a pattern for an quality factors which are affected by this evolution opera- evolution process like the example in figure 4 is a pattern tor. for a data warehouse process. Therefore, the pattern has to The evolution operators for base relations and views in be further instantiated with an evolution process which has data warehouse mainly work on the representation of the been executed on the data warehouse system. The infor- relation in the logical perspective of the architecture model, mation stored in the metadata repository can then be used i.e. the relation itself and the logical schema it belongs to. to analyze the impact of certain evolution operations on the Moreover, they affect the physical objects where the data data warehouse. of the relation is stored or where the view is materialized, i.e. the data stores. 
In addition, if there exists another view 3.3 Evolution of Data Warehouse Views which is based on the evolved relation or view, then the To be useful, the described framework for data warehouse view definition, the materialization of the view, and the evolution must be filled with patterns of evolution pro- maintenance procedure must be updated, too. cesses. As an example, we will discuss the evolution of The completeness, correctness and consistency of the views in data warehouses. The evolution of data ware- logical schema with respect to conceptual model are the house views has been studied recently in the research fields most important quality factors affected by these evolution of schema evolution [RLN97, Bell98, Blas99] and main- operators. Furthermore, the deletion of a base relation or an tenance of data warehouse views under view redefinition attribute might have a positive impact on the minimality or [GMR95]. In this section, we do not provide a new tech- the redundancy of the logical schema. The renaming of at- nique for schema evolution or view maintenance of data tributes and relations to more meaningful names improves C. Quix 4-6 Table 1: Evolution Operators for base relations and views in DWs and their effect on DW quality Evolution Operator Affects Quality Factor Works On Add base relation/view - Completeness, correctness and consistency of the log- - Relation ical schema wrt. the conceptual model - Logical Schema - Usefulness of schema - Data Store - Availability of the data store Delete base relation/view - Minimality of logical schema - Relation, Log. Schema - Completeness, correctness and consistency of the log- - Data Store ical schema wrt. the conceptual model - View - Availability of data store - View Maintenance Agent Add attribute to base - Completeness, correctness and consistency of the log- - Relation relation/view ical schema wrt. 
the conceptual model - Data Store - Interpretability of the relation - View - Redundancy of the attributes - View Maintenance Agent Delete attribute from base - Completeness, correctness and consistency of the log- - Relation relation/view ical schema wrt. the conceptual model - Data Store - Interpretability of the relation - View - Redundancy of the attributes - View Maintenance Agent Rename Relation, View, or - Interpretability and understandability of the relation - Relation, View Attribute and their attributes - Data Store, VM Agent Change of attribute domain - Interpretability of data - Relation, View - Data Store, VM Agent Add Integrity Constraint - Credibility and Consistency of data in data store - Logical Schema - Data Store Delete Integrity Constraint - Consistency of data wrt. integrity constraints - Logical Schema - Data Store Change to view definition - Completeness, correctness and consistency of the log- - View ical schema wrt. the conceptual model - Data Store - Usefulness of schema - View Maintenance Agent the interpretability and the understandability of the logical are instances of the process type RelationalEvolutionPro- schema. The change of the domain of an attribute to a more cess. A similar view might be useful for data warehouse applicable domain, e.g. changing the domain from string to administrators which notifies them if base relations have date, improves the interpretability of data. New integrity changed. Our repository system ConceptBase is able to constraints in the logical schema may improve the credi- maintain views on the metadata and supports the notifica- bility and the consistency of the data. Finally, if the view tion of external client applications if a view has changed definition is changed without an impact on the structure of they are interested in [SQJ98]. the view (e.g. the WHERE clause in a SQL statement is changed) the view may become useful for more client ap- 3.4 Case Study plications. 
As an example to show the usefulness of the data warehouse evolution model, we suppose that an analyst has detected that the views he is using are changed often, and that he wants to be notified about future changes. We can establish a view on the metadata repository for the analyst which monitors the changes to the view he is interested in:

  View EvolutionOperationsOnView isA RelationalEvolutionProcess with
    parameter
      v : DWView
    constraint
      c : $ (this worksOn v) $
  end

This view returns all evolution operations which are made to the given data warehouse view, assuming that all evolution processes concerning relational schema evolution are instances of the process type RelationalEvolutionProcess. A similar view might be useful for data warehouse administrators, notifying them if base relations have changed. Our repository system ConceptBase is able to maintain views on the metadata and supports the notification of external client applications if a view they are interested in has changed [SQJ98].

3.4 Case Study

In [Lehm97], a commercial case study is described in which an early version of the approach described above has been used to link changes in the definition of materialized views of the data warehouse to changes in the view maintenance strategy. We have developed a tool for data warehouse design at the relational level, aiming at several data warehouse quality goals such as reusability of solutions, sufficient and flexible freshness of data, the ability to evolve source or data warehouse schemas, and clear process definitions for data integration and refreshment. Prior to the development of this tool, especially the goal of flexibility was hampered by the need to re-program scripts whenever schema or policy changes happened.

Due to the constraints in the project, we decided to decompose the data warehouse views into several self-maintainable views [HZ96]. The design tool records the schema definitions of the source systems and the view definitions of the warehouse. It then decomposes the views and automatically creates the SQL statements to initialize and incrementally maintain the views.
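The refresh cycle of such a self-maintainable view can be pictured with a small example: a materialized aggregate is folded forward from source deltas, and after a schema change only the maintenance statement has to be re-generated. A minimal sketch using Python's sqlite3 module, where all table names are invented for illustration and the maintenance SQL is written out by hand rather than derived from recorded view definitions as in the tool:

```python
import sqlite3

# Sketch of incremental maintenance of a self-maintainable aggregate
# view. Table names ("sales_delta", "total_sales") are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales_delta (region TEXT, amount INTEGER)")
db.execute("CREATE TABLE total_sales (region TEXT PRIMARY KEY, total INTEGER)")

def refresh_total_sales(db: sqlite3.Connection) -> None:
    """Fold the collected source deltas into the materialized view."""
    changes = db.execute(
        "SELECT region, SUM(amount) FROM sales_delta GROUP BY region")
    for region, amount in changes.fetchall():
        # Upsert: create the group or add the delta to the existing total.
        db.execute(
            "INSERT INTO total_sales (region, total) VALUES (?, ?) "
            "ON CONFLICT(region) DO UPDATE SET total = total + excluded.total",
            (region, amount))
    db.execute("DELETE FROM sales_delta")  # deltas are consumed

# Initial load, then one incremental update.
db.executemany("INSERT INTO sales_delta VALUES (?, ?)",
               [("EU", 10), ("US", 20)])
refresh_total_sales(db)
db.execute("INSERT INTO sales_delta VALUES ('EU', 5)")
refresh_total_sales(db)
eu_total = db.execute(
    "SELECT total FROM total_sales WHERE region = 'EU'").fetchone()[0]
# eu_total is now 15
```

The point of the decomposition is that `refresh_total_sales` touches only the delta table and the materialization, so regenerating this one statement is all that a schema change requires.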
If the schema of the sources or of the data warehouse has changed, only little effort is necessary to update the maintenance processes, i.e. only the SQL code has to be re-generated. However, we did not support the adaptation of the tuples in the relations to the new schema. The tool has been integrated into a commercial product suite for sales force automation and has significantly reduced the effort of data warehouse maintenance [JQB+99].

4 Related Work

An approach for the management of views in a federated database system is proposed in [KGF98]. The approach is based on a knowledge base which stores what information is available in the federated database, how it has been combined previously, and how the information is related semantically. A workbench of tools assists users in creating and evolving the knowledge base and their views on the federated database system.

Research in data warehouses addresses the evolution problem from two different perspectives. The first aspect is the schema evolution of base relations and views, which has been studied in [Bell98], [RLN97], and [Blas99]. [Bell98] provides a set of algorithms to maintain the definitions of views if the schema of the base relations is changing. Furthermore, different versions of a view are constructed and maintained if the view definition has changed. The versioning of views is necessary because not every client application of the data warehouse can be adapted to the new version of the view.

[RLN97] provides a taxonomy of view adaptation problems. The taxonomy is based upon the types of changes to the view, the desired level of view adaptability in the context of changes, and the changes related to the base information system, e.g. data updates, capability changes, or metadata changes. They present an environment for the view synchronization problem, i.e. the view definition adaptation is triggered by capability changes of information systems. Other work in the context of schema evolution has been devoted to the evolution of schemas in object-oriented databases, e.g. [CNR99].

[Blas99] presents a framework for the evolution of conceptual multidimensional schemata. In this approach, the data warehouse is designed and maintained at a conceptual level. Each evolution operation at the conceptual level has well-defined semantics and is mapped to a physical implementation level. The framework supports, among other features, the automatic adaptation of instances, change notification for applications, and forward compatibility of schemata.

The second perspective, which addresses the problem of evolution of data warehouse views, is the maintenance of the extent of a view. In [GMR95], the problem of incremental view maintenance under view redefinition is studied. An overview and a taxonomy of view maintenance problems is given in [GM95]. [HMV99] studies the problem of maintaining multi-dimensional data cubes under dimension updates. They define a basic set of operators which modify the dimensions of a data cube. Moreover, they provide an algorithm for maintaining the data cube under these update operations.

5 Conclusions

We have extended our meta modeling framework for data warehouse architecture and quality by a model for data warehouse processes and have specialized this model to the case of data warehouse evolution. In detail, we have addressed the problem of evolution of data warehouse views. The management of the metadata in our repository system ConceptBase allows us to query and analyze the stored metadata for errors and deficiencies. In addition, features like client notification and active rules of ConceptBase support the maintenance of the data warehouse components and keep data warehouse users up-to-date on the status of the data warehouse.

In the DWQ project, we are currently studying some data warehouse processes like update propagation, querying, and conceptual design. Furthermore, the different types of data warehouse evolution mentioned in section 3.1 have to be studied in more detail. In this context, the proposed models will be refined and extended to cover new aspects of data warehouse processes. A validation of the data warehouse process model with one of our industrial cooperation partners - a small data warehouse application vendor - is also planned for the future.

Acknowledgments

This research is sponsored by the European Esprit Project "DWQ: Foundations of Data Warehouse Quality", No. 22469. We would like to thank all our DWQ partners who contributed to the progress of this work, and especially Matthias Jarke, Manfred A. Jeusfeld, Mokrane Bouzeghoub, and Panos Vassiliadis.
References

[BBBB95] D.H. Besterfield, C. Besterfield-Michna, G. Besterfield, M. Besterfield-Sacre. Total Quality Management. Prentice Hall, 1995.

[Bell98] Z. Bellahsène. Structural View Maintenance in Data Warehousing Systems. Journées Bases de Données Avancées (BDA '98), Tunis, October 1998.

[Blas99] M. Blaschka. FIESTA: A Framework for Schema Evolution in Multidimensional Information Systems. In Proc. 6th CAiSE Doctoral Consortium, Heidelberg, Germany, June 1999.

[CNR99] K.T. Claypool, C. Natarajan, E.A. Rundensteiner. Optimizing the Performance of Schema Evolution Sequences. Technical Report WPI-CS-TR-99-06, Worcester Polytechnic Institute, Dept. of Computer Science, March 1999.

[GM95] A. Gupta, I.S. Mumick. Maintenance of Materialized Views: Problems, Techniques, and Applications. IEEE Data Engineering Bulletin, Special Issue on Materialized Views and Warehousing, 18(2), 1995.

[GMR95] A. Gupta, I.S. Mumick, K.A. Ross. Adapting Materialized Views after Redefinitions. In Proc. ACM SIGMOD International Conference on Management of Data, pp. 211-222, 1995.

[HMV99] C.A. Hurtado, A.O. Mendelzon, A.A. Vaisman. Maintaining Data Cubes under Dimension Updates. In Proc. 15th International Conference on Data Engineering (ICDE '99), Sydney, Australia, 1999.

[HZ96] R. Hull, G. Zhou. A Framework for Supporting Data Integration Using the Materialized and Virtual Approaches. In Proc. ACM SIGMOD International Conference on Management of Data, Montreal, Canada, 1996.

[JGJ+95] M. Jarke, R. Gallersdörfer, M.A. Jeusfeld, M. Staudt, S. Eherer. ConceptBase - a deductive object base for meta data management. Journal of Intelligent Information Systems, 4(2), pp. 167-192, 1995.

[JJQV99] M. Jarke, M.A. Jeusfeld, C. Quix, P. Vassiliadis. Architecture and Quality in Data Warehouses: An Extended Repository Approach. Information Systems, 24(3), pp. 229-253, 1999 (a previous version appeared in Proc. of the 10th Conference on Advanced Information Systems Engineering (CAiSE '98), pp. 93-113, Pisa, Italy, 1998).

[JQB+99] M. Jarke, C. Quix, G. Blees, D. Lehmann, G. Michalk, S. Stierl. Improving OLTP Data Quality Using Data Warehouse Mechanisms. In Proc. ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, 1999.

[JQJ98] M.A. Jeusfeld, C. Quix, M. Jarke. Design and Analysis of Quality Information for Data Warehouses. In Proc. of the 17th International Conference on the Entity Relationship Approach (ER '98), Singapore, 1998.

[JV97] M. Jarke, Y. Vassiliou. Foundations of data warehouse quality: an overview of the DWQ project. In Proc. of the 2nd International Conference on Information Quality, Cambridge, Mass., 1997.

[KGF98] D.D. Karunaratna, W.A. Gray, N.J. Fiddian. Organising Knowledge of a Federated Database System to Support Multiple View Generation. In Proc. 5th KRDB Workshop (Knowledge Representation meets Data Bases), Seattle, May 1998.

[Lehm97] D. Lehmann. View Maintenance in a Data Warehouse Environment. Diploma thesis, RWTH Aachen, November 1997 (in German).

[OB92] M. Oivo, V. Basili. Representing software engineering models: the TAME goal-oriented approach. IEEE Transactions on Software Engineering, 18(10), 1992.

[QJJ+98] C. Quix, M. Jarke, M.A. Jeusfeld, M. Bouzeghoub, D. Calvanese, E. Franconi, M. Lenzerini, U. Sattler, P. Vassiliadis. Quality Oriented Evolution of DW Designs. Technical Report DWQ 9.1, DWQ Consortium, 1998.

[RLN97] E.A. Rundensteiner, A.J. Lee, A. Nica. On Preserving Views in Evolving Environments. In Proc. 4th KRDB Workshop (Knowledge Representation meets Data Bases), Athens, 1997.

[SQJ98] M. Staudt, C. Quix, M.A. Jeusfeld. View Maintenance and Change Notification for Application Program Views. In Proc. ACM Symposium on Applied Computing, Atlanta, Georgia, 1998.

[TB98] G.K. Tayi, D.P. Ballou. Examining Data Quality. Communications of the ACM, 41(2), pp. 54-57, Feb. 1998.

[VBQ99] P. Vassiliadis, M. Bouzeghoub, C. Quix. Towards Quality-oriented Data Warehouse Usage and Evolution. In Proc. of the 11th Conference on Advanced Information Systems Engineering (CAiSE '99), Heidelberg, Germany, 1999.