<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Gulliver in the land of data warehousing: practical experiences and observations of a researcher</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Panos Vassiliadis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Systems Laboratory</institution>
          ,
          <addr-line>Zografou 15773, Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Electrical and Computer Engineering, Computer Science Division</institution>
          ,
          <addr-line>Knowledge</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Technical University of Athens</institution>
        </aff>
      </contrib-group>
      <fpage>12</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>The gap between researchers and practitioners is widely discussed in the IT community. The purpose of this paper is to show the issues that occupy both research and practice in the field of data warehousing, and the extent to which these issues overlap. To achieve this goal, we first present the current status and tendencies of data warehouse research. Then we list several practical problems as they appear in the relevant literature, based also on our personal experience. Finally, we try to relate research and practice in a unified big picture.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The gap between researchers and practitioners is widely
discussed in the IT community. The situation regarding
data warehousing seems to follow the general pattern,
where practitioners complain that their practical problems
are overlooked by research and researchers are generally
dissatisfied with the acceptance of their ideas in industry.
Let us quote some excerpts from the results of the
previous DMDW workshop [GJSV99]: «Although many
solutions were developed for interesting subproblems...
combining these partial and often very abstract and formal
solutions to an overall design methodology and
warehousing strategy is still left to the practitioners...»,
«... the influence of the research results on the commercial
stream of data warehouse products is very limited...»,
«The gap between data warehouse practice and research
became obvious ...». The purpose of this paper is to show
the issues which occupy research and practice,
and the extent to which these issues overlap.
The ultimate goal is to show possible new areas of
research, based on practical problems, and at the same
time to give an idea of how practice could benefit from
research results which seem to be rather ignored.</p>
      <p>To this end we divide the paper into three parts. The
first part appears in Section 2, where we present the
«good news» for data warehousing and, more specifically,
the current status of the data warehouse industry in terms
of profit and sales, as well as the status of research. To
present the status of the research, we have listed and
classified the papers relevant to data warehousing in three
major database conferences during the last five years and
tried to show the tendencies of the research based on this
study. The second part of the paper deals with problems
and failures during data warehouse projects and appears
in Section 3. The discussion is based both on the relevant
literature (which is surprisingly small) and on the author’s
personal experiences. In the third part, based on the problems
detected in the previous parts, we proceed to
relate the data warehouse lifecycle with potential
problems and solutions proposed by the research
community. Finally, we give some concluding remarks on
the reasons for the gap between the research and practice
communities.</p>
      <p>The copyright of this paper belongs to the paper’s authors. Permission to copy
without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage.</p>
      <p>Proceedings of the International Workshop on Design and
Management of Data Warehouses (DMDW'2000),
Stockholm, Sweden, June 5-6, 2000
(M. Jeusfeld, H. Shu, M. Staudt, G. Vossen, eds.)</p>
    </sec>
    <sec id="sec-2">
      <title>2. The Good News: Money and Research</title>
      <p>There is good news for the data warehouse field: sales
are increasing at high rates and research is achieving a
steady focus on the field. We will briefly summarize the
importance of the field by mentioning the financial
figures in subsection 2.1 and quickly proceed to
subsection 2.2, where we discuss the main subject of this
section, which is the status and the tendencies of the
research in data warehousing.</p>
      <sec id="sec-2-1">
        <title>2.1 The Money</title>
        <p>Selling products related to data warehousing is a profitable
business. As mentioned in a report by Merrill Lynch
at the end of 1998 [ShTy98], the data warehousing market
was estimated to expand in the following
few years. The numbers are surprisingly large: the data
mart market was expected to have a 40% compounded
annual growth rate (CAGR) and the RDBMS sales for
data warehouse purposes a CAGR of 25%, reaching total
sales of $2.2 billion. The OLAP Report [Pend00]
mentions that sales have reached $2.5 billion
for OLAP tools (including implementation services) and
are expected to grow at a 20% rate in 2000, with a
CAGR of 19% over a five-year period. Fig. 1 shows the
estimated sales, along with the CAGR, for six categories
of tools.</p>
        <p>Research in the field of data warehousing is flourishing.
Sessions dedicated to data warehousing have appeared in
most of the major conferences of the data management
discipline. Several workshops have appeared [GJSV99,
DOLAP] and there is even a dedicated conference for
data warehouse issues [DaWaK].</p>
        <p>To obtain an overview of the tendencies of research in the
past five years, we have selected three prestigious database
conferences, namely PODS, SIGMOD and VLDB, and
classified those of their papers which are relevant to the data
warehouse area. We included any papers we found
relevant to data warehousing, except for the ones relevant
to data mining (to retain a clear-cut separation between
the two fields). We restricted ourselves to just three
conferences, since our goal is to give a general feeling of
the situation in the research field, rather than conduct a
thorough survey of the topic. Based on the content of the
papers, we classified them into several categories, shown in
Fig. 3. For reasons of better presentation and
understanding, we group these categories into larger groups,
referred to as “super-categories”. Of course, several
papers could fit in more than one category; still, we
followed a naïve approach and attributed each paper to
only one category. Naturally, we do not claim to be
perfect: it is possible that some papers have been left out of
our study, or classified under a category which was not
the most suitable. We apologize in advance for any such
occurrences, although we scrutinized the proceedings to
avoid this kind of problem. Also, it is possible that the
contribution of a paper in one category is
accompanied by results in another “correlated” category.
We believe that the results which we present are not far
from the ones which could be produced from a more
elaborate categorization of the papers, which would take
this issue into consideration. Still, there is no proof for
this statement and the issue remains open (although we
believe it is outside the scope of this paper).</p>
        <p>As one can see in Fig. 2, the number of papers seems to
reach stability. Although the research interest is rather
young (only 5 years old), we anticipate that the tendency is
to keep a steady number of papers in the major
conferences. The drop in the number of papers in 1998
could be explained by the unusual surge in
the number of papers relevant to data mining during that
particular year. It is very interesting to see that during the
last five years there have been 99 papers relevant
to data warehousing, which makes about 20 papers per year on
average.</p>
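        <p>As an aside on the financial figures of subsection 2.1, the compounded annual growth rate can be illustrated with a few lines of arithmetic. The sketch below is only an illustration: the function names are ours, and taking $2.5 billion as the starting point of a five-year, 19% projection is an assumption made for the example, not a figure reported in [Pend00].</p>
        <preformat>
```python
def project_sales(base_sales, cagr, years):
    """Project sales after `years` years of compound growth at rate `cagr`."""
    return base_sales * (1.0 + cagr) ** years


def implied_cagr(start_sales, end_sales, years):
    """Return the compounded annual growth rate implied by two sales figures."""
    return (end_sales / start_sales) ** (1.0 / years) - 1.0


# Hypothetical illustration: 2.5 (billion dollars) growing at a 19% CAGR
# for five years. The starting point is an assumption for this example.
projected = project_sales(2.5, 0.19, 5)
print(round(projected, 2))
print(round(implied_cagr(2.5, projected, 5), 4))
```
        </preformat>
        <p>Under these assumptions, a 19% CAGR more than doubles the starting figure within five years, which is why even the lower growth rates quoted above are considered large.</p>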
        <p>We have identified 22 categories of research fields that
have drawn the interest of researchers. In the
sequel, we list the most popular of them (Fig. 4).
- Data warehouse design: the problem lies in detecting
the set of views to materialize in the data warehouse, in
order to achieve the optimal operational cost (i.e., the
combined cost of querying and refreshing the contents
of the warehouse).
- Query rewriting: the problem lies in reusing existing
views to rewrite a query posed over the sources. An
alternative name for the problem could be ‘Answering
queries using views’.
- Integration: this is a wide area covering several issues.
The general context is that several sources containing
operational data exist in the environment of the data
warehouse and a unique interface must be provided in
order to query / update them. The problem of
integration is definitely larger than the area of data
warehousing, especially with the current advances in
Web technology. Note that in our survey we
excluded all papers on integration that seemed clearly
oriented towards semi-structured or Web data.
- Processing for relational aggregates: the area includes
structures and algorithms for the efficient processing of
aggregate queries. We distinguish this area from
query rewriting, in the sense that these papers deal with
results that could directly be implemented in a DBMS.
We also distinguish the area from the papers
involving processing for cubes, which we found more
focused on MOLAP databases.
- View maintenance: the problem lies in keeping the data
warehouse views in accordance with the changes
happening in the source data.</p>
        <p>[Figure: Number of Papers by Year; vertical axis: No. of Papers]</p>
        <p>The big picture of the area is made clear in Fig. 5,
which classifies the papers in higher-level super-categories.
The classification is based on the grouping of Fig. 3.
The most popular super-categories so far have been Query
Processing, View Technology, Integration and
Redundancy Exploitation. Query processing involves all
techniques to efficiently process requests and answer
queries. It involves six categories and 29% of the
research performed in the past years. View technology is
also a large category, focused on view maintenance
techniques as well as the physical data warehouse design
process. Integration, which has been previously described,
involves producing a single interface for the processing of
distributed heterogeneous data, along with query
processing techniques for that cause and resolution of
conflicts at the schema level. Redundancy exploitation is
a field which mostly interests theoreticians,
involving query containment and rewriting.</p>
        <p>Probably the most interesting graph is depicted in Fig. 6,
grouping the papers by year and super-category. In this
figure we see the evolution with respect to the passing of
time. One can see a dropping interest in the view
technology issues, which is rather normal since people
originally thought of data warehouses as collections of
materialized views. Although we believe that this attitude
is still present in the research community, there seems to
be a level of saturation in the problems regarding view
technology.</p>
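        <p>The data warehouse design problem listed above (choosing the set of views to materialize so that the combined cost of querying and refreshing is minimal) can be made concrete with a small sketch. Everything in it is a hypothetical illustration: the view names, sizes, refresh costs, workload and the simple cost model are assumptions of ours, and the exhaustive search is only viable for a handful of candidate views; it is not a method taken from the surveyed papers.</p>
        <preformat>
```python
from itertools import combinations

# Hypothetical candidate views: size in rows and cost of refreshing each one.
VIEW_SIZE = {"sales_by_day": 1000, "sales_by_month": 50}
REFRESH_COST = {"sales_by_day": 400, "sales_by_month": 5000}
BASE_SCAN_COST = 100000  # assumed cost of answering a query from the fact table

# Hypothetical workload: query name, then (frequency, views able to answer it).
WORKLOAD = {
    "daily_report": (20, ["sales_by_day"]),
    "monthly_report": (5, ["sales_by_day", "sales_by_month"]),
}


def operational_cost(materialized):
    """Combined cost of querying and refreshing for a set of materialized views."""
    query_cost = 0
    for frequency, usable_views in WORKLOAD.values():
        # A query is answered from the smallest materialized view that fits,
        # or from the base table when no such view is materialized.
        options = [VIEW_SIZE[v] for v in usable_views if v in materialized]
        query_cost += frequency * min(options, default=BASE_SCAN_COST)
    refresh_cost = sum(REFRESH_COST[v] for v in materialized)
    return query_cost + refresh_cost


def best_view_set(candidates):
    """Exhaustively try every subset of candidates; only viable for tiny inputs."""
    subsets = (
        set(combo)
        for size in range(len(candidates) + 1)
        for combo in combinations(candidates, size)
    )
    return min(subsets, key=operational_cost)


best = best_view_set(list(VIEW_SIZE))
print(sorted(best), operational_cost(best))
```
        </preformat>
        <p>With these made-up numbers, materializing the second view does not pay off, because its refresh cost outweighs the query time it saves. Note that the sketch presupposes knowledge of the queries and their frequencies, which is exactly the kind of assumption that is hard to meet in practical cases.</p>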
      </sec>
      <sec id="sec-2-2">
        <table-wrap>
          <caption>
            <p>Categories of research papers and the super-categories into which they are grouped</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Category</th>
                <th>Super-Category</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Incomplete information</td>
                <td>Incomplete information</td>
              </tr>
              <tr>
                <td>Data integration; Integration in general; Query processing over integrated data; Schema integration</td>
                <td>Integration</td>
              </tr>
              <tr>
                <td>OLAP modeling</td>
                <td>OLAP modeling</td>
              </tr>
              <tr>
                <td>Caching; Iceberg queries; Processing for aggregate queries; Processing for cubes; Query processing in general; Top N queries</td>
                <td>Query Processing</td>
              </tr>
              <tr>
                <td>Query containment; Query rewriting</td>
                <td>Redundancy Exploitation</td>
              </tr>
              <tr>
                <td>Clustering; Indexing; Storage for cubes; Storage in general</td>
                <td>Storage Management</td>
              </tr>
              <tr>
                <td>Detecting changes in the sources; Data warehouse design; Size estimation for views; View maintenance</td>
                <td>View Technology</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>At the same time, the interest in query processing rises
continuously from year to year, probably due to the
standing tendency of database researchers towards this
field. There are areas like incomplete information and storage
management which seem to lose interest as time passes.
Redundancy exploitation keeps a steady interest due to
its dedicated audience of theoreticians. Integration and
OLAP modeling seem to gain interest at the same time.
The probable reason for the former is the
criticism against the materialized nature of data
warehousing. As for the latter, it is possible that the lack
of a standard OLAP model contributes to the increasing
interest in this category.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Warehouse Problems and Failures</title>
      <p>An objective observer facing the facts of the previous
section would directly conclude that the area of data
warehousing thrives and the potential for further growth is
more than probable. Although this seems to be a quite
accurate description of the situation, we argue that a data
warehouse project is a great risk and is definitely
endangered by several factors. We intend to back up this
statement by concrete arguments based both on our
personal practical experience in the field and relevant
literature.</p>
      <p>A very good discussion on the problems of data
warehousing projects is found in [Dema97]. The paper
mentions the logical fact that nobody really speaks about
data warehousing failures and goes on to group the
reasons for the failure of a data warehousing project into
four categories, namely design, technical, procedural and
sociotechnical factors (Fig. 7).</p>
      <p>According to [ShTy98], the average time for the
construction of a data warehouse is 12 to 36 months and
the average cost for its implementation is between $1
million and $1.5 million. Data marts are a less risky
expenditure, since they cost hundreds of thousands of
dollars and take less than a year to implement. Still, if a
project of such nature is dependent on so many factors in
order to succeed, then the self-congratulatory statements
about the state of the art in data warehouse management are
rather unrealistic. In the sequel, we will take a short look
at the particular factors of failure for data warehouse
projects.</p>
      <p>As far as the design factors are concerned, there
is an obvious lack of a “textbook”
methodology for the design of a data warehouse. There
are no standard, or even widely accepted, metadata
management techniques<sup>1</sup> or languages, data engineering
techniques or design methodologies for data warehouses.
Rather, proprietary solutions from vendors, or
do-it-yourself advice from experts seem to define the
landscape. If we look at the relevant research papers, the
picture is disappointing: the three major conferences on
data management are not really concerned with issues like
metadata management or design methodologies for data
warehouses. There exist, though, relevant areas such as
the research on the physical data warehouse design and
the integration issues. Still, a closer look will reveal that
the research seems to target problems not really close to
the practical ones. For example, the assumptions made for
the design problem are rather unrealistic (knowledge of
user queries, their sizes and frequencies) with respect to
practical cases. Also, the integration problem is definitely
oriented toward a uniform API to distributed sources, i.e.,
to languages and mechanisms that enable the querying of
data. Still, problems like extraction, transformation and
cleaning, which can take up to 80% of the time spent in
the development of a data warehouse [Dema97], seem to
be ignored by the research community.</p>
      <p><sup>1</sup> [ShTy98] reports that the lack of a common metadata
standard (despite the existence of the MDIS standard at
the end of 1998) is the basic source of concern for
metadata management tools.</p>
      <p>The technical factors also reveal the absence of research
on practical problems. There exist, of
course, standards for the evaluation of software
components, but there is a gap in the evaluation and
choice of hardware components. As one can see in Fig. 8,
hardware accounts for up to 60% of a data warehouse budget
(disk, processor and network costs). Critical software
(DBMS and client tools) which is purchased (and not
developed in-house) takes up to 16% of the budget. There are
no papers, to our knowledge, that deal with the issue of
hardware/software selection for data warehouse
environments. As for the estimation of the sizes of
queries, data sets and network traffic, a closer look at the
appendix will reveal only one (!) paper on the estimation
of view sizes [SDNR96]. The fact that the average size of
data warehouses increases year by year makes the
problem even tougher. Back in 1996 the average data
warehouse size was estimated to be around 250 GB. In
today’s data explosion there is even talk about scientific
data warehouses of 40 TB [SGKT00]. This means that
despite Moore’s law and the drop in the cost of storage
units, size is still a problem for data warehousing. The
increasing number of users increases the complexity of
the problem. [ShTy98] mentions the case of a data
warehouse involving 20,000 users with an increase
of 2,000 users per year. Obviously, estimating the size of
materialized views or user queries is of great importance
in this context.</p>
      <p>The procedural and sociotechnical reasons are not really
technical issues that we should expect the
research community to deal with. We mention them for
reasons of completeness and in order to show how
sensitive a project like the construction of a data
warehouse is. The procedural factors involve
deficiencies concerning the deployment of the data
warehouse. Apart from classical problems in IS
management, it is important to notice that the role of user
communities is crucial: the end-users must be trained in
the new technologies and included in the design of the
warehouse. We refer the interested reader to [Gree00,
Dema97] for further probing on this very interesting
issue.</p>
      <p>As for the sociotechnical issues, it is also very interesting
to briefly discuss the relevant factors, since there is very
little reference to this kind of problem in the literature.
According to [Dema97], breaking the organizational
treaties is a consequence of the fact that the data
warehouse may reorganize the way the organization
works and intrude on the functional or subjective domain of
the stakeholders. For example, imposing a particular
client tool on the users invades the users’ desktop, which
is considered to be their personal “territory”. The
problems due to data ownership and access are
grouped in two categories. First, data ownership is power
within an organization. Any attempt to share or take
control over somebody else’s data is equivalent to a loss
of power for this particular stakeholder. Secondly, no
division or department can claim to possess 100% clean,
error-free data. The possibility of revealing the data
quality problems within the information system of the
department is definitely frustrating for the affected
stakeholders. Finally, the invasion of work practice
comes down to the psychological reason that no user
community seems to be really willing to shift from gut
feeling or experience to objective, data-driven
management (see [Dema97] for a broader discussion). To
top off the skepticism about the non-technical
problems and reasons of failure, ethical considerations
can be added to the big picture of data warehousing. In
[Smit97] several such thoughts are presented: Is it fair to
use customers’ data to harm their relationships with their
suppliers/customers? Is it fair to use such data to intrude
on your customers’ know-how? Is it fair to use customers’
data to change the structure of your organization in a way
that is detrimental to your customers? Is it fair to use
personal data of individual customers without any prior
notice?
Most of the aforementioned reasons for failure are backed
up by other testimonial literature (e.g., [Paul97],
[ShTy98]).</p>
      <p>The author has been involved in both research and
practical data warehouse projects during the last six
years. Our research experience was mainly the European
basic research project “DWQ: Foundations for Data
Warehouse Quality” [JaVa97]. Obviously, some of the
criticism and comments in this paper are influenced by
the research conducted in this project. We apologize for
this clear bias; still, since this paper presents the author’s
personal judgments, we believe that we should make clear
what has possibly influenced our opinion.</p>
      <p>The author has also been involved in three rather small
practical data warehouse projects. The first involved
loading data from all the health centers (i.e., hospitals,
provincial medical centers and other special kinds of
centers) in Greece into an enterprise data warehouse. The
loading of data was performed annually and the querying
was supposed to be performed mostly by pre-canned
reports. Still, quite a lot of flexibility was provided to the
user to filter, roll up, drill down and drill through the data.
The data warehouse was rather small and its construction
took around 12 months. The major problems encountered
were not technical, since (a) the size of data was not so
big, (b) the refreshment window was not a problem and
(c) there was no real problem in reconciling the source
data. Still, there were major problems with the
administration team of the legacy system due to the
following reasons:
- Lack of training of the target administration team.</p>
      <p>The people administering the legacy COBOL-based
system were the ones who would administer the new
system, too. Still, this was their first experience with
relational technology and this was definitely a
culture shock for them.
- Involvement of the administration team of the legacy
system in the design of the new system. Although it is
clear that no data warehouse can be built without the
involvement of the source administrators, our personal
experience suggests that this should be limited to the
construction of the data warehouse enterprise model
(or even only to the reverse engineering of legacy
data). Any attempt to include people without the
proper background in a process they do not really
understand seems to jeopardize the whole effort,
rather than train / accustom them to the new system.
- Poor quality of legacy data. The toughest problem in
this particular project was the cleaning of data. Each
circuit in the schema seemed to be a sui generis
situation. Most importantly, we faced big difficulties
trying to convince the administrators of the legacy
system of the poor quality of their data. Another big
problem was detecting which sources were
reliable. In a COBOL system there is too much
redundancy, since each application uses its own data
store. Every now and then, the different COBOL files
are synchronized, although this is not always 100%
successful. When building the data warehouse, it is a
hard task to determine the quality of each candidate
data source.
- Data warehouse evolution. The business rules for the
data warehouse are likely to change even during the
construction of the warehouse itself. The problem is
hard, since (a) it sets the whole project back in
schedule and cost, (b) it psychologically frustrates the
development team and (c) the lack of a metadata
management repository makes it almost
impossible to detect which part of the database or
the applications has to be synchronized with the new
situation. Imagine, for example, the case where the
primary key of a fact table has to change a couple of
weeks before completing the project. In our case, we
had to detect and evolve around 50 pre-canned reports
as well as all the refreshment processes of the fact
table and the materialized views that used it. It was
only the consistent naming of all the software
components that helped us perform this task.</p>
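      <p>The kind of impact analysis described in the last item, i.e., finding every report, refreshment process and materialized view affected by a change to a fact table, is what a metadata repository would automate. The following is a minimal sketch under our own assumptions: the component names and the dependency table are hypothetical and do not describe the actual project, where only naming conventions were available.</p>
      <preformat>
```python
from collections import deque

# Hypothetical metadata: each component lists the objects it reads from.
DEPENDS_ON = {
    "mv_sales_by_month": ["fact_sales"],
    "refresh_fact_sales": ["fact_sales"],
    "report_monthly_sales": ["mv_sales_by_month"],
    "report_staff_list": ["dim_staff"],
}


def affected_components(changed_object):
    """Return every component that directly or transitively reads changed_object."""
    # Invert the metadata so that each object maps to the components reading it.
    readers = {}
    for component, sources in DEPENDS_ON.items():
        for source in sources:
            readers.setdefault(source, []).append(component)

    # Breadth-first traversal over the inverted dependencies.
    affected = set()
    queue = deque([changed_object])
    while queue:
        current = queue.popleft()
        for component in readers.get(current, []):
            if component not in affected:
                affected.add(component)
                queue.append(component)
    return affected


print(sorted(affected_components("fact_sales")))
```
      </preformat>
      <p>In this made-up example, a change to the fact table reaches the report built on top of the materialized view as well, which is the transitive effect that is easy to miss when dependencies are tracked by hand.</p>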
      <p>Note that the project experienced no political problems.
The data warehouse was requested by the same
department that previously owned the data. The new
system would still be under the control of this particular
department and would thus synchronize and clean the
information they provided to higher management. Note
also that we never came into direct contact with the
end-users: this was supposed to be a task undertaken by the
administration team of this particular department. Thus,
we have no knowledge of the real success of this project.</p>
      <p>On a second occasion, we had to build a data warehouse
with pension data. The data were to be updated monthly
and used by pre-canned reports. The size of data involved
a few million rows per month. The source data relied
again on a COBOL-based legacy system. The project
lasted nine months and could be characterized more as the
construction of a data mart rather than the construction of
a full data warehouse. In this case, the major problem was
of a political nature: different departments were involved in
the ownership of the information. The people
administering the legacy system were definitely affected
by the construction of the warehouse. These people
- would lose the full ownership of the information
(which translates to sheer power in the IT
department);
- would have to take care of the transportation and
conversion of the data in their own system (which
means extra workload for both people and systems);
and
- would see any deficiencies of the information they produced
revealed (a fact of enormous importance and
effect in the public sector).</p>
      <p>Bearing all this in mind, it is quite straightforward to
understand the difficulties raised. Moreover, it was
interesting to see that the higher management, although
committed to the idea of constructing the data warehouse,
was unable to force things to happen and had to take an
approach that peacefully resolved any problems that
occurred, in order to salvage the project from total failure.
Another problem we had to face in this project was the
difficulty in constructing the extraction and cleaning
software. The extraction of data from the legacy systems
is a highly complex, error-prone and tiring procedure. To
give an idea of the problem, let us mention a case which
involved detecting relevant data in a
COBOL file, converting EBCDIC to ASCII format,
unpacking the packed numbers, reducing all address fields
to a standard format and loading the result into a table in
the data warehouse. Apart from the standard tool offered
by Oracle for these purposes (SQL*Loader), we did not
use any commercial tool for these tasks. This seems to be
the tactic followed by the majority of data warehousing
projects. According to [ShTy98], most of the companies
contacted for their survey estimate that more than 1/3 of
the cost and time is spent on ETL tasks during the
development process. Still, in spite of the obvious
importance of this process, the vast majority of them
developed their own application instead of using a tool to
facilitate the process. [ShTy98] also reports that data
quality products are expensive and hard to use. Given
the time and budget constraints of a data
warehouse project, [ShTy98] estimates that such products
are going to grow only modestly in the next few years (with
almost the lowest CAGR of all the product
categories).</p>
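      <p>To give a flavor of two of the extraction steps mentioned above, converting EBCDIC text and unpacking packed numbers, we include a minimal sketch. It is not the software written for the project: the record layout is hypothetical, the code page cp037 is an assumption (legacy systems use various EBCDIC code pages), and only the common sign conventions of packed-decimal (COMP-3) fields are handled. Address standardization and loading with SQL*Loader are not covered.</p>
      <preformat>
```python
from decimal import Decimal


def ebcdic_to_text(raw, codec="cp037"):
    """Decode an EBCDIC byte string; cp037 (US/Canada) is an assumed code page."""
    return raw.decode(codec).rstrip()


def unpack_comp3(raw, scale=0):
    """Unpack a COBOL packed-decimal (COMP-3) field into a Decimal.

    Each byte holds two 4-bit digits; the final half-byte is the sign
    (0xD is treated as negative, anything else as positive or unsigned).
    """
    digits = []
    for byte in raw:
        digits.append(byte // 16)  # high half-byte
        digits.append(byte % 16)   # low half-byte
    sign_nibble = digits.pop()
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    # Shift the decimal point `scale` places to the left.
    return Decimal(value).scaleb(-scale)


# Hypothetical 8-byte record: a 5-character name followed by a 3-byte amount.
record = "SMITH".encode("cp037") + bytes([0x12, 0x34, 0x5C])
name = ebcdic_to_text(record[:5])
amount = unpack_comp3(record[5:], scale=2)
print(name, amount)
```
      </preformat>
      <p>Even this toy version depends on knowing the exact byte layout of each record, which hints at why hand-written extraction code is error-prone when it has to be repeated for every legacy file.</p>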
      <p>Political problems were apparent in a third case, where the
project failed. The organization possessed four legacy
systems, all of different kinds (COBOL, Excel and dBase
files as well as a relational system). A pilot data mart
involving a subset of one of the legacy systems had
already been successful and the management was
enthusiastic about the whole idea. Still, the project failed
before it even started. As we had also observed in the
previous case, it seems to be a common phenomenon that
the people administering the legacy system take a little
time until they understand what is politically happening to
them once a data warehouse is built. In this particular case
the reaction was quick and absolute: no data were to be
given from the largest legacy system, since its
administrators simply refused to provide them. The
project was thus canceled. The lesson we learnt in this
case is that it takes more than an enthusiastic management
and a successful pilot for a data warehouse to succeed.
Later, we learned that the warehouse project started again,
but we have no knowledge of the fate of this new effort.</p>
      <sec id="sec-3-2">
        <title>3.2 Relationship between Practical Problems and Research Issues</title>
        <p>In this section we would like to relate the data warehouse
lifecycle with potential problems and solutions offered by
technology to tackle these particular problems. The first
problem in this task is the lack of a concrete
“textbook-style” methodology. Reading the two classical books on
data warehousing [Inmo96, Kimb96], one gets the feeling
that they provide tips and solutions for fragments of the
whole process, rather than a concrete methodology for the
data warehouse practitioner. We use as a template
methodology the one proposed in an Appendix of
[Inmo96] and try to relate it to potential problems and
technological solutions offered by research. We list only
the aforementioned problems and research categories.
Again, we do not claim that either list is exhaustive, but
rather indicative.</p>
        <p>As we can see in Fig. 10, there are areas where research
has contributed a lot to the practical problems. For
example, several issues of the view technology
super-category are (or at least, can be) somehow used by
practitioners in data warehouse design and
implementation. Also, several topics of the integration
super-category can be exploited in practical cases.</p>
        <p>Apart from these successes, there are two issues that
clearly depict the gap between research and practice. On
the one hand, there is an unclear picture with respect to
the extent that practice has exploited the results of
research. Query processing and storage management are
two research fields aiming to empower the technology
providers (i.e., the software and hardware vendors) with
better techniques for the storage and acquisition of
information. To our knowledge, it is not clear to which
extent these results have been incorporated in commercial
products. The extent to which results in the fields of
incomplete information and redundancy exploitation can
be exploited is another pending issue. The former seemed
to be a rather promising research field, but the lack of
research interest in the later years seems to be
discouraging for its further exploitation. The latter is a
well-defined field, but we believe that its practical exploitation
will take time. As far as the data
warehouse designer is concerned, the cases where the
determination of the intensional subsumption of two data
stores is useful are rather limited. Instead, it is the
extensional properties of the data source that count (an
issue not really apparent in database research). Finally,
OLAP modeling could be very useful in the logical
definition of the data warehouse, but the lack of a
standard multidimensional hierarchical model seems to
drive designers to ad-hoc, proprietary solutions. Still, the
relational counterpart, in the form of the ER diagram and
the relational model, seems to be a promising precedent.</p>
        <p>On the other hand, of course, there seem to be rather big
gaps in the table of Fig. 10, with respect to steps in the
data warehouse lifecycle which are not supported by the
conducted research. Data model analysis could
clearly be helped by improved techniques of metadata
management (and standards), as well as by data
engineering methods that enable the designer to
understand and model data and processes better.</p>
        <p>Breadbox analysis and technical assessment are clearly
underestimated by the research community. Techniques
to analyze data volume, network traffic, and the relevance and
quality of software components would be greatly
appreciated by data warehouse designers. The extraction
process also suffers from a lack of help from the
research community: as already mentioned, most research
has been dedicated to what should be extracted
(instead of how this extraction is performed). The
practical aspects of extraction are clearly neglected (e.g.,
declarative languages and visual interfaces for the
management of the extraction process, automation of the
extraction programs, etc.). The problem is vast due to the
sui generis nature of each kind of source (ASCII data are
different from ISAM or database data) and of each
particular source itself. The peculiarities of the conversion
process are also, more or less, neglected.
</p>
        <p>[Fig. 10. Data warehouse problems and research community solutions: table relating the lifecycle steps to the research categories.]</p>
        <p>
We believe that a turn in the interest of the research
community from the virtual querying of distributed
heterogeneous data sources and intensional
reconciliation to the practical aspects of extracting
materialized data could benefit practitioners a lot.</p>
        <p>Finally, it is unclear to what extent procedural
and sociotechnical factors (involved mostly at the
beginning and the end of a data warehouse project) could
benefit from the use of new technology suggested by
research results. This fuzziness alone is a very good
reason for research on the part of academia. As reported
in [SJSV99], a significant contribution could also be made
by the business administration sciences, e.g., in the way the
data warehouse is introduced in the corporation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>Normally, this is the place for an optimistic message, or
the ringing of the bell. For a change, we will do neither.
There are two issues, though, that we would like to touch upon as
concluding remarks. First, is it really the case that
research and practice are so far apart? In our humble
opinion, the answer is negative. Although research has
targeted only a fraction of the possible areas where
practitioners could need assistance, the technological
contribution of the research community is significant. For
example, let us mention the case of data warehouse
refreshment. Despite the problems in the extraction step,
which we have already mentioned, the refreshment
process is of significant importance for the proper
operation of the data warehouse. The recurring costs for
data warehouse refreshment come up to 55% of the
overall cost for running a data warehouse (Fig. 9). Still,
the contribution is only in areas where the existing
technology could be enhanced, without any
methodological results or groundbreaking research in new
fields.</p>
      <p>Secondly, why do researchers stay away from
the practical problems of data warehousing? This is a
widely discussed issue (e.g., there is a standing debate in
the Communications of the ACM magazine). We point out
only a few reasons that have come to our attention:
- It is possible that several researchers are not aware of
the real-world problems. The major motivation for
writing this paper was a discussion with a
researcher visiting our department. This person had
devoted a great deal of time, programming and energy to
the data warehouse design problem. Still, he believed
that the data warehouse is simply a set of
“DECLARE VIEW” statements. Clearly, this was a
problem of lack of direct contact with practical
problems.
- It is not always rewarding, in terms of research, to
deal with practical problems. The extraction process
of our case study, which we mentioned in Section 3,
gives an example of this. Which
researcher would feel happy to work on such a ‘dirty’
problem, knowing that it will be too hard to make
publications out of such an effort? It is not strange,
thus, that so much theoretical work has been devoted
to view maintenance issues, with respect to what
should be propagated to the warehouse, while few
research efforts have been made as to how this
extraction and propagation is to be performed. We believe
that it would be really hard for a paper concerning
practical automation techniques for the data
extraction task to convince an academic audience.
The last Asilomar report [BBC+98] states the need
for “groundbreaking” instead of “delta” research;
still, it is not clear which practical issues concerning
data warehousing qualify under this definition.
- The rules that govern the behavior of science
also apply in the case of data warehousing. It is
commonly agreed that it is the Paradigm that
determines the interesting problems and not
vice versa. In our case, the paradigm set by the papers of
Codd and Selinger et al. has, more or less, set the
landscape for research in the data warehouse
field, too. For example, although a great deal of work has
been devoted to query processing for aggregate
queries, these queries are still treated in isolation.
Yet an OLAP session is a sequence of steps, which
have some logical interrelationship. How many
papers do you know dealing with this particular
property of OLAP? As another example, we simply
recall the technical and design problems mentioned
in Section 3, which, although of great
importance, are not addressed by research. We
believe that one of the reasons for this situation is the
non-standard nature of these problems, which puts
them outside the scope of the relational paradigm.
</p>
      <p>As for the future, it is hard to make any predictions. Is
data warehousing going to be virtual (making all our
comments on the integration problem void, and the
research conducted in this field highly useful)? Is there
going to be a shift towards methodological issues in data
warehouses? Are the gaps in Fig. 10 going to be filled?
Although the answer is ‘I don’t know’, at least on our
part, it is a challenge to work on these issues,
thus contributing to closing the gap between
research and practice and making data warehousing an
easier and less risky endeavor for practitioners and
organizations.</p>
    </sec>
    <sec id="sec-5">
      <title>5. References</title>
      <p>[BBC+98] P.A. Bernstein, M.L. Brodie, S. Ceri, D.J. DeWitt, M.J. Franklin, H. Garcia-Molina, J. Gray, G. Held, J.M. Hellerstein, H.V. Jagadish, M. Lesk, D. Maier, J.F. Naughton, H. Pirahesh, M. Stonebraker, J.D. Ullman. The Asilomar Report on Database Research. SIGMOD Record 27(4): 74-80 (1998).</p>
      <p>[Comp96] ComputerWire Inc. Data Warehouse</p>
      <p>[SDNR96] [SGKT00] [ShTy98] [Smit97]
</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Paper</th><th>Category</th></tr>
          </thead>
          <tbody>
            <tr><td colspan="2">1995 - PODS</td></tr>
            <tr><td>Alon Y. Levy, Alberto O. Mendelzon, Yehoshua Sagiv, Divesh Srivastava. Answering Queries Using Views. 95-104.</td><td>Query rewriting</td></tr>
            <tr><td>Anand Rajaraman, Yehoshua Sagiv, Jeffrey D. Ullman. Answering Queries Using Templates with Binding Patterns. 105-112.</td><td>Query rewriting</td></tr>
            <tr><td>H. V. Jagadish, Inderpal Singh Mumick, Abraham Silberschatz. View Maintenance Issues for the Chronicle Data Model. 113-124.</td><td>View maintenance</td></tr>
            <tr><td colspan="2">1996 - SIGMOD</td></tr>
            <tr><td>Richard Hull, Gang Zhou. A Framework for Supporting Data Integration Using the Materialized and Virtual Approaches. 481-492.</td><td>Data integration</td></tr>
            <tr><td>Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman. Implementing Data Cubes Efficiently. 205-216.</td><td>DW design</td></tr>
            <tr><td>Leonid Libkin, Rona Machlin, Limsoon Wong. A Query Language for Multidimensional Arrays: Design, Implementation, and Optimization Techniques. 228-239.</td><td>Processing for cubes</td></tr>
            <tr><td>Sudhir Rao, Antonio Badia, Dirk Van Gucht. Providing Better Support for a Class of Decision Support Queries. 217-227.</td><td>Query processing in general</td></tr>
            <tr><td>Kenneth A. Ross, Divesh Srivastava, S. Sudarshan. Materialized View Maintenance and Integrity Constraint Checking: Trading Space for Time. 447-458.</td><td>View maintenance</td></tr>
            <tr><td>Latha S. Colby, Timothy Griffin, Leonid Libkin, Inderpal Singh Mumick, Howard Trickey. Algorithms for Deferred View Maintenance. 469-480.</td><td>View maintenance</td></tr>
            <tr><td colspan="2">1996 - VLDB</td></tr>
            <tr><td>Peter Scheuermann, Junho Shim, Radek Vingralek. WATCHMAN: A Data Warehouse Intelligent Cache Manager. 51-62.</td><td>Caching</td></tr>
            <tr><td>Alon Y. Levy, Anand Rajaraman, Joann J. Ordille. Querying Heterogeneous Information Sources Using Source Descriptions. 251-262.</td><td>Data integration</td></tr>
            <tr><td>Alon Y. Levy. Obtaining Complete Answers from Incomplete Databases. 402-412.</td><td>Data integration</td></tr>
            <tr><td>Wilburt Labio, Hector Garcia-Molina. Efficient Snapshot Differential Algorithms for Data Warehousing. 63-74.</td><td>Detecting changes in the sources</td></tr>
            <tr><td>Curtis E. Dyreson. Information Retrieval from an Incomplete Data Cube. 532-543.</td><td>Incomplete information</td></tr>
            <tr><td>Laks V. S. Lakshmanan, Fereidoon Sadri, Iyer N. Subramanian. SchemaSQL - A Language for Interoperability in Relational Multi-Database Systems. 239-250.</td><td>Integration in general</td></tr>
            <tr><td>Yannis Papakonstantinou, Serge Abiteboul, Hector Garcia-Molina. Object Fusion in Mediator Systems. 413-424.</td><td>Integration in general</td></tr>
            <tr><td>Mark W. W. Vermeer, Peter M. G. Apers. The Role of Integrity Constraints in Database Interoperation. 425-435.</td><td>Integration in general</td></tr>
            <tr><td>Damianos Chatziantoniou, Kenneth A. Ross. Querying Multiple Features of Groups in Relational Databases. 295-306.</td><td>Processing for aggregates</td></tr>
            <tr><td>Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi. On the Computation of Multidimensional Aggregates. 506-521.</td><td>Processing for aggregates</td></tr>
            <tr><td>Divesh Srivastava, Shaul Dar, H. V. Jagadish, Alon Y. Levy. Answering Queries with Aggregation Using Views. 318-329.</td><td>Query rewriting</td></tr>
            <tr><td>Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton, Karthikeyan Ramasamy. Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies. 522-531.</td><td>Size estimation for views</td></tr>
            <tr><td>Martin Staudt, Matthias Jarke. Incremental Maintenance of Externally Materialized Views. 75-86.</td><td>View maintenance</td></tr>
            <tr><td colspan="2">1997 - PODS</td></tr>
            <tr><td>Ching-Tien Ho, Jehoshua Bruck, Rakesh Agrawal. Partial-Sum Queries in Data Cubes Using Covering Codes. 228-237.</td><td>Processing for cubes</td></tr>
            <tr><td>Catriel Beeri, Alon Y. Levy, Marie-Christine Rousset. Rewriting Queries Using Views in Description Logics. 99-108.</td><td>Query rewriting</td></tr>
            <tr><td>Oliver M. Duschka, Michael R. Genesereth. Answering Recursive Queries Using Views. 109-116.</td><td>Query rewriting</td></tr>
            <tr><td colspan="2">1997 - SIGMOD</td></tr>
            <tr><td>Joseph M. Hellerstein, Peter J. Haas, Helen Wang. Online Aggregation. 171-182.</td><td>Incomplete information</td></tr>
            <tr><td>Patrick E. O’Neil, Dallan Quass. Improved Query Performance with Variant Indexes. 38-49.</td><td>Indexing</td></tr>
            <tr><td>Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, Ramakrishnan Srikant. Range Queries in OLAP Data Cubes. 73-88.</td><td>Processing for cubes</td></tr>
            <tr><td>Yihong Zhao, Prasad Deshpande, Jeffrey F. Naughton. An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. 159-170.</td><td>Processing for cubes</td></tr>
            <tr><td>Nick Roussopoulos, Yannis Kotidis, Mema Roussopoulos. Cubetree: Organization of and Bulk Updates on the Data Cube. 89-99.</td><td>Storage for cubes</td></tr>
            <tr><td>Michael J. Carey, Donald Kossmann. On Saying “Enough Already!” in SQL. 219-230.</td><td>Top N queries</td></tr>
            <tr><td>Inderpal Singh Mumick, Dallan Quass, Barinderpal Singh Mumick. Maintenance of Data Cubes and Summary Tables in a Warehouse. 100-111.</td><td>View maintenance</td></tr>
            <tr><td>Brad Adelberg, Hector Garcia-Molina, Jennifer Widom. The STRIP Rule System For Efficiently Maintaining Derived Data. 147-158.</td><td>View maintenance</td></tr>
            <tr><td>Dallan Quass, Jennifer Widom. On-Line Warehouse View Maintenance. 393-404.</td><td>View maintenance</td></tr>
            <tr><td>Latha S. Colby, Akira Kawaguchi, Daniel F. Lieuwen, Inderpal Singh Mumick, Kenneth A. Ross. Supporting Multiple View Maintenance Policies. 405-416.</td><td>View maintenance</td></tr>
            <tr><td>Divyakant Agrawal, Amr El Abbadi, Ambuj K. Singh, Tolga Yurek. Efficient View Maintenance at Data Warehouses. 417-427.</td><td>View maintenance</td></tr>
            <tr><td colspan="2">1997 - VLDB</td></tr>
            <tr><td>Dimitri Theodoratos, Timos K. Sellis. Data Warehouse Configuration. 126-135.</td><td>DW design</td></tr>
            <tr><td>Jian Yang, Kamalakar Karlapalem, Qing Li. Algorithms for Materialized View Design in Data Warehousing Environment. 136-145.</td><td>DW design</td></tr>
            <tr><td>Elena Baralis, Stefano Paraboschi, Ernest Teniente. Materialized Views Selection in a Multidimensional Database. 156-165.</td><td>DW design</td></tr>
            <tr><td>Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos. Recovering Information from Summary Data. 36-45.</td><td>Incomplete information</td></tr>
            <tr><td>Vasilis Vassalos, Yannis Papakonstantinou. Describing and Using Query Capabilities of Heterogeneous Sources. 256-265.</td><td>Integration in general</td></tr>
            <tr><td>Mary Tork Roth, Peter M. Schwarz. Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources. 266-275.</td><td>Integration in general</td></tr>
            <tr><td>Marc Gyssens, Laks V. S. Lakshmanan. A Foundation for Multi-dimensional Databases. 106-115.</td><td>OLAP modeling</td></tr>
            <tr><td>Kenneth A. Ross, Divesh Srivastava. Fast Computation of Sparse Datacubes. 116-125.</td><td>Processing for aggregates</td></tr>
            <tr><td>Damianos Chatziantoniou, Kenneth A. Ross. Groupwise Processing of Relational Queries. 476-485.</td><td>Processing for aggregates</td></tr>
            <tr><td>Laura M. Haas, Donald Kossmann, Edward L. Wimmers, Jun Yang. Optimizing Queries Across Diverse Data Sources. 276-285.</td><td>Query processing over integrated data</td></tr>
            <tr><td>H. V. Jagadish, P. P. S. Narayan, S. Seshadri, S. Sudarshan, Rama Kanneganti. Incremental Organization for Data Recording and Warehousing. 16-25.</td><td>Storage in general</td></tr>
            <tr><td>Nam Huyn. Multiple-View Self-Maintenance in Data Warehousing Environments. 26-35.</td><td>View maintenance</td></tr>
            <tr><td colspan="2">1998 - PODS</td></tr>
            <tr><td>John R. Smith, Chung-Sheng Li, Vittorio Castelli, Anant Jhingran. Dynamic Assembly of Views in Data Cubes. 274-283.</td><td>DW design</td></tr>
            <tr><td>Phokion G. Kolaitis, David L. Martin, Madhukar N. Thakur. On the Complexity of the Containment Problem for Conjunctive Queries with Built-in Predicates. 197-204.</td><td>Query containment</td></tr>
            <tr><td>Phokion G. Kolaitis, Moshe Y. Vardi. Conjunctive-Query Containment and Constraint Satisfaction. 205-213.</td><td>Query containment</td></tr>
            <tr><td>Werner Nutt, Yehoshua Sagiv, Sara Shurin. Deciding Equivalences Among Aggregate Queries. 214-223.</td><td>Query containment</td></tr>
            <tr><td>Serge Abiteboul, Oliver M. Duschka. Complexity of Answering Queries Using Materialized Views. 254-263.</td><td>Query rewriting</td></tr>
            <tr><td colspan="2">1998 - SIGMOD</td></tr>
            <tr><td>Chee Yong Chan, Yannis E. Ioannidis. Bitmap Index Design and Evaluation. 355-366.</td><td>Indexing</td></tr>
            <tr><td>Prasad Deshpande, Karthikeyan Ramasamy, Amit Shukla, Jeffrey F. Naughton. Caching Multidimensional Queries Using Chunks. 259-270.</td><td>Processing for aggregates</td></tr>
            <tr><td>Yihong Zhao, Prasad Deshpande, Jeffrey F. Naughton, Amit Shukla. Simultaneous Optimization and Evaluation of Multiple Dimensional Queries. 271-282.</td><td>Processing for aggregates</td></tr>
            <tr><td>Jun Rao, Kenneth A. Ross. Reusing Invariants: A New Strategy for Correlated Queries. 37-48.</td><td>Query processing in general</td></tr>
            <tr><td>Subbu N. Subramanian, Shivakumar Venkataraman. Cost-Based Optimization of Decision Support Queries Using Transient Views. 319-330.</td><td>Query processing over integrated data</td></tr>
            <tr><td>Renée J. Miller. Using Schematically Heterogeneous Structures. 189-200.</td><td>Schema integration</td></tr>
            <tr><td>Yannis Kotidis, Nick Roussopoulos. An Alternative Storage Organization for ROLAP Aggregate Views Based on Cubetrees. 249-258.</td><td>Storage for cubes</td></tr>
            <tr><td colspan="2">1998 - VLDB</td></tr>
            <tr><td>Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton. Materialized View Selection for Multidimensional Datasets. 488-499.</td><td>DW design</td></tr>
            <tr><td>Min Fang, Narayanan Shivakumar, Hector Garcia-Molina, Rajeev Motwani, Jeffrey D. Ullman. Computing Iceberg Queries Efficiently. 299-310.</td><td>Iceberg queries</td></tr>
            <tr><td>Frédéric Gingras, Laks V. S. Lakshmanan. nD-SQL: A Multi-Dimensional Language for Interoperability and OLAP. 134-145.</td><td>Integration in general</td></tr>
            <tr><td>Fernando de Ferreira Rezende, Klaudia Hergula. The Heterogeneity Problem and Middleware Technology: Experiences with and Performance of Database Gateways. 146-157.</td><td>Integration in general</td></tr>
            <tr><td>Guido Moerkotte. Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing. 476-487.</td><td>Processing for aggregates</td></tr>
            <tr><td>Michael J. Carey, Donald Kossmann. Reducing the Braking Distance of an SQL Query Engine. 158-169.</td><td>Top N queries</td></tr>
            <tr><td>Hector Garcia-Molina, Wilburt Labio, Jun Yang. Expiring Data in a Warehouse. 500-511.</td><td>View maintenance</td></tr>
            <tr><td colspan="2">1999 - PODS</td></tr>
            <tr><td>Howard J. Karloff, Milena Mihail. On the Complexity of the View-Selection Problem. 167-173.</td><td>DW design</td></tr>
            <tr><td>Sara Cohen, Werner Nutt, A. Serebrenik. Rewriting Aggregate Queries Using Views. 155-166.</td><td>Query rewriting</td></tr>
            <tr><td>Stéphane Grumbach, Maurizio Rafanelli, Leonardo Tininini. Querying Aggregate Data. 174-184.</td><td>Query rewriting</td></tr>
            <tr><td colspan="2">1999 - SIGMOD</td></tr>
            <tr><td>H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava. Snakes and Sandwiches: Optimal Clustering Strategies for a Data Warehouse. 37-48.</td><td>Clustering</td></tr>
            <tr><td>Yannis Kotidis, Nick Roussopoulos. DynaMat: A Dynamic View Management System for Data Warehouses. 371-382.</td><td>DW design</td></tr>
            <tr><td>Kevin S. Beyer, Raghu Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. 359-370.</td><td>Iceberg queries</td></tr>
            <tr><td>Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey D. Ullman. Computing Capabilities of Mediators. 443-454.</td><td>Integration in general</td></tr>
            <tr><td>Peter J. Haas, Joseph M. Hellerstein. Ripple Joins for Online Aggregation. 287-298.</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>ComputerWire Inc. <article-title>Data Warehouse Economics: ROI doubts?</article-title> <source>Data Warehouse Tools Bulletin</source>, <year>November 1996</year>. Available at http://www.computerwire.com/dwtb/free/2112_182.htm</mixed-citation>
      </ref>
      <ref id="ref1a">
        <mixed-citation><source>International Conference on Data Warehousing and Knowledge Discovery (DaWaK)</source>. http://www.informatik.uni-trier.de/~ley/db/conf/dawak/index.html</mixed-citation>
      </ref>
      <ref id="ref1b">
        <mixed-citation><string-name><given-names>M.</given-names> <surname>Demarest</surname></string-name>. <article-title>The politics of data warehousing</article-title>. Available at http://www.hevanet.com/demarest/marc/dwpol.html</mixed-citation>
      </ref>
      <ref id="ref1c">
        <mixed-citation><source>International Workshop on Data Warehousing and OLAP (DOLAP).</source></mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation><string-name><surname>Vassiliou</surname></string-name>. <article-title>Design and Management of Data Warehouses - Report on the DMDW'99 Workshop</article-title>. <source>SIGMOD Record</source> <volume>28</volume>(<issue>4</issue>), <year>December 1999</year>. Refers to the International Workshop DMDW'99 at CAiSE'99, Heidelberg, Germany, June 1999. Online version available at http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol19</mixed-citation>
      </ref>
      <ref id="ref2a">
        <mixed-citation><string-name><given-names>L.</given-names> <surname>Greenfield</surname></string-name>. <article-title>Data Warehousing Political Issues</article-title>. <year>February 2000</year>. Available at http://www.dwinfocenter.ord/politics.html</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation><string-name><given-names>W.H.</given-names> <surname>Inmon</surname></string-name>. <source>Building the Data Warehouse</source>. John Wiley &amp; Sons, <year>March 1996</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation><source>DM Review Magazine</source>, <year>January 1997</year>. Available at http://www.dmreview.com/master.cfm?NavID=55&amp;EdID=1315</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation><string-name><given-names>M.</given-names> <surname>Jarke</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Vassiliou</surname></string-name>. <article-title>Foundations of data warehouse quality - a review of the DWQ project</article-title>. <source>In Proc. 2nd Intl. Conference Information Quality (IQ-97)</source>, Cambridge, Mass., <year>1997</year>. Available in http://www.dblab.ece.ntua.gr/~dwq</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation><string-name><given-names>R.</given-names> <surname>Kimball</surname></string-name>. <source>The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses</source>. John Wiley &amp; Sons, <year>February 1996</year>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation><string-name><given-names>L.G.</given-names> <surname>Paul</surname></string-name>. <article-title>Anatomy of a failure</article-title>. <source>CIO Magazine</source>. November 15, <year>1997</year>. Available at http://www.cio.com/archive/enterprise/111597_data_content.html</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation><string-name><given-names>N.</given-names> <surname>Pendse</surname></string-name>, February 24, <year>2000</year>. <source>The OLAP Report</source>. Available at http://www.olapreport.com/Market.htm.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.F.</given-names>
            <surname>Naughton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramasamy</surname>
          </string-name>
          .
          <article-title>Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies</article-title>
          .
          <source>In Proceedings of 22nd International Conference on Very Large Databases (VLDB)</source>
          ,
          <source>Mumbai India</source>
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation><article-title>Designing and Mining Multi-Terabyte Astronomy Archives</article-title>. SIGMOD Conference 2000. Also available at http://www.research.microsoft.com/~gray/</mixed-citation>
      </ref>
      <ref id="ref10a">
        <mixed-citation><string-name><given-names>C.</given-names> <surname>Shilakes</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Tylman</surname></string-name>. <source>Enterprise Information Portals</source>. Enterprise Software Team. November <year>1998</year>. Available at www.sagemaker.com/company/downloads/eip/indepth.pdf.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation><string-name><given-names>J.</given-names> <surname>Smith</surname></string-name>. <article-title>Do Data Warehouses Challenge Fair Play?</article-title> <source>Beyond Computing</source>, <volume>6</volume>(<issue>4</issue>), May <year>1997</year>. Available at www.beyondcomputingmag.com/archive/1997/5-97/ethics.html</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>