<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Processing Relational OLAP Queries with UB-Trees and Multidimensional Hierarchical Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Volker Markl</string-name>
          <email>volker.markl@forwiss.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rudolf Bayer</string-name>
          <email>bayer@in.tum.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bayerisches Forschungszentrum, für Wissensbasierte Systeme</institution>
          ,
          <addr-line>(FORWISS), Orleansstraße 34, D-81667, München</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institut für Informatik, Technische Universität München</institution>
          ,
          <addr-line>Orleanstraße 34, D-81667, München</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>1</fpage>
      <lpage>1</lpage>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Multidimensional access methods like the
UBTree can be used to accelerate almost any query
processing operation, if proper query processing
algorithms are used: Relational queries or SQL
queries consist of restrictions, projections,
ordering, grouping and aggregation, and join
operations. In the presence of multidimensional
restrictions or sorting, multidimensional range
query or Tetris algorithms efficiently process
these operations. In addition, these algorithms
also efficiently support queries that generate
some hierarchical restrictions (for instance by
following 1:n foreign key relationships). In this
paper we investigate the impacts on query
processing in RDBMS when using UB-Trees and
multidimensional hierarchical clustering for
physical data organization. We illustrate the
benefits by performance measurements of
queries for a star schema from a real world
application of a SAP business information
warehouse. The performance results reported in
this paper were measured with our prototype
implementation of UB-Trees on top of Oracle 8.
We compare the performance of UB-Trees to
native query processing techniques of Oracle,
namely access via an index organized table,
which essentially stores a relation in a clustered
B*-Tree, and access via a full table scan of an
entire relation. In addition we measure the
performance of the intersection of multiple
bitmap indexes to answer multidimensional
range queries.</p>
      <p>The copyright of this paper belongs to the paper’s authors. Permission to copy
without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage.</p>
    </sec>
    <sec id="sec-2">
      <title>Proceedings of the International Workshop on Design and</title>
    </sec>
    <sec id="sec-3">
      <title>Management of Data Warehouses (DMDW’2000)</title>
      <p>Stockholm, Sweden, June 5-6, 2000
(M. Jeusfeld, H. Shu, M. Staudt, G. Vossen, eds.)</p>
      <sec id="sec-3-1">
        <title>1 Introduction</title>
        <p>The most established relational data models for data
warehousing applications are the star schema and the
snowflake schema. In both approaches there is a central
fact table that contains the measures and the dimension
tables are situated around it. The connection between a
fact tuple and the corresponding dimension members is
realized via foreign key relationships. In the star schema
the dimension tables are completely denormalized while
in the snowflake schema they may be normalized.
Queries usually contain restrictions on multiple dimension
tables (e.g., only sales for specific customer group and for
a specific time period are asked) that are then used as
restrictions on the usually very large fact table.
In this paper we investigate, how UB-Trees and the
Tetris algorithm in combination with our technique of
multidimensional hierarchical clustering by hierarchy
interleaving (MHC/HI) may be used to accelerate relational
query processing with a special focus on star-joins, the
most frequent operation of query processing for relational
data warehouses. With our technique of MHC/HI we
specifically cluster the data with respect to the foreign key
relationships defined by the star- or snowflake schema of
a data warehouse. We also illustrate the physical data
modeling with MHC/HI for a SAP business information
warehouse and give a performance analysis for this
realworld application. Our performance analysis gives
insight into the real world data distribution of 6 GB of sales
data from a fruit juice company and shows, why MHC/HI
in combination with UB-Trees is superior to classical
bitmap indexes, clustering B*-Trees and parallel full
table scans.</p>
        <p>Our paper is organized as follows: Section 2 surveys
related work, in Section 3 we present the basic concepts of
UB-Trees, their algorithms and MHC/HI. Section 4
presents the SAP business warehouse schema that we use for
our analysis. In Section 5 we analyze the data
distribution of the real world data. Section 6 gives a performance
analysis and performance measurements with our
prototype implementation. Section 7 concludes our paper and
gives an outlook on future work.
Performance optimization has been well studied in the
field of OLTP systems [Gra93]. Due to the completely
different query characteristics of OLAP applications in
comparison to OLTP new questions have to be addressed
here. The performance problem is heavily linked to the
physical data model.</p>
        <p>The index selection problem for ROLAP application is
widely discussed in the research community [GHR+97,
Sar97]. Especially bitmap indexes have been proposed to
speed up ROLAP applications because of their
compactness and support of star joins [CI98]. A common way of
performance improvement is the usage of materialized
views - often in combination with indexing methods
[TS97, Moe98, WB98]. Due to the large number of
possible views a selection problem exists besides the
maintenance issue [Gup97, SDN+96, SDN+98]. Clustering of
OLAP data plays a key role in providing good
performance. Clustering has been well researched in the field of
access methods. B-Trees, for instance, provide
one-dimensional clustering. Multidimensional clustering has
been discussed in the field of multidimensional access
methods. See [GG97] and [Sam90] for excellent surveys
of almost all of these methods. [ZSL98] addresses the
issue of hierarchical clustering for the one-dimensional
case.</p>
        <p>Most work on applying multidimensional indexes to
RDBMS discusses restrictions by range queries [SRF97,
NHS84, Gut84, OM84, LS90]. [JL98] accelerates range
queries with aggregations by storing aggregated data in
R-Trees. [MRB99] and [Bay97b] are the basis of our
approach, where joins and sorted processing of data
organized by a multidimensional index are investigated.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3 The UB-Tree</title>
        <p>The basic idea of the UB-Tree [Bay97a, Mar99] is to use
a space-filling curve to partition a multidimensional
universe. Using the Z-Curve (Figure 1a) the UB-Tree
preserves the multidimensional clustering. A Z-Address α =
Z(x) is the ordinal number of the key attributes of a tuple
x on the Z-Curve, which can be efficiently computed by
bit-interleaving [OM84]. A standard B-Tree is used to
index the tuples taking the linear Z-Address of the tuples
as keys.
Z-Region [α : β ] is the space covered by an interval on
the Z-Curve and is defined by two Z-Addresses α and β.
We call β the region address of [α : β ].Each Z-Region
maps exactly onto one page on secondary storage, i.e., to
one leaf page of the B-Tree.</p>
        <p>(a)</p>
        <p>(b)
For an 8×8 universe, i.e., s = 3 and d = 2, Figure1b
shows the corresponding Z-addresses. Figure2a shows
the Z-region [4: 20] and Figure2b shows a partitioning
with five Z-regions [0 : 3], [4 : 20], [21: 35], [36 : 47]
and [48 : 63]. Assuming a page capacity of 2 points,
Figure2c shows ten points, which create the partitioning
of Figure2b. The details of the UB-Tree algorithms are
described in [Bay97a, Mar99].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.1 Bit Interleaving</title>
      <p>The performance of the UB-Tree crucially relies on an
efficient implementation of the Z-address calculation. For
tuples of positive integer numbers, bit interleaving
immediately yields an algorithm to calculate a Z-address
from the binary representation of a tuple.</p>
      <p>It is easy to incorporate varying attribute lengths, i.e.,
attributes with different resolutions, into this algorithm:
When the limit of the resolution (i.e., the last bit) in one
attribute is reached, this attribute is not used for
bit-interleaving any further. The number of bits in each further
step is reduced in this case.</p>
      <p>For each attribute Ai we denote the number of distinct
values of its domain by ri, i.e., ri = |Ai|
Definition 1: The number of steps for attribute Ai of a
domain with cardinality1 ri is determined by its
resolution: steps(i) = log2ri
Definition 2: The length of step k in bits (i.e., the
number of dimensions in step k) is:</p>
      <p>steplength(k) = |i | steps(i) ≥ k and i ∈ D}|
With r = max({steps(rj) | j ∈ D}) bit interleaving has a
CPU-complexity of O(d⋅r) bit operations (resp. O(∑id=1 ri )
1 Note that we defined ri to be 2v for some v ∈ 1R
for attributes of different length). The same holds for the
inverse algorithm Z-1 that calculates the Cartesian
coordinates of a tuple from its address.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2 Range Queries on UB-Trees</title>
      <p>To answer a range query, only those Z-regions, which
properly intersect the query box, must be fetched from
the database and thus from the disk. Initially the range
query algorithm calculates and retrieves the first Z-region
that is overlapped by the query-box. Then the next
intersecting Z-region is calculated and retrieved. This is
repeated until a minimal cover for the query box has been
constructed, i.e., the region that contains the ending
point of the query box has been retrieved.</p>
      <p>It is important to note that for any Z-region the
calculation of the next point intersecting the query box is
performed solely in main memory. [Bay97a] describes an
algorithm that for a Z-region [α : β ] and a query box Q
calculates the smallest Z-address γ &gt; β that intersects Q
in time exponential to the number of dimensions. A
linear algorithm which is solely based on bit operations is
described in [Mar99] and [RMF+00].</p>
    </sec>
    <sec id="sec-6">
      <title>3.3 The Tetris Algorithm</title>
      <p>With the Tetris algorithm [MZB99], tables organized by
a UB-Tree can be read in any key sort order in O(n) disk
accesses where n is the number of pages of the table or
the minimal number of regions covering a query box.
The Tetris algorithm is a generalization of the
multidimensional range query algorithm that efficiently
combines sort operations with the evaluation of
multi-attribute restrictions. The basic idea is to use the partial sort
order imposed by a multidimensional partitioning in
order to process a table in some total sort order. Essentially
a plane sweep over a query space defined by restrictions
on a multidimensionally partitioned table is performed.
The direction of the sweep is determined by the sort
attribute. Initially the algorithm calculates the first
Z-region that is overlapped by the query box, retrieves it and
caches it in main memory. Then it continues to read and
cache the next Z-regions with respect to the requested
sort order, until a complete thinnest possible slice of the
query box (in the sorting dimension) has been read. Then
the cached tuples of this slice are sorted in main memory,
returned in sort order to the caller and removed from
cache. The algorithm proceeds reading the next slice,
until all Z-regions intersecting the query box have been
processed. Only disk pages overlapping the query space
are accessed. With sufficient, but modest, cache memory
each disk page is accessed only once.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4 Multidimensional Hierarchical Clustering</title>
      <p>Multidimensional Hierarchical Clustering by Hierarchy
Interleaving (MHC/HI, [MRB99]) clusters a
multidimensional data set on disk with respect to the hierarchical
relationships in multiple dimensions. This is achieved by
creating surrogate values that ensure that a hierarchical
restriction like REGION = “North America” and
NATION = “USA” is mapped to an interval restriction in
a linear domain. With this technique, a star join on a
schema with d dimensions therefore creates a
ddimensional interval restriction on the fact table which
then may efficiently be processed by the UB-Tree.
In general one can imagine any foreign key relationship
to be used for such a clustering. In the following we
illustrate MHC/HI by hierarchical relationships as they
usually occur in data warehousing applications. In
ROLAP hierarchies are usually modeled implicitly by a
set of attributes A1, ..., An where Ai corresponds to
hierarchy level i and level 1 is the root of the hierarchy.
Many attributes in relational DBMS in general and in
data warehouses in particular have an actual domain of a
very small set of values. A typical example (cf. Figure 4a)
is the attribute REGION of the dimension table
CUSTOMER, which has an actual domain of 8 values
(Southern Europe, Central Europe, Northern Europe,
Western Europe, North America, Latin America, Asia,
Australia). By MHC/HI, these long strings are replaced
by numbers. In the corresponding tables small numbers
or bit strings are stored instead of long, space wasting
character strings. If all records must be retrieved that
concern Europe, only an interval [0;3] on this number is
necessary. Otherwise every member of region must be
specified. The mapped numbers are called surrogates
because they replace the character strings. This data type
is called enumeration type.</p>
      <sec id="sec-7-1">
        <title>REGION</title>
        <p>South Europe
Middle Europe
Northern Europe
Western Europe
North America
Latin America
Asia
Australia</p>
        <p>(a)
CUSTOMER
f(REGION)
0
1
2
3
4
5
6
7
0 SouthEurope ... 4 North America ...6 Asia
7 Australia
0 Canada 1 USA</p>
        <p>0 Wholesale 1 Retail
0 Wholesale 1 Retail</p>
        <p>... Kana´s SushiBar ...
This surrogate technique is generalized to all levels of the
hierarchies. Figure 4b shows a part of the customer
hierarchy with the surrogates propagated to all levels of
the hierarchy.
1-3
To efficiently encode hierarchies, we introduce the
concept of compound surrogates for hierarchies. Since we
require hierarchies to form a disjoint partitioning of a
dimension, a uniquely identifying compound surrogate
for each child node of a hierarchy member exists and can
be recursively calculated by concatenating the compound
surrogate of the member with the running number of the
child node. Because only efficient bit operations are
necessary, the computation is extremely fast. The ord
function returns the corresponding surrogate of a hierarchy
member in the specified level.</p>
        <p>The hierarchy path North America ¯ USA ¯ Retail ¯
Bar (cf. Figure 4b) has the compound surrogate:
ordCustomer (North America) ο ordNorth America (USA) ο
ordUSA(Retail) ο ordRetail(Bar) = 4 ο 1 ο 1 ο 2.</p>
        <p>The upper limit of the domain for surrogates of level i is
calculated as the maximum fan-out (number of children)
of all members of level i–1 of a hierarchy H, i.e.,
surrogates(H, i) = max {cardinality(children(H, m)) where m
∈ level(H, i - 1)}
The compound surrogate can be interpreted as a number.
In the above example – using the surrogate length of
Figure 4a with a length of 10 bits for the customer
surrogate (3 for REGION, 3 for NATION, 1 for Trade Type
and 3 for Business Type) the hierarchy path results in the
compound surrogate 10000110102 = 538.</p>
        <p>Usually growth expectations for a hierarchy are known
well in advance. Often hierarchy trees are even static.
Therefore it is possible to determine a reasonable number
of bits for storing each surrogate of the compound
surrogate of a hierarchy. The overall number of bits necessary
to store a compound surrogate is relatively small. For
instance, a hierarchy tree with four branches on 8 levels
already represents 48 = 65536 partitions and is stored in
only 16 bits. The maximum length of the compound
surrogates for the ‘Juice &amp; More’ schema that we used for
our analysis can be computed from the maximum fan-out
of the hierarchy levels given in Figure 4a.</p>
        <p>
          This very compact fixed length encoding preserves the
lexicographic order on the hierarchy levels. Thus, point
restrictions on upper hierarchy levels result in range
restrictions on the finest granularity of a hierarchy. For
instance, the point restriction NATION = “USA” on the
second level of the CUSTOMER hierarchy with f(“North
America”) = 4 = 1002 and f(“USA”) = 1 = 0012 maps to
the range restriction cscustomer between 528 =
10000100002 and 543 = 10000111112. Thus, a star join
with this surrogate encoding for the foreign keys of a fact
table results in a range restriction on each compound
surrogate, if some hierarchy level of each dimension is
restricted to a point. In the same way intervals on the
children of one hierarchy level result in a range of the
corresponding compound surrogates
          <xref ref-type="bibr" rid="ref17 ref31 ref4">(e.g., YEAR = 1998
and MONTH between April and June)</xref>
          . A star join on a
schema with d dimensions therefore creates a
d-dimensional interval restriction on the fact table.
        </p>
        <sec id="sec-7-1-1">
          <title>4 Schema and Queries of ‘Juice &amp; More’</title>
          <p>In this section we investigate, how UB-Trees and the
Tetris algorithm in combination with MHC/HI may be
used to accelerate star-joins, the most frequent operation
of query processing for relational data warehouses.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4.1 The Juice &amp; More Schema</title>
      <p>We use the schema of the beverages supplier ‘Juice &amp;
More’, a real customer of one of our project partners2. In
the data warehouse of ‘Juice &amp; More’ data is organized
along the following four dimensions: CUSTOMER,
PRODUCT, DISTRIBUTION and TIME. Figure 5a
shows the hierarchies over the dimensions (the number
in parentheses specifies the maximum number of level
members).</p>
      <p>PRODUCT</p>
      <p>CUSTOMER</p>
      <p>DISTRIBUTION</p>
      <p>TIME
All Products</p>
      <p>All Customer</p>
      <p>All Distributions All Time
Type (5)</p>
      <p>Region (8)
Brand (8)</p>
      <p>Nation (7)
Category (19)</p>
      <p>Trade Type (2)
Container (10)</p>
      <p>Business Type (7)</p>
      <p>Sales
Organization (5)</p>
      <p>Distribution
Channel (3)</p>
      <p>Year (3)
Month (12)
PRODUCT
2180 rows
In the following we describe some example queries
involving star joins for the ‘Juice &amp; More’ schema. These
queries were taken from the decision support system of
‘Juice &amp; More’. Thus next to the investigation of
multidimensional hierarchical clustering this section is
interesting from a second point of view: The highly
skewed data distribution of the ‘Juice &amp; More’ will prove
that UB-Trees, the Tetris algorithm and MHC/HI are not
only applicable to laboratory environment tests with
generated data, but also prove their efficiency in the
practical application scenarios. In order to show the
highly skewed data distribution we included an entire
section displaying the one-dimensional data distribution
of ‘Juice &amp; More’ for every dimension.</p>
      <p>We then present measurements performed with our
prototype implementation of the UB-Tree on top of
Oracle 8. For the evaluation of our clustering technique
we defined a benchmark with 36 queries. In comparison
we also conducted measurements with native Oracle 8
access methods: parallel full table scan (FTS) and bitmap
indexes (BII). For these measurements we used a
completely denormalized fact table, that is, no additional
joins next to the star join had to be performed to answer
the queries. The bitmap indexes were created on each
hierarchy level. We did not include secondary indexes in
our comparison measurements because earlier
experiments showed that they are neither competitive to
the UB-Tree nor to FTS or BII [MZB99].
4.2</p>
    </sec>
    <sec id="sec-9">
      <title>Queries on the ‘Juice &amp; More’ Schema</title>
      <p>In the following we present typical queries that are taken
from the ‘Juice &amp; More’ SAP business information
warehouse. We will use these queries to illustrate our
approach and we will present performance measurements
for exactly these queries in Section 6.2.</p>
      <p>Query 1 (Q1, cf. Figure 6) computes the sales for a given
product group (TYPE and BRAND specified as (X1,
X2)) and a given customer group (NATION and
REGION specified as (Y1, Y2)) for the months from
October to December of 1993. This query uses the
UBTree range query algorithm on MHC/HI surrogates.
However, surrogates are only used for clustering and
indexing and thus are transparent to the user.</p>
      <p>SELECT SUM(Sales)
FROM Fact F, Customer C, Product P, Time T
WHERE F.ProdKey = P.ProdKey AND</p>
      <p>F.CustKey = C.CustKey AND
P.Type = X1 AND P.Brand = X2 AND
C.Region = Y1 AND C.Nation = Y2 AND
F.TimeKey = T.TimeKey AND T.Year = 1993 AND
T.Month &gt;= October AND T.Month &lt;= December</p>
      <p>Figure 6: Time Interval (Q1)
Query 2 (Q2, cf. Figure 7) calculates the cost of
distribution of the products of type X for each
distribution channel. This query uses the Tetris algorithm
on MHC/HI surrogates in order to efficiently calculate
aggregations.</p>
      <p>SELECT SALESORG, CHANNEL, SUM(DistCost)
FROM Fact F, Distribution D, Product P
WHERE F.DistKey = D.DistKey AND</p>
      <p>F.ProductKey = P.ProductKey AND</p>
      <p>P.Type = X
GROUP BY D.SalesOrg,D.Channel</p>
      <p>Figure 7: Distribution cost (Q2)
Query 3 (Q3, cf. Figure 8) restricts all dimensions on the
first level of the hierarchies. This query also uses the
Tetris algorithm on MHC/HI compound surrogates.
SELECT SUM(SALES)
FROM Fact F, Distribution D, Product P,</p>
      <p>Customer C, Time T
WHERE F.DistKey = D.DistKey AND</p>
      <p>F.TimeKey = T. TimeKey AND
F.CustKey = C.CustKey AND
F.ProdKey = P.ProdKey AND
P.Type = t AND D.SalesOrg = s AND</p>
      <p>T.Year = y AND C.Region = r
Figure 8: Partial match query in first hierarchy level (Q3)</p>
      <sec id="sec-9-1">
        <title>5 Data Distributions of the ‘Juice &amp; More’</title>
      </sec>
      <sec id="sec-9-2">
        <title>Fact Table</title>
        <p>The data of ‘Juice &amp; More’ is real world data; the data
distribution of both the fact table and the dimension
tables is highly skewed: The dimensions are neither
distributed uniformly nor are independent. The original fact
table consisted of 823.464 tuples (about 175 MB). To get
a realistic large data cube, the fact table was enlarged to
26.350.848 tuples (about 6 GB). Our project partner
implemented an augmentation algorithm with minimal
impact on the data distribution (see [Pie98]).</p>
        <p>In the following we show some charts that describe the
one-dimensional data distribution for each dimension.
However, we once more stress that the dimensions are
not independent (e.g., some customers always order the
same subset of products, some customers or products only
exist for a certain time, etc.). Thus in general, the overall
selectivity of a query restricting several dimensions is not
the product of the selectivities of the one-dimensional
restrictions (which is shown in the following charts). We
will see this deviation in Section 6.</p>
        <p>The fact table of ‘Juice &amp; More’ stores several measures
(e.g., distribution cost, sales) aggregated on a daily basis
with respect to the dimensions time, customer, product
and distribution. Since the data is sensitive real-world
business data, it is not possible to show the labels/names
of the hierarchy members in the charts.
5.1</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>The Time Dimension</title>
      <p>The time dimension of ‘Juice &amp; More’ consists of a
twolevel hierarchy of months and years. The test data stored
the years from 1993 to 1995. As shown in Figure 5a, the
days of the time dimension are organized by a two level
1-5
hierarchy (year and month). Figure 9 shows the
cumulated data distribution of the fact table with respect to the
time dimension grouped by year and month. The
horizontal axis displays the hierarchy members, with “All” at
the very bottom (i.e., the lowest level), the years 1993,
1994, and 1995 in the middle and the twelve months for
each year above the year. The arrows in the horizontal
axis indicate the relationship between the members of
neighboring hierarchy levels.</p>
      <p>Thus the fact table of ‘Juice &amp; More’ is almost uniformly
distributed with respect to the time dimension. The
minimum number of facts for one month is 2,70% of the
fact table (in January 1993, December 1993 and January
1995), whereas the maximum number of facts per month
is 2,83% (in March of each of the three years). Thus with
a multidimensional clustering using 5 bits for time,
restrictions to one month in the time dimension (=1/36)
can be expected to reduce the amount of data to around
1/25 = 1/32.
2,0%
1,5%
1,0%
0,5%
The top level of the hierarchy on product has five entries.
The data distribution is quite skewed, there are three
product groups to which 93% of all tuples of the fact
table belong. 1% of the data is unclassified. The
distribution of the first level of the product hierarchy is
illustrated in Figure 10.
4%
3%
2%
1%
0%
3,0%
2,5%
2,0%
1,5%
1,0%
0,5%
0,0%
Multidimensional hierarchical clustering ensures that a
restriction in the first hierarchy level will result in a 1%,
27%, 6%, 32% respectively 34% reduction of I/Os which
are necessary to retrieve the result set. Without exactly
showing the relationship between the hierarchy members,
we show the skew of the data distribution over the first
four product levels in Figure 11.
%
296
,
%
067
,
According to business administration literature 20% of
the customers contribute to 80% of the business. The
customer dimension of ‘Juice &amp; More’ is a typical
example for a classification of customers in such a company. A
high number of customers (in this case 88%, the very left
entry of Figure 12) are not classified (maybe they are not
interesting for the company because of small turnover or
it is not possible to find a classification). The classified
customer groups contain 0% to 3% of the tuples stored in
the table.3 The hierarchical relationships on the
horizontal axis show that the hierarchy of the customer
dimension is not balanced, since several hierarchy members
just have one child.</p>
      <p>%
7833
,
%
307
,
%
416
,
%
712
,
%
% % ,208 4%
,140 ,201 ,00</p>
      <p>%
,00% ,105
0
,300% ,130% ,004% ,020% ,900% ,600%
8% 5% 6% 6% ,16%
,00 ,00 ,00 ,00 0
2% 7% ,13%
,00 ,00 0
%
721
,
%
7
7% % ,055% ,07
,02 ,70% ,201
0
3 Note that the data in the ‘Juice &amp; More’ warehouse is
aggregated on a daily basis, thus the amount of data is
usually compressed for large customers, thus the
proportion of large customers is reduced.</p>
      <p>The consequence of the small number of classified
customers is that queries the restriction CUSTOMER will
result in a selectivity of at most 3% and therefore the
overall result set will be small.
5.2</p>
    </sec>
    <sec id="sec-11">
      <title>The Distribution Dimension</title>
      <p>There are seven entries on the first level of the
distribution hierarchy. The data distribution of the fact table with
respect to the distribution dimension is highly skewed.
Figure 13 shows the distribution of the first hierarchy
level (i.e., sales organization), while Figure 14 shows the
distribution of facts in the fact table for each distribution
channel of each sales organization. Again the arrows
indicate the hierarchical relationship of the members of
neighboring hierarchy levels with the hierarchy root
“All” at the bottom of the Figure.</p>
      <p>16%
8% 13%
37%
13%
7%
6%
12,5%
10,0%
7,5%
5,0%
2,5%
0,0%
1
2
3
4
5
6
8</p>
      <sec id="sec-11-1">
        <title>6 Performance Analysis of MHC/HI for ‘Juice &amp; More’</title>
        <p>In this section we first analyze the performance of
MHC/HI on UB-Trees analytically and then, taking the
data distributions and query selectivities into account,
verify our expectations by comparing the number of
retrieved pages with the product of the selectivities over all
dimensions. We then also give response times which
show the superiority of UB-Trees and MHC/HI over
traditional indexes even for our prototype
implementation.
Figure 15 shows the compound surrogates for the ‘Juice
&amp; More’ data warehouse, which are calculated as fixed
length compound surrogates in [MRB99]. For any of the
4 hierarchies the length of the compound surrogate does
not exceed 15 bits and thus can be stored in a single
integer value. These compound surrogates are used as
attributes for each of the four dimensions of ‘Juice &amp; More’ to
calculate the Z-address for each tuple of the ‘Juice &amp;
More’ fact table.
csproduct = 1p145p2144p313 1p142p2114p310 1p9 4p482p7 4p643p5 1p44p23p423p1
level1 level 2 level3 level 4
cscustomer = c1102c9 3c8 c17 2c6 3c5 12c43c132c2 3c1</p>
        <p>level1 level 2 level3 level 4
cstime = t{6t5 t14t23t32t1</p>
        <p>level1 level 2
csdistribution = d d d
15 2433 1d22d31</p>
        <p>level1 level2
Figure 15: Compound surrogates for each dimension of
‘Juice &amp; More’
The UB-Tree for the ‘Juice &amp; More’ fact table consists of
P = 878362 pages, which corresponds to: l = log2 P =
log2878362 = 19,7 bits that are in any case necessary to
address the bit-interleaved multidimensional space. With
bit interleaving in the order of dimensions PRODUCT,
CUSTOMER, TIME, and DISTRIBUTION the Z-address
α for a tuple of the ‘Juice &amp; More’ fact table is
calculated as:
α = 1p154c140t64d5 4p144c94t5d44 4p123c84t44d3 4p124c74t3d42 p4114c63t2 d{1 1p104c54t1 p49c44p48c434p72c24p64c1 4p5 4p44p34p243p1
completelypartitioned for anydatadistribution partlysplit partitioneddependingon thedata distribution
The first 19 bits of the Z-address are guaranteed to be
used to partition the four dimensional universe of the
‘Juice &amp; More’ fact table. This means that the binary
strings p15p14p13p12p11 of the compound surrogate of
product, c10c9c8c7c6 of customer, t6t5t4t3t2 of time and
d5d4d3d2 of distribution are used to partition the universe.
For each of the four dimensions the first hierarchy level
is completely used for the partitioning. The second
hierarchy level is used to a large extent to partition the
universe. Therefore a restriction in the first hierarchy level
will result in a reduction of the number of pages as
determined by the data distribution of Section 5, i.e., a
restriction of the product main group to “910” will reduce
the number of pages to be retrieved to 27%, a restriction
to “912” will result in a reduction to 6,41%. This holds
for the restriction of the first hierarchy level in any
dimension. If the top hierarchy level is restricted in several
dimensions and the independence assumptions holds for
these dimensions, the reduction is multiplicative. Figure
16 shows the predicted selectivity calculated as the
product of the selectivity in each dimension, the actual
selectivity and the loaded pages in percent of the entire pages
in the database for two queries, which restrict the first
hierarchy level of three out of four dimensions.
0,78 6,4 37,5
However, the data is not independently distributed in the
entire 4-dimensional universe of ‘Juice &amp; More’. In this
case the predicted selectivity does not describe the actual
selectivity anymore. Thus some bits of the first 19 bits are
correlated. This means that not all combinations of these
bits occur and some partitioning will take places in the
bits below bit number 19 of the Z-address. In this case
the second level may already be completely partitioned
and even a third hierarchy level partitioning may have
started for some dimensions. A typical part of the
multidimensional space where this will happen is the customer
hierarchy “unclassified”, which stores 87,34% of the
customers. At most the three bits c10c9c8 of the customer
hierarchy are needed to distinguish these customers from
all other customers. Thus for the unspecified customers
the bits c7c6 of the first 19 bits of the Z-address are
correlated to c10c9c8 and two further bits may be used for
partitioning. Thus d1 will be split completely, p10 will be
used for the partitioning and t1 will be partly split (c5 is
also correlated to c10c9c8). Our measurements show that
this effect also holds for other dimensions. Figure 17
shows queries where the first two hierarchy levels of
CUSTOMER, PRODUCT and TIME are restricted,
whereas the DISTRIBUTION dimension is not restricted.
The selectivity predicted by the cost functions here differs
from the actual selectivity of the query because of
dependencies in the data distribution. However, the
percentage of pages loaded is similar to the actual selectivity
of each query.
scustomer in
%
2,95
2,95
sproduct in
%
sdis- stime
tribu- in %
tion
in
%</p>
        <p>Pa- predicted actual
ges selecti-
selecloa- vity in % tivity
ded in %
loaded
pages
in %
3,64
100 38,4
Each of the 878362 pages of the ‘Juice &amp; More’ fact table
stores 30 tuples. All of the measurements showed that
when restricting the first hierarchy level in each
dimension in average 99,99% of the tuples on the pages
contributed to the result set. A standard deviation of less
than 0,001 for these measurements means that the
multidimensional hierarchical clustering is perfect for
multidimensional restrictions in the first hierarchy level.
When additionally restricting the second hierarchy level
in average 557 pages were loaded, where 10,7% of the
tuples did not contribute to the result set. The standard
deviation here was 0,06. Additionally restricting the third
hierarchy level of each dimension usually created result
sets with only one page.</p>
        <p>Thus multidimensional hierarchical restrictions are very
well processed by UB-Trees storing compound surrogates
which are created by the multidimensional hierarchical
clustering technique. For star schemas as used in present
data warehousing applications this approach significantly
speeds up query performance and reduce resource
requirements in disk space and processing time.
6.2</p>
      </sec>
      <sec id="sec-11-2">
        <title>Performance Measurements</title>
        <p>The measurements for ‘Juice &amp; More’ were performed on
a SUN Enterprise with four 300 MHz UltraSPARC
processors and 2 GB RAM under Solaris 2.6. As secondary
storage a partition on a SPARCstorage array with RAID
level 0 (6 disks striping, 5-6 MB/s transfer rate per disk)
was used. All measurements were done in a single-user
environment.</p>
        <p>It is important to note that our implementation still
causes significant overhead due to the fact that we have
implemented the UB-Tree on top of a DBMS and not in
the kernel itself. First, the number of SQL statements that
have to be processed (UB: 1 statement for each page in
the result set, Oracle 8 methods: 1 statement in total)
leads to extensive inter-process communication (about
30% of the total processing time) and DBMS overhead
(e.g., parsing of statements). Second, our table is larger
than the one for the FTS and the bitmap indexes due to
unimplemented compressing techniques in the UB-Tree
(for 8 KB pages: UB: 878362 pages, FTS: 723539 pages,
BII: FTS+31134 pages).</p>
        <p>Figure 18 shows result set sizes and response times of the
three example queries (Section 4.2). Q1 shows that the
UB-Tree with multidimensional clustering is over 2
times faster than BII even for very small result sets. Q3
which is processed by the unoptimized UB-Tree at least
10 times faster than with any other access method
undermines this observation.</p>
        <p>300
s
n
ie 200
m
i
T
e
s
on 100
p
e
R
0
285
232</p>
        <p>275
2
Q1
5</p>
        <p>186
118
Q2</p>
        <p>1
Q3
10</p>
        <p>UB</p>
        <p>FTS
Bitmap
The result set of query Q2 is quite large but the almost
perfect clustering factor of the UB-Tree (in average more
than 29 out of 30 tuples/page belong to the result set) still
leads to a speed up of more than 30 % in comparison to
BII. The time for FTS for Q2 differs from the times for
Q1 and Q3 due to the less complex WHERE clause of the
statement. The number of comparison operations is
therefore much smaller than for the other queries, which
causes the faster execution.</p>
        <p>All these results on real data show how well the
multidimensional hierarchical clustering with UB-Trees works
in practice and the accuracy of our theoretical cost model
[MZB99]. In total more than 77% of all benchmark
queries (28 out of 36) showed a speed up between a factor of
1.3 and 10 over traditional techniques.</p>
        <sec id="sec-11-2-1">
          <title>Cust- Pro- Distr</title>
          <p>omer duct ibution</p>
        </sec>
        <sec id="sec-11-2-2">
          <title>Time</title>
        </sec>
        <sec id="sec-11-2-3">
          <title>Select</title>
          <p>ivity
Instance 1 in
%
0,0414
0,0070
0,0182
1,1346
6,0000</p>
        </sec>
        <sec id="sec-11-2-4">
          <title>Select</title>
          <p>ivity
Instance 2 in
%
0,3176
3,4383
2,0981
1,1346
34,000
231,65
186
118</p>
          <p>UB
BM</p>
          <p>FTS
56
44
Q4
Q5
Q6
Q7
Q8
2
1
1
1
0
300
250
200
s
d
n
o
c
enS150
i
e
m
i
T100
50
0
19 lists the number of hierarchy levels that by each query
are restricted to a point for each dimension. Since the
data is non-uniformly distributed, the selectivity of each
query depends on the exact point restriction, not only on
the number of restricted hierarchy levels. We thus
present two instances of the queries, each of which has a
different selectivity.</p>
          <p>500
400
sdn300
o
c
e
s
n
i
e
tim200
100
0
287
313
283
274</p>
        </sec>
      </sec>
      <sec id="sec-11-3">
        <title>7 Conclusions and Future Work</title>
        <p>We have presented multidimensional hierarchical
clustering by hierarchy interleaving (MHC/HI) as a physical
data modeling technique for OLAP applications in
relational databases. Our performance measurements have
shown that MHC/HI in combination with UB-Trees and
the Tetris algorithm significantly reduces the number of
random accesses to the fact table for star joins and other
queries with restrictions in multiple hierarchies. This
results in considerably lower response times for typical
data warehousing queries with star joins. For a 6 GB
retail data SAP business information warehouse, our
prototype implementation of MHC/HI accelerates the
processing of star-join queries more than a factor of two
compared to bitmap indexes, clustering B*-Trees or parallel
full table scans. For dimensionalities typical for data
warehousing, only I/O-time linear in size of the result set
prior to aggregation and sublinear temporary storage are
necessary to aggregate parts of a fact table of a star or
snowflake schema. Thus secondary storage space and
pre-computation time for many aggregates and bitmap
indexes can be avoided. In addition the widely discussed
view maintenance problem is minimized. Depending on
the query, temporary storage requirements for sorting are
reduced by several orders of magnitude. Our clustering
approach also holds not only for ROLAP but also for
MOLAP implementations of a data warehouse since both
ROLAP fact tables and MOLAP data cubes can be
clustered in this way.</p>
        <p>Our future work includes deriving a set of cost based
decision rules for how to – possibly multidimensionally
index a relation for a given set of queries. We also intend
to apply our model to cost based query optimization and
combine the model with multidimensional histograms in
2
1
1
0
1
0
1
1
0
0
1
1
0
1
1
order to take dependencies and correlations between the
attributes into account.</p>
      </sec>
      <sec id="sec-11-4">
        <title>8 References</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bay97a]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <article-title>The universal B-Tree for multidimensional Indexing: General Concepts</article-title>
          .
          <source>WWCA '97</source>
          ,
          <string-name>
            <surname>Tsukuba</surname>
          </string-name>
          , Japan, p-
          <volume>10</volume>
          -11, Springer Verlag,
          <year>March</year>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Bay97b]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <article-title>UB-Trees and UB-Cache - A new Processing Paradigm for Database Systems</article-title>
          .
          <source>Technical Report TUM-I9722, TU München</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [CD97]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Dayal</surname>
          </string-name>
          .
          <article-title>An Overview of Data Warehousing and OLAP Technologies</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>26</volume>
          (
          <issue>1</issue>
          ),
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [CI98]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ioannidis</surname>
          </string-name>
          .
          <article-title>Bitmap Index Design and Evaluation</article-title>
          . SIGMOD,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Dat88]
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Date</surname>
          </string-name>
          .
          <article-title>A Guide to the SQL Standard,2nd edition</article-title>
          . Addison-Wesley,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [GG97]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gaede</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Günther</surname>
          </string-name>
          .
          <source>Multidimensional Access Methods. ACM Computing Surveys</source>
          <volume>30</volume>
          (
          <issue>2</issue>
          ),
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [GHR+97]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Harinarayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajaraman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>Index Selection for OLAP</article-title>
          . ICDE,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [GR97]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Reuter</surname>
          </string-name>
          . Transaction Processing:
          <article-title>Concepts and Techniques</article-title>
          . Morgan Kaufmann Publishers,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Gra93]
          <string-name>
            <given-names>G.</given-names>
            <surname>Graefe</surname>
          </string-name>
          .
          <article-title>Query Evaluation Techniques for Large Databases</article-title>
          .
          <source>ACM Computing Surveys 25</source>
          ,
          <year>1993</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Gup97]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <article-title>Selection of Views to Materialize in a Data Warehouse</article-title>
          .
          <source>Proc. of the Intl. Conference on Database Theory</source>
          , Athens, Greece,
          <year>January 1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Gut84]
          <string-name>
            <given-names>A.</given-names>
            <surname>Guttman. R-Trees</surname>
          </string-name>
          :
          <article-title>A dynamic Index Structure for spatial Searching</article-title>
          .
          <source>Proc. of ACM SIGMOD Conf.</source>
          ,
          <year>1984</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [HAM+97]
          <string-name>
            <given-names>C.T.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Megiddo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          .
          <article-title>Range Queries in OLAP Data Cubes</article-title>
          . SIGMOD Conf.,
          <year>1997</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [JL98]
          <string-name>
            <surname>Jürgens</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lenz</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          :
          <article-title>The R*a-Tree: An improved R*-Tree with Materialized Data for Supporting Range Queries on OLAP-Data</article-title>
          , DWDOT Workshop, Vienna,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Kim96]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kimball</surname>
          </string-name>
          .
          <article-title>The Data Warehouse Toolkit</article-title>
          . John Wiley &amp; Sons, New York.
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [LS90]
          <string-name>
            <surname>Lomet</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Salzberg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The hB-Tree: A Multiattribute Indexing Method with good guaranteed Performance</article-title>
          .
          <source>ACM TODS</source>
          ,
          <volume>15</volume>
          (
          <issue>4</issue>
          ),
          <year>1990</year>
          , pp.
          <fpage>625</fpage>
          -
          <lpage>658</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Mar99]
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          . MISTRAL:
          <article-title>Processing Relational Queries using a Multidimensional Access Method</article-title>
          .
          <source>Ph.D. Thesis</source>
          , TU München,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Moe98]
          <string-name>
            <given-names>G.</given-names>
            <surname>Moerkotte. Small Materialized</surname>
          </string-name>
          <article-title>Aggregates: A Light Weight Index Structure for Data Warehousing</article-title>
          . VLDB Conference. New York, USA,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [MRB99]
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ramsak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <article-title>Improving OLAP Performance by Multidimensional Hierarchical Clustering</article-title>
          .
          <source>Proc. of IDEAS'99</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [MZB99]
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zirkel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bayer</surname>
          </string-name>
          .
          <article-title>Processing Operations with Restrictions in Relational Database Management Systems without external Sorting</article-title>
          .
          <source>ICDE</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [NHS84]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nievergelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hinterberger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. C.</given-names>
            <surname>Sevcik. The</surname>
          </string-name>
          Grid-File.
          <source>ACM TODS</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <source>March</source>
          <year>1984</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [OM84]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Orenstein</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.H.</given-names>
            <surname>Merret</surname>
          </string-name>
          .
          <article-title>A Class of Data Structures for Associate Searching</article-title>
          .
          <article-title>SIGMOD-PODS Conf</article-title>
          .,
          <string-name>
            <surname>Portland</surname>
          </string-name>
          , Oregon,
          <year>1984</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>[OQ97] P. O´Neill</surname>
            and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Quass</surname>
          </string-name>
          .
          <article-title>Improved Query Performance with Variant Indexes</article-title>
          .
          <source>ACM SIGMOD Intl. Conf. On Management of Data</source>
          , Tucson, Arizona, pp.
          <fpage>38</fpage>
          -
          <lpage>49</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Pie98]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pieringer</surname>
          </string-name>
          .
          <article-title>Evaluation of the UB-Tree in the SAP Environment</article-title>
          .
          <source>Master Thesis</source>
          , TU München,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [RMF+00]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ramsak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fenk</surname>
          </string-name>
          et al.
          <article-title>Integrating the UB-Tree into a data-basesystem kernel</article-title>
          .
          <source>VLDB2000</source>
          , Cairo,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Sar97]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          .
          <article-title>Indexing OLAP data</article-title>
          .
          <source>Data Eng. Bulletin</source>
          <volume>20</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Sam90]
          <string-name>
            <given-names>H.</given-names>
            <surname>Samet</surname>
          </string-name>
          .
          <article-title>The Design and Analysis of Spatial Data Structures</article-title>
          .
          <source>Addison Wesley</source>
          ,
          <year>1990</year>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [SDN+96]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Naughton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramasamy</surname>
          </string-name>
          .
          <article-title>Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies</article-title>
          . VLDB Conference.
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [SDN+98]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Naughton</surname>
          </string-name>
          .
          <article-title>Materialized View Selection for Multidimensional Datasets</article-title>
          . SIGMOD Conf.,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [TS97]
          <string-name>
            <given-names>D.</given-names>
            <surname>Theodoratos</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Sellis</surname>
          </string-name>
          .
          <source>Data Warehouse Configuration. VLDB</source>
          ,
          <year>1997</year>
          , p.
          <fpage>126</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [SRF97]
          <string-name>
            <given-names>T. K.</given-names>
            <surname>Sellis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Roussopoulos</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Faloutsos: Multidimensional Access Methods: Trees Have Grown Everywhere</article-title>
          . VLDB,
          <year>1997</year>
          , P.
          <fpage>13</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>[WB98] M.C.Wu</surname>
            and
            <given-names>A.P.</given-names>
          </string-name>
          <string-name>
            <surname>Buchmann</surname>
          </string-name>
          .
          <article-title>Encoded Bitmap Indexing for Data Warehouses</article-title>
          . ICDE, Orlando,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>