<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Processing Relational OLAP Queries with UB-Trees and Multidimensional Hierarchical Clustering</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">V</forename><surname>Markl</surname></persName>
							<affiliation key="aff0">
								<address>
									<addrLine>June</addrLine>
									<settlement>Stockholm</settlement>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">R</forename><surname>Bayer</surname></persName>
							<affiliation key="aff0">
								<address>
									<addrLine>June</addrLine>
									<settlement>Stockholm</settlement>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><surname>Jeusfeld</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>-6</addrLine>
									<postCode>2000</postCode>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">H</forename><surname>Shu</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>-6</addrLine>
									<postCode>2000</postCode>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><surname>Staudt</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>-6</addrLine>
									<postCode>2000</postCode>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><roleName>eds.</roleName><forename type="first">G</forename><surname>Vossen</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>-6</addrLine>
									<postCode>2000</postCode>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Processing Relational OLAP Queries with UB-Trees and Multidimensional Hierarchical Clustering</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5101BD608606D0C279609789CEE24474</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Multidimensional access methods like the UB-Tree can be used to accelerate almost any query processing operation, if proper query processing algorithms are used: Relational queries or SQL queries consist of restrictions, projections, ordering, grouping and aggregation, and join operations. In the presence of multidimensional restrictions or sorting, multidimensional range query or Tetris algorithms efficiently process these operations. In addition, these algorithms also efficiently support queries that generate some hierarchical restrictions (for instance by following 1:n foreign key relationships). In this paper we investigate the impacts on query processing in RDBMS when using UB-Trees and multidimensional hierarchical clustering for physical data organization. We illustrate the benefits by performance measurements of queries for a star schema from a real world application of a SAP business information warehouse. The performance results reported in this paper were measured with our prototype implementation of UB-Trees on top of Oracle 8. We compare the performance of UB-Trees to native query processing techniques of Oracle, namely access via an index organized table, which essentially stores a relation in a clustered B*-Tree, and access via a full table scan of an entire relation. In addition we measure the performance of the intersection of multiple bitmap indexes to answer multidimensional range queries.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The most established relational data models for data warehousing applications are the star schema and the snowflake schema. In both approaches there is a central fact table that contains the measures and the dimension tables are situated around it. The connection between a fact tuple and the corresponding dimension members is realized via foreign key relationships. In the star schema the dimension tables are completely denormalized while in the snowflake schema they may be normalized. Queries usually contain restrictions on multiple dimension tables (e.g., only sales for specific customer group and for a specific time period are asked) that are then used as restrictions on the usually very large fact table. In this paper we investigate, how UB-Trees and the Tetris algorithm in combination with our technique of multidimensional hierarchical clustering by hierarchy interleaving (MHC/HI) may be used to accelerate relational query processing with a special focus on star-joins, the most frequent operation of query processing for relational data warehouses. With our technique of MHC/HI we specifically cluster the data with respect to the foreign key relationships defined by the star-or snowflake schema of a data warehouse. We also illustrate the physical data modeling with MHC/HI for a SAP business information warehouse and give a performance analysis for this realworld application. Our performance analysis gives insight into the real world data distribution of 6 GB of sales data from a fruit juice company and shows, why MHC/HI in combination with UB-Trees is superior to classical bitmap indexes, clustering B*-Trees and parallel full table scans. Our paper is organized as follows: Section 2 surveys related work, in Section 3 we present the basic concepts of UB-Trees, their algorithms and MHC/HI. Section 4 presents the SAP business warehouse schema that we use for our analysis. In Section 5 we analyze the data distribution of the real world data. Section 6 gives a performance analysis and performance measurements with our prototype implementation. Section 7 concludes our paper and gives an outlook on future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Volker Markl Bayerisches Forschungszentrum</head><p>für Wissensbasierte Systeme (FORWISS) Orleansstraße 34, D-81667, München, Germany volker.markl@forwiss.de</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Rudolf Bayer Institut für Informatik Technische Universität München</head><p>Orleanstraße 34, D-81667 München, Germany bayer@in.tum.de</p><p>The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Performance optimization has been well studied in the field of OLTP systems <ref type="bibr" target="#b9">[Gra93]</ref>. Due to the completely different query characteristics of OLAP applications in comparison to OLTP new questions have to be addressed here. The performance problem is heavily linked to the physical data model. The index selection problem for ROLAP application is widely discussed in the research community <ref type="bibr" target="#b7">[GHR+97,</ref><ref type="bibr" target="#b25">Sar97]</ref>. Especially bitmap indexes have been proposed to speed up ROLAP applications because of their compactness and support of star joins <ref type="bibr" target="#b4">[CI98]</ref>. A common way of performance improvement is the usage of materialized views -often in combination with indexing methods [TS97, Moe98  <ref type="bibr" target="#b2">[Bay97b]</ref> are the basis of our approach, where joins and sorted processing of data organized by a multidimensional index are investigated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The UB-Tree</head><p>The basic idea of the UB-Tree [Bay97a, Mar99] is to use a space-filling curve to partition a multidimensional universe. Using the Z-Curve (Figure <ref type="figure" target="#fig_0">1a</ref>) the UB-Tree preserves the multidimensional clustering. A Z-Address α = Z(x) is the ordinal number of the key attributes of a tuple x on the Z-Curve, which can be efficiently computed by bit-interleaving <ref type="bibr" target="#b21">[OM84]</ref>. A standard B-Tree is used to index the tuples taking the linear Z-Address of the tuples as keys.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Bit Interleaving</head><p>The performance of the UB-Tree crucially relies on an efficient implementation of the Z-address calculation. For tuples of positive integer numbers, bit interleaving immediately yields an algorithm to calculate a Z-address from the binary representation of a tuple. It is easy to incorporate varying attribute lengths, i.e., attributes with different resolutions, into this algorithm: When the limit of the resolution (i.e., the last bit) in one attribute is reached, this attribute is not used for bit-interleaving any further. The number of bits in each further step is reduced in this case.</p><p>For each attribute A i we denote the number of distinct values of its domain by r i , i.e., r i = |A i | Definition 1: The number of steps for attribute A i of a domain with cardinality 1 r i is determined by its resolution: steps(i) = log 2 r i Definition 2: The length of step k in bits (i.e., the number of dimensions in step k) is: steplength(k) = |i | steps(i) ≥ k and i ∈ D}| Figure <ref type="figure" target="#fig_1">3</ref> shows this generalized algorithm for bit interleaving.</p><p>Input: x : tuple Output: Z-address α for step = 1 to max({steps(r j ) | j ∈ D}) for i = 1 to steplength(step) copy bit step of x i to bit i of α step end for end for </p><formula xml:id="formula_0">∑ = d i i</formula><p>r O 1 1 Note that we defined r i to be 2 v for some v ∈ 1R for attributes of different length). The same holds for the inverse algorithm Z -1 that calculates the Cartesian coordinates of a tuple from its address.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Range Queries on UB-Trees</head><p>To answer a range query, only those Z-regions, which properly intersect the query box, must be fetched from the database and thus from the disk. Initially the range query algorithm calculates and retrieves the first Z-region that is overlapped by the query-box. Then the next intersecting Z-region is calculated and retrieved. This is repeated until a minimal cover for the query box has been constructed, i.e., the region that contains the ending point of the query box has been retrieved. It is important to note that for any Z-region the calculation of the next point intersecting the query box is performed solely in main memory. [Bay97a] describes an algorithm that for a Z-region [α : β ] and a query box Q calculates the smallest Z-address γ &gt; β that intersects Q in time exponential to the number of dimensions. A linear algorithm which is solely based on bit operations is described in <ref type="bibr" target="#b16">[Mar99]</ref> and <ref type="bibr" target="#b24">[RMF+00]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">The Tetris Algorithm</head><p>With the Tetris algorithm <ref type="bibr" target="#b19">[MZB99]</ref>, tables organized by a UB-Tree can be read in any key sort order in O(n) disk accesses where n is the number of pages of the table or the minimal number of regions covering a query box. The Tetris algorithm is a generalization of the multidimensional range query algorithm that efficiently combines sort operations with the evaluation of multi-attribute restrictions. The basic idea is to use the partial sort order imposed by a multidimensional partitioning in order to process a table in some total sort order. Essentially a plane sweep over a query space defined by restrictions on a multidimensionally partitioned table is performed. The direction of the sweep is determined by the sort attribute. Initially the algorithm calculates the first Z-region that is overlapped by the query box, retrieves it and caches it in main memory. Then it continues to read and cache the next Z-regions with respect to the requested sort order, until a complete thinnest possible slice of the query box (in the sorting dimension) has been read. Then the cached tuples of this slice are sorted in main memory, returned in sort order to the caller and removed from cache. The algorithm proceeds reading the next slice, until all Z-regions intersecting the query box have been processed. Only disk pages overlapping the query space are accessed. With sufficient, but modest, cache memory each disk page is accessed only once.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Multidimensional Hierarchical Clustering</head><p>Multidimensional Hierarchical Clustering by Hierarchy Interleaving (MHC/HI, <ref type="bibr" target="#b18">[MRB99]</ref>) clusters a multidimensional data set on disk with respect to the hierarchical relationships in multiple dimensions. This is achieved by creating surrogate values that ensure that a hierarchical restriction like REGION = "North America" and NATION = "USA" is mapped to an interval restriction in a linear domain. With this technique, a star join on a schema with d dimensions therefore creates a ddimensional interval restriction on the fact table which then may efficiently be processed by the UB-Tree. In general one can imagine any foreign key relationship to be used for such a clustering. In the following we illustrate MHC/HI by hierarchical relationships as they usually occur in data warehousing applications. In ROLAP hierarchies are usually modeled implicitly by a set of attributes A 1 , ..., A n where A i corresponds to hierarchy level i and level 1 is the root of the hierarchy. Many attributes in relational DBMS in general and in data warehouses in particular have an actual domain of a very small set of values. A typical example (cf. Figure <ref type="figure" target="#fig_2">4a</ref>) is the attribute REGION of the dimension table CUSTOMER, which has an actual domain of 8 values (Southern Europe, Central Europe, Northern Europe, Western Europe, North America, Latin America, Asia, Australia). By MHC/HI, these long strings are replaced by numbers. In the corresponding tables small numbers or bit strings are stored instead of long, space wasting character strings. If all records must be retrieved that concern Europe, only an interval [0;3] on this number is necessary. Otherwise every member of region must be specified. The mapped numbers are called surrogates because they replace the character strings. This data type is called enumeration type. This surrogate technique is generalized to all levels of the hierarchies. Figure <ref type="figure" target="#fig_2">4b</ref> shows a part of the customer hierarchy with the surrogates propagated to all levels of the hierarchy.</p><p>To efficiently encode hierarchies, we introduce the concept of compound surrogates for hierarchies. Since we require hierarchies to form a disjoint partitioning of a dimension, a uniquely identifying compound surrogate for each child node of a hierarchy member exists and can be recursively calculated by concatenating the compound surrogate of the member with the running number of the child node. Because only efficient bit operations are necessary, the computation is extremely fast. The ord function returns the corresponding surrogate of a hierarchy member in the specified level.</p><p>The hierarchy path North America Å USA Å Retail Å Bar (cf. Figure <ref type="figure" target="#fig_2">4b</ref>) has the compound surrogate: ord Customer (North America) ο ord North America (USA) ο ord USA (Retail) ο ord Retail (Bar) = 4 ο 1 ο 1 ο 2. The upper limit of the domain for surrogates of level i is calculated as the maximum fan-out (number of children) of all members of level i-1 of a hierarchy H, i.e., surrogates(H, i) = max {cardinality(children(H, m)) where m ∈ level(H, i -1)} The compound surrogate can be interpreted as a number.</p><p>In the above example -using the surrogate length of Figure <ref type="figure" target="#fig_2">4a</ref> with a length of 10 bits for the customer surrogate (3 for REGION, 3 for NATION, 1 for Trade Type and 3 for Business Type) the hierarchy path results in the compound surrogate 1000011010 2 = 538. Usually growth expectations for a hierarchy are known well in advance. Often hierarchy trees are even static. Therefore it is possible to determine a reasonable number of bits for storing each surrogate of the compound surrogate of a hierarchy. The overall number of bits necessary to store a compound surrogate is relatively small. For instance, a hierarchy tree with four branches on 8 levels already represents 4 8 = 65536 partitions and is stored in only 16 bits. The maximum length of the compound surrogates for the 'Juice &amp; More' schema that we used for our analysis can be computed from the maximum fan-out of the hierarchy levels given in Figure <ref type="figure" target="#fig_2">4a</ref>. This very compact fixed length encoding preserves the lexicographic order on the hierarchy levels. Thus, point restrictions on upper hierarchy levels result in range restrictions on the finest granularity of a hierarchy. For instance, the point restriction NATION = "USA" on the second level of the CUSTOMER hierarchy with f("North America") = 4 = 100 2 and f("USA") = 1 = 001 2 maps to the range restriction cs customer between 528 = 1000010000 2 and 543 = 1000011111 2 . Thus, a star join with this surrogate encoding for the foreign keys of a fact table results in a range restriction on each compound surrogate, if some hierarchy level of each dimension is restricted to a point. In the same way intervals on the children of one hierarchy level result in a range of the corresponding compound surrogates (e.g., YEAR = 1998 and MONTH between April and June). A star join on a schema with d dimensions therefore creates a d-dimensional interval restriction on the fact table.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Schema and Queries of 'Juice &amp; More'</head><p>In this section we investigate, how UB-Trees and the Tetris algorithm in combination with MHC/HI may be used to accelerate star-joins, the most frequent operation of query processing for relational data warehouses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">The Juice &amp; More Schema</head><p>We use the schema of the beverages supplier 'Juice &amp; More', a real customer of one of our project partners 2 . In the data warehouse of 'Juice &amp; More' data is organized along the following four dimensions: CUSTOMER, PRODUCT, DISTRIBUTION and TIME. Figure <ref type="figure" target="#fig_4">5a</ref> shows the hierarchies over the dimensions (the number in parentheses specifies the maximum number of level members).  The ROLAP data model for the 'Juice &amp; More' schema (Figure <ref type="figure" target="#fig_4">5b</ref>) is a typical star schema with one fact table FACT and a table for each of the 4 dimensions. Let 'SALES' and 'DISTCOST' be some of the measures in the fact table. We used the methodologies of surrogates and multidimensional hierarchical clustering as described in Section 3.4 for clustering the fact table of</p><p>In the following we describe some example queries involving star joins for the 'Juice &amp; More' schema. These queries were taken from the decision support system of 'Juice &amp; More'. Thus next to the investigation of multidimensional hierarchical clustering this section is interesting from a second point of view: The highly skewed data distribution of the 'Juice &amp; More' will prove that UB-Trees, the Tetris algorithm and MHC/HI are not only applicable to laboratory environment tests with generated data, but also prove their efficiency in the practical application scenarios. In order to show the highly skewed data distribution we included an entire section displaying the one-dimensional data distribution of 'Juice &amp; More' for every dimension.</p><p>We then present measurements performed with our prototype implementation of the UB-Tree on top of Oracle 8. For the evaluation of our clustering technique we defined a benchmark with 36 queries. In comparison we also conducted measurements with native Oracle 8 access methods: parallel full table scan (FTS) and bitmap indexes (BII). For these measurements we used a completely denormalized fact table, that is, no additional joins next to the star join had to be performed to answer the queries. The bitmap indexes were created on each hierarchy level. We did not include secondary indexes in our comparison measurements because earlier experiments showed that they are neither competitive to the UB-Tree nor to FTS or BII <ref type="bibr" target="#b19">[MZB99]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Queries on the 'Juice &amp; More' Schema</head><p>In the following we present typical queries that are taken from the 'Juice &amp; More' SAP business information warehouse. We will use these queries to illustrate our approach and we will present performance measurements for exactly these queries in Section 6.2. Query 1 (Q1, cf. Figure <ref type="figure">6</ref>) computes the sales for a given product group (TYPE and BRAND specified as (X1, X2)) and a given customer group (NATION and REGION specified as (Y1, Y2)) for the months from October to December of 1993. This query uses the UB-Tree range query algorithm on MHC/HI surrogates. However, surrogates are only used for clustering and indexing and thus are transparent to the user. Query 2 (Q2, cf. Figure <ref type="figure" target="#fig_7">7</ref>) calculates the cost of distribution of the products of type X for each distribution channel. This query uses the Tetris algorithm on MHC/HI surrogates in order to efficiently calculate aggregations.  Query 3 (Q3, cf. Figure <ref type="figure" target="#fig_8">8</ref>) restricts all dimensions on the first level of the hierarchies. This query also uses the Tetris algorithm on MHC/HI compound surrogates. The data of 'Juice &amp; More' is real world data; the data distribution of both the fact table and the dimension tables is highly skewed: The dimensions are neither distributed uniformly nor are independent. The original fact table consisted of 823.464 tuples (about 175 MB). To get a realistic large data cube, the fact table was enlarged to 26.350.848 tuples (about 6 GB). Our project partner implemented an augmentation algorithm with minimal impact on the data distribution (see <ref type="bibr" target="#b23">[Pie98]</ref>).</p><p>In the following we show some charts that describe the one-dimensional data distribution for each dimension. However, we once more stress that the dimensions are not independent (e.g., some customers always order the same subset of products, some customers or products only exist for a certain time, etc.). Thus in general, the overall selectivity of a query restricting several dimensions is not the product of the selectivities of the one-dimensional restrictions (which is shown in the following charts). We will see this deviation in Section 6. The fact table of 'Juice &amp; More' stores several measures (e.g., distribution cost, sales) aggregated on a daily basis with respect to the dimensions time, customer, product and distribution. Since the data is sensitive real-world business data, it is not possible to show the labels/names of the hierarchy members in the charts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">The Time Dimension</head><p>The time dimension of 'Juice &amp; More' consists of a twolevel hierarchy of months and years. The test data stored the years from 1993 to 1995. As shown in Figure <ref type="figure" target="#fig_4">5a</ref>, the days of the time dimension are organized by a two level hierarchy (year and month). Figure <ref type="figure" target="#fig_9">9</ref> shows the cumulated data distribution of the fact table with respect to the time dimension grouped by year and month. The horizontal axis displays the hierarchy members, with "All" at the very bottom (i.e., the lowest level), the years 1993, 1994, and 1995 in the middle and the twelve months for each year above the year.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">The Product Dimension</head><p>The top level of the hierarchy on product has five entries. The data distribution is quite skewed, there are three product groups to which 93% of all tuples of the fact table belong. 1% of the data is unclassified. The distribution of the first level of the product hierarchy is illustrated in Figure <ref type="figure" target="#fig_10">10</ref>. Multidimensional hierarchical clustering ensures that a restriction in the first hierarchy level will result in a 1%, 27%, 6%, 32% respectively 34% reduction of I/Os which are necessary to retrieve the result set. Without exactly showing the relationship between the hierarchy members, we show the skew of the data distribution over the first four product levels in Figure <ref type="figure" target="#fig_11">11</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">The Customer Dimension</head><p>According to business administration literature 20% of the customers contribute to 80% of the business. The customer dimension of 'Juice &amp; More' is a typical example for a classification of customers in such a company. A high number of customers (in this case 88%, the very left entry of Figure <ref type="figure" target="#fig_12">12</ref>) are not classified (maybe they are not interesting for the company because of small turnover or it is not possible to find a classification). The classified customer groups contain 0% to 3% of the tuples stored in the table. <ref type="foot" target="#foot_1">3</ref> The hierarchical relationships on the horizontal axis show that the hierarchy of the customer dimension is not balanced, since several hierarchy members just have one child.</p><p>0,28% 0,04% 0,73% 1,64% 0,00% 0,15% 1,72% 0,03% 0,09% 1,27% 0,27% 0,07% 0,21% 0,55% 0,77% 0,07% 0,76% 2,96% 0,06% 0,14% 0,16% 0,06% 0,08% 0,05% 0,13% 0,02% 0,06% 0,12% 0,13% 0,04% 0,02% 0,0% 0,5%  The consequence of the small number of classified customers is that queries the restriction CUSTOMER will result in a selectivity of at most 3% and therefore the overall result set will be small.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">The Distribution Dimension</head><p>There are seven entries on the first level of the distribution hierarchy. The data distribution of the fact table with respect to the distribution dimension is highly skewed.</p><p>Figure <ref type="figure" target="#fig_1">13</ref> shows the distribution of the first hierarchy level (i.e., sales organization), while Figure <ref type="figure" target="#fig_2">14</ref> shows the distribution of facts in the fact In this section we first analyze the performance of MHC/HI on UB-Trees analytically and then, taking the data distributions and query selectivities into account, verify our expectations by comparing the number of retrieved pages with the product of the selectivities over all dimensions. We then also give response times which show the superiority of UB-Trees and MHC/HI over traditional indexes even for our prototype implementation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Performance Analysis</head><p>Figure <ref type="figure" target="#fig_4">15</ref> shows the compound surrogates for the 'Juice &amp; More' data warehouse, which are calculated as fixed length compound surrogates in <ref type="bibr" target="#b18">[MRB99]</ref>. For any of the 4 hierarchies the length of the compound surrogate does not exceed 15 bits and thus can be stored in a single integer value. These compound surrogates are used as attributes for each of the four dimensions of 'Juice &amp; More' to calculate the Z-address for each tuple of the 'Juice &amp; More' fact table. The first 19 bits of the Z-address are guaranteed to be used to partition the four dimensional universe of the 'Juice &amp; More' fact table. This means that the binary strings p 15 p 14 p 13 p 12 p 11 of the compound surrogate of product, c 10 c 9 c 8 c 7 c 6 of customer, t 6 t 5 t 4 t 3 t 2 of time and d 5 d 4 d 3 d 2 of distribution are used to partition the universe. For each of the four dimensions the first hierarchy level is completely used for the partitioning. The second hierarchy level is used to a large extent to partition the universe. Therefore a restriction in the first hierarchy level will result in a reduction of the number of pages as determined by the data distribution of Section 5, i.e., a restriction of the product main group to "910" will reduce the number of pages to be retrieved to 27%, a restriction to "912" will result in a reduction to 6,41%. This holds for the restriction of the first hierarchy level in any dimension. If the top hierarchy level is restricted in several dimensions and the independence assumptions holds for these dimensions, the reduction is multiplicative. Figure <ref type="figure" target="#fig_13">16</ref> shows the predicted selectivity calculated as the product of the selectivity in each dimension, the actual selectivity and the loaded pages in percent of the entire pages in the database for two queries, which restrict the first hierarchy level of three out of four dimensions.  However, the data is not independently distributed in the entire 4-dimensional universe of 'Juice &amp; More'. In this case the predicted selectivity does not describe the actual selectivity anymore. Thus some bits of the first 19 bits are correlated. This means that not all combinations of these bits occur and some partitioning will take places in the bits below bit number 19 of the Z-address. In this case the second level may already be completely partitioned and even a third hierarchy level partitioning may have started for some dimensions. A typical part of the multidimensional space where this will happen is the customer hierarchy "unclassified", which stores 87,34% of the customers. At most the three bits c 10 c 9 c 8 of the customer hierarchy are needed to distinguish these customers from all other customers. Thus for the unspecified customers the bits c 7 c 6 of the first 19 bits of the Z-address are correlated to c 10 c 9 c 8 and two further bits may be used for partitioning. Thus d 1 will be split completely, p 10 will be used for the partitioning and t 1 will be partly split (c 5 is also correlated to c 10 c 9 c 8 ). Our measurements show that this effect also holds for other dimensions. Figure <ref type="figure" target="#fig_7">17</ref> shows queries where the first two hierarchy levels of CUSTOMER, PRODUCT and TIME are restricted, whereas the DISTRIBUTION dimension is not restricted. The selectivity predicted by the cost functions here differs from the actual selectivity of the query because of dependencies in the data distribution. However, the percentage of pages loaded is similar to the actual selectivity of each query. </p><formula xml:id="formula_1">{ 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3 2 1 3</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Performance Measurements</head><p>The measurements for 'Juice &amp; More' were performed on a SUN Enterprise with four 300 MHz UltraSPARC processors and 2 GB RAM under Solaris 2.6. As secondary storage a partition on a SPARCstorage array with RAID level 0 (6 disks striping, 5-6 MB/s transfer rate per disk) was used. All measurements were done in a single-user environment.</p><p>It is important to note that our implementation still causes significant overhead due to the fact that we have implemented the UB-Tree on top of a DBMS and not in the kernel itself. First, the number of SQL statements that have to be processed (UB: 1 statement for each page in the result set, Oracle 8 methods: 1 statement in total) leads to extensive inter-process communication (about 30% of the total processing time) and DBMS overhead (e.g., parsing of statements). Second, our table is larger than the one for the FTS and the bitmap indexes due to unimplemented compressing techniques in the UB-Tree (for 8 KB pages: UB: 878362 pages, FTS: 723539 pages, BII: FTS+31134 pages). Figure <ref type="figure" target="#fig_14">18</ref> shows result set sizes and response times of the three example queries (Section 4.2). Q1 shows that the UB-Tree with multidimensional clustering is over 2 times faster than BII even for very small result sets. Q3 which is processed by the unoptimized UB-Tree at least 10 times faster than with any other access method undermines this observation. The result set of query Q2 is quite large but the almost perfect clustering factor of the UB-Tree (in average more than 29 out of 30 tuples/page belong to the result set) still leads to a speed up of more than 30 % in comparison to BII. The time for FTS for Q2 differs from the times for Q1 and Q3 due to the less complex WHERE clause of the statement. The number of comparison operations is therefore much smaller than for the other queries, which causes the faster execution.</p><p>All these results on real data show how well the multidimensional hierarchical clustering with UB-Trees works in practice and the accuracy of our theoretical cost model <ref type="bibr" target="#b19">[MZB99]</ref>. In total more than 77% of all benchmark queries (28 out of 36) showed a speed up between a factor of 1.3 and 10 over traditional techniques.  Figures 20 and 21 list two instances for each of five further queries of that benchmark. For each query Figure <ref type="figure" target="#fig_15">19</ref> lists the number of hierarchy levels that by each query are restricted to a point for each dimension. Since the data is non-uniformly distributed, the selectivity of each query depends on the exact point restriction, not only on the number of restricted hierarchy levels. We thus present two instances of the queries, each of which has a different selectivity. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Customer</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Product</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Distribution</head><note type="other">Time</note></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions and Future Work</head><p>We have presented multidimensional hierarchical clustering by hierarchy interleaving (MHC/HI) as a physical data modeling technique for OLAP applications in relational databases. Our performance measurements have shown that MHC/HI in combination with UB-Trees and the Tetris algorithm significantly reduces the number of random accesses to the fact table for star joins and other queries with restrictions in multiple hierarchies. This results in considerably lower response times for typical data warehousing queries with star joins. For a 6 GB retail data SAP business information warehouse, our prototype implementation of MHC/HI accelerates the processing of star-join queries more than a factor of two compared to bitmap indexes, clustering B*-Trees or parallel full table scans. For dimensionalities typical for data warehousing, only I/O-time linear in size of the result set prior to aggregation and sublinear temporary storage are necessary to aggregate parts of a fact table of a star or snowflake schema. Thus secondary storage space and pre-computation time for many aggregates and bitmap indexes can be avoided. In addition the widely discussed view maintenance problem is minimized. Depending on the query, temporary storage requirements for sorting are reduced by several orders of magnitude. Our clustering approach also holds not only for ROLAP but also for MOLAP implementations of a data warehouse since both ROLAP fact tables and MOLAP data cubes can be clustered in this way. Our future work includes deriving a set of cost based decision rules for how to -possibly multidimensionallyindex a relation for a given set of queries. We also intend to apply our model to cost based query optimization and combine the model with multidimensional histograms in</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Z-curve and Z-addressesThe fundamental innovation of UB-Trees is the concept of Z-Regions to create a disjunctive partitioning of the multidimensional space. This allows for very efficient processing of multidimensional range queries<ref type="bibr" target="#b16">[Mar99]</ref>. A</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Bit-Interleaving to calculate α = Z(x) With r = max({steps(r j ) | j ∈ D}) bit interleaving has a CPU-complexity of O(d⋅r) bit operations (resp. ( )</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Surrogates for REGION and the Customer Hierarchy</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Hierarchies in the 'Juice &amp; More' schema and the corresponding star schema</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>Figure 6: Time Interval (Q1)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>SELECT SALESORG, CHANNEL, SUM(DistCost) FROM Fact F, Distribution D, Product P WHERE F.DistKey = D.DistKey AND F.ProductKey = P.ProductKey AND P.Type = X GROUP BY D.SalesOrg,D.Channel</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Distribution cost (Q2)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Partial match query in first hierarchy level (Q3)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Data Distribution in the Time Dimension</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: Product -first hierachy level</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: Product -first four hierachy levels</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Figure 12 :</head><label>12</label><figDesc>Figure 12: Customer -first four hierachy levels</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_13"><head>Figure 16 :</head><label>16</label><figDesc>Figure 16: Restrictions in the first hierarchy level in 3 of 4 dimensions</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_14"><head>Figure 18 :</head><label>18</label><figDesc>Figure 18: Query response times and result set sizes</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_15"><head>Figure 19 :</head><label>19</label><figDesc>Figure 19: Restricted hierarchies and selectivities for five queries against the 'Juice &amp; More' data warehouse</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_16"><head></head><label></label><figDesc>Figure 21: Performance Q4-Q8 (instance 2)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>, WB98]. Due to the large number of possible views a selection problem exists besides the maintenance issue [Gup97, SDN+96, SDN+98]. Clustering of OLAP data plays a key role in providing good performance. Clustering has been well researched in the field of access methods.</figDesc><table /><note>B-Trees, for instance, provide one-dimensional clustering. Multidimensional clustering has been discussed in the field of multidimensional access methods. See<ref type="bibr" target="#b6">[GG97]</ref> and<ref type="bibr" target="#b26">[Sam90]</ref> for excellent surveys of almost all of these methods. [ZSL98] addresses the issue of hierarchical clustering for the one-dimensional case. Most work on applying multidimensional indexes to RDBMS discusses restrictions by range queries [SRF97, NHS84, Gut84, OM84, LS90].<ref type="bibr" target="#b13">[JL98]</ref> accelerates range queries with aggregations by storing aggregated data in R-Trees.<ref type="bibr" target="#b18">[MRB99]</ref> and</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head></head><label></label><figDesc>table for each distribution channel of each sales organization. Again the arrows indicate the hierarchical relationship of the members of neighboring hierarchy levels with the hierarchy root "All" at the bottom of the Figure.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="2">8%</cell><cell>13%</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="2">16%</cell><cell></cell><cell cols="2">13%</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>7%</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="2">37%</cell><cell cols="2">6%</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>1</cell><cell>2</cell><cell></cell><cell>3</cell><cell></cell><cell>4</cell><cell>5</cell><cell></cell><cell>6</cell><cell></cell><cell>8</cell><cell></cell></row><row><cell></cell><cell cols="12">Figure 13: Distribution -first hierachy level</cell></row><row><cell>2,5% 5,0% 7,5% 10,0% 12,5%</cell><cell>10,30%</cell><cell>2,20%</cell><cell>10,30%</cell><cell>2,20%</cell><cell>6,66%</cell><cell>5,84%</cell><cell>12,50%</cell><cell>12,50%</cell><cell>12,50%</cell><cell>12,50%</cell><cell>4,05%</cell><cell>8,45%</cell></row><row><cell>0,0%</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>09 1</cell><cell>10 1</cell><cell>09 2</cell><cell>10 2</cell><cell>12 3</cell><cell>11 4</cell><cell>13 5</cell><cell>14 5</cell><cell>15 5</cell><cell>16 6</cell><cell>17 6</cell><cell>19 8</cell></row><row><cell cols="13">Figure 14: Distribution -first two hierachy levels</cell></row><row><cell cols="13">6 Performance Analysis of MHC/HI for</cell></row><row><cell cols="4">'Juice &amp; More'</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head></head><label></label><figDesc>Tree for the 'Juice &amp; More' fact table consists of P = 878362 pages, which corresponds to: l = log 2 P = log 2 878362 = 19,7 bits that are in any case necessary to address the bit-interleaved multidimensional space. With bit interleaving in the order of dimensions PRODUCT, CUSTOMER, TIME, and DISTRIBUTION the Z-address α for a tuple of the 'Juice &amp; More' fact table is calculated as:</figDesc><table><row><cell>cs</cell><cell cols="2">product</cell><cell cols="2">=</cell><cell cols="11">43 42 1 level 1 13 14 15 p p p</cell><cell>43 42 1 2 level 10 11 12 p p p</cell><cell>4 43 42 4 1 3 level 5 6 7 8 9 p p p p p</cell><cell>43 42 1 4 level 1 2 3 4 p p p p</cell></row><row><cell>cs</cell><cell cols="2">customer</cell><cell></cell><cell cols="2">=</cell><cell cols="10">2 1 level 1 9 10 c c c</cell><cell>8</cell><cell>c</cell><cell>level 6 7 c</cell><cell>2 c</cell><cell>5</cell><cell>level 4 c</cell><cell>3</cell><cell>c</cell><cell>level 2 3 c</cell><cell>4 c 1</cell></row><row><cell>cs</cell><cell>time</cell><cell>=</cell><cell>t</cell><cell>6</cell><cell>t</cell><cell cols="2">5</cell><cell>t</cell><cell>4</cell><cell cols="2">t</cell><cell cols="2">3</cell><cell>t</cell><cell>2</cell><cell>t</cell><cell>1</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="4">level</cell><cell cols="2">1</cell><cell cols="7">level</cell><cell>2</cell></row><row><cell>cs</cell><cell cols="4">on distributi</cell><cell cols="3">=</cell><cell cols="3">d</cell><cell cols="2">5</cell><cell cols="2">d</cell><cell>4</cell><cell>d</cell><cell>3</cell><cell>d</cell><cell>2</cell><cell>d</cell><cell>1</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell cols="5">level</cell><cell>1</cell><cell>level</cell><cell>2</cell></row><row><cell cols="16">Figure 15: Compound surrogates for each dimension of</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>'Juice &amp; More'</cell></row><row><cell cols="16">The UB-{ 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 2 split partly on distributi data any for d partitione completely 1 1 2 6 11 2 3 7 12 3 4 8 13 4 5 9 14 5 6 10 15 d t c p d t c p d t c p d t c p d t c p = α</cell><cell>4 4 4 4 4 4 4 3 2 4 4 4 4 4 4 4 1 on distributi data on the depending d partitione 1 2 3 4 5 1 6 2 7 3 8 4 9 1 5 10 p p p p p c p c p c p c p t c p</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">'Juice &amp; More' with UB-Trees.2  The company and the data presented here have been made anonymous.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">Note that the data in the 'Juice &amp; More' warehouse is aggregated on a daily basis, thus the amount of data is usually compressed for large customers, thus the proportion of large customers is reduced.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m">to take dependencies and correlations between the attributes into account</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The universal B-Tree for multidimensional Indexing: General Concepts</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bayer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">WWCA &apos;97</title>
				<meeting><address><addrLine>Tsukuba, Japan</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Verlag</publisher>
			<date type="published" when="1997-03">March, 1997</date>
			<biblScope unit="page" from="10" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">UB-Trees and UB-Cache -A new Processing Paradigm for Database Systems</title>
		<author>
			<persName><forename type="first">R</forename><surname>Bayer</surname></persName>
		</author>
		<idno>TUM-I9722</idno>
		<imprint>
			<date type="published" when="1997">1997</date>
		</imprint>
		<respStmt>
			<orgName>TU München</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An Overview of Data Warehousing and OLAP Technologies</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chaudhuri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Dayal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIGMOD Rec</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Bitmap Index Design and Evaluation</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ioannidis</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>SIGMOD</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">A Guide to the SQL Standard</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Date</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1988">1988</date>
			<publisher>Addison-Wesley</publisher>
		</imprint>
	</monogr>
	<note>2nd edition</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Multidimensional Access Methods</title>
		<author>
			<persName><forename type="first">V</forename><surname>Gaede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Günther</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Index Selection for OLAP</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Harinarayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ullman</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>ICDE</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Transaction Processing: Concepts and Techniques</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Reuter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>Morgan Kaufmann Publishers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Query Evaluation Techniques for Large Databases</title>
		<author>
			<persName><forename type="first">G</forename><surname>Graefe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="73" to="170" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Selection of Views to Materialize in a Data Warehouse</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gupta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Intl. Conference on Database Theory</title>
				<meeting>of the Intl. Conference on Database Theory<address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997-01">January 1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">R-Trees: A dynamic Index Structure for spatial Searching</title>
		<author>
			<persName><forename type="first">A</forename><surname>Guttman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of ACM SIGMOD Conf</title>
				<meeting>of ACM SIGMOD Conf</meeting>
		<imprint>
			<date type="published" when="1984">1984</date>
			<biblScope unit="page" from="47" to="57" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Range Queries in OLAP Data Cubes</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Megiddo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD Conf</title>
				<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="73" to="88" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">The R*a-Tree: An improved R*-Tree with Materialized Data for Supporting Range Queries on OLAP-Data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Jürgens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">J</forename><surname>Lenz</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>DWDOT Workshop</publisher>
			<pubPlace>Vienna</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">The Data Warehouse Toolkit</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kimball</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1996">1996</date>
			<publisher>John Wiley &amp; Sons</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The hB-Tree: A Multiattribute Indexing Method with good guaranteed Performance</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lomet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Salzberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM TODS</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="625" to="658" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">MISTRAL: Processing Relational Queries using a Multidimensional Access Method</title>
		<author>
			<persName><forename type="first">V</forename><surname>Markl</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>TU München</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. Thesis</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing</title>
		<author>
			<persName><forename type="first">G</forename><surname>Moerkotte</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">VLDB Conference</title>
				<meeting><address><addrLine>New York, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Improving OLAP Performance by Multidimensional Hierarchical Clustering</title>
		<author>
			<persName><forename type="first">V</forename><surname>Markl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ramsak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bayer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of IDEAS&apos;99</title>
				<meeting>of IDEAS&apos;99</meeting>
		<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Processing Operations with Restrictions in Relational Database Management Systems without external Sorting</title>
		<author>
			<persName><forename type="first">V</forename><surname>Markl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zirkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bayer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>ICDE</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The Grid-File</title>
		<author>
			<persName><forename type="first">J</forename><surname>Nievergelt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hinterberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Sevcik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM TODS</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="38" to="71" />
			<date type="published" when="1984-03">March 1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A Class of Data Structures for Associate Searching</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Orenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Merret</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD-PODS Conf</title>
				<meeting><address><addrLine>Portland, Oregon</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1984">1984</date>
			<biblScope unit="page" from="294" to="305" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Improved Query Performance with Variant Indexes</title>
		<author>
			<persName><forename type="first">P</forename><surname>O´neill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Quass</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGMOD Intl. Conf. On Management of Data</title>
				<meeting><address><addrLine>Tucson, Arizona</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="38" to="49" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Evaluation of the UB-Tree in the SAP Environment</title>
		<author>
			<persName><forename type="first">R</forename><surname>Pieringer</surname></persName>
		</author>
		<imprint/>
		<respStmt>
			<orgName>TU</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master Thesis</note>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Integrating the UB-Tree into a data-basesystem kernel</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ramsak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Markl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fenk</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">VLDB2000. 2000</date>
			<pubPlace>Cairo</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Indexing OLAP data</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sarawagi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Eng. Bulletin</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="36" to="43" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<title level="m" type="main">The Design and Analysis of Spatial Data Structures</title>
		<author>
			<persName><forename type="first">H</forename><surname>Samet</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1990">1990</date>
			<publisher>Addison Wesley</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shukla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Deshpande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Naughton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ramasamy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">VLDB Conference</title>
				<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Materialized View Selection for Multidimensional Datasets</title>
		<author>
			<persName><forename type="first">A</forename><surname>Shukla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Deshpande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Naughton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD Conf</title>
				<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Data Warehouse Configuration</title>
		<author>
			<persName><forename type="first">D</forename><surname>Theodoratos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Sellis</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>VLDB</publisher>
			<biblScope unit="page" from="126" to="135" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Multidimensional Access Methods: Trees Have Grown Everywhere</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Sellis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Roussopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Faloutsos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">VLDB</title>
		<imprint>
			<biblScope unit="page" from="13" to="14" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">Encoded Bitmap Indexing for Data Warehouses</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Buchmann</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>ICDE</publisher>
			<pubPlace>Orlando</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
