<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Divide and Aggregate: Caching Multidimensional Objects</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Wolfgang</forename><surname>Lehner</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Database Systems (IMMD6)</orgName>
								<orgName type="institution">University of Erlangen-Nuremberg</orgName>
								<address>
									<addrLine>Martensstr. 3</addrLine>
									<postCode>91058</postCode>
									<settlement>Erlangen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jens</forename><surname>Albrecht</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Database Systems (IMMD6)</orgName>
								<orgName type="institution">University of Erlangen-Nuremberg</orgName>
								<address>
									<addrLine>Martensstr. 3</addrLine>
									<postCode>91058</postCode>
									<settlement>Erlangen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Wolfgang</forename><surname>Hümmer</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Database Systems (IMMD6)</orgName>
								<orgName type="institution">University of Erlangen-Nuremberg</orgName>
								<address>
									<addrLine>Martensstr. 3</addrLine>
									<postCode>91058</postCode>
									<settlement>Erlangen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Divide and Aggregate: Caching Multidimensional Objects</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0134DBBA63C7C51D403958787C38CD59</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A common optimization technique in the OLAP application domain is the use of summary aggregate data. To support analysis hot spots appropriately, we propose a query-based aggregate cache of 'multidimensional objects'. The major drawback of previous query caching methods is that a new query must be completely subsumed by a single cached object in order to utilize it. Our solution is the introduction of 'set-derivability', which allows the combination of several aggregates to derive a single query. The set of cached multidimensional objects is dynamically determined using both user access patterns and semantic knowledge of dimensional structures. Experimental results show that average cost reductions of over 50% are easily reached while spending only 10% additional storage for summary data.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Online Analytical Processing (OLAP) offers analysts efficient interactive access to consistent business data, typically stored in a data warehouse. From the physical database design point of view, the two main obstacles to providing efficient multidimensional access to these data are choosing the right partitioning scheme and selecting the right set of summary tables. Whereas partitioning is a common concept in physical database design, the generation of summary tables is specific to the OLAP application domain with its mostly read-only data access. Due to limited storage capacity, not all possible aggregations can be precomputed. In this paper we present a dynamic caching strategy for multidimensional data incorporating both partitioning schemes and summary tables. Our strategy is easy to implement in relational OLAP servers and provides excellent support for analysis hot spots. Experiments show that average cost reductions of over 50% are easily reached while spending only 10% additional storage for summary data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Derivability</head><p>To speed up queries, multidimensional objects are cached and reused at runtime. The framework of multidimensional objects provides a consistent and formalized view of database queries as well as of materialized summary data. A multidimensional object may be visualized as a partition (restricted by certain selection predicates) of an aggregated data cube (defined by grouping attributes), as specified by a regular star-query.</p><p>A necessary prerequisite for reuse is derivability ([Klug82], <ref type="bibr">[Fink82]</ref>). Derivability in the context of partitioned aggregates concerns the following two aspects:</p><p>• compatibility of grouping attributes:</p><p>The precomputed data from which a query may be derived must have an equal or finer granularity with regard to the corresponding dimensional structures. (To illustrate these dependencies, [HaRU96] introduced the notion of an aggregation lattice.) • compatibility of selection predicates:</p><p>This condition for derivability states that the precomputed partition of summary data must not be more restrictive than the query which is the subject of derivation.</p><p>Since predicate comparison is NP-complete in the general case ([SuKN89]), recent work either does not consider partitioning at all ([HaRU96], [BaPT97]), is limited to a test for equality <ref type="bibr">([ShSV99]</ref>), or is reduced to statically predefined partitions (chunks; [DRSN98]).</p></div>
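The two compatibility aspects above can be sketched for the simple case where a scope is a single classification node and granularity is an aggregation level (0 = primary attribute, i.e. finest). The toy hierarchy, the tuple layout, and all names (PARENT, derivable, ...) are illustrative assumptions, not the paper's notation:

```python
# Toy shop-dimension classification: shop -> city -> state.
PARENT = {
    "shop1": "Erlangen", "shop2": "Nuremberg",
    "Erlangen": "Bavaria", "Nuremberg": "Bavaria", "Bavaria": None,
}

def is_ancestor_or_self(a, b):
    """True if class a equals b or is an ancestor of b in the hierarchy."""
    while b is not None:
        if a == b:
            return True
        b = PARENT.get(b)
    return False

def scopes_overlap(s, t):
    """Scope classes intersect iff they stand in a parent-child relation."""
    return is_ancestor_or_self(s, t) or is_ancestor_or_self(t, s)

def derivable(query, cached):
    """query, cached: (measure, scope class, granularity level).
    Checks measure equality, scope overlap, and that the cached data is
    at an equal or finer granularity than the query."""
    m, s, g = query
    m2, s2, g2 = cached
    return m == m2 and scopes_overlap(s, s2) and g2 <= g

print(derivable(("SALES", "Bavaria", 1), ("SALES", "Erlangen", 0)))  # -> True
```

Note that an overlap test (rather than containment) already anticipates the partial derivability used by the set-derivability mechanism of section 4.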
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Contribution</head><p>To overcome these limits and to provide a simple and powerful strategy to select and release precomputed summary data with limited scope, this paper contributes the following:</p><p>• set-derivability: Our approach is not limited to deriving a single query from a single summary data table but computes a nearly optimal construction plan for a query based on a set of precomputed summary data partitions ('patch-working' on multidimensional objects; section 4). • use of dimensional structures: Since we allow partitioning only with respect to classification hierarchies (and not for dimensional attributes), algorithms to compose a multidimensional aggregate from a set of cached objects become feasible. • dynamic caching policy: Since static precomputation of complete data cubes does not adequately support data analysis hot spots, our approach decides on the fly which summary data partitions (multidimensional objects) are worth keeping materialized. Moreover, materialization of the same data cube partition at different aggregation levels is explicitly allowed, and the size of a partition is not a system parameter but is implicitly determined by the user. • easy to integrate: Since we perform algebraic optimization, our algorithm can easily be integrated into relational OLAP servers. • high average cost reduction: As our experiments show, we achieve high cost reductions with only a very modest storage overhead and at almost no additional cost.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Structure of the Paper</head><p>In the next section, we give a comprehensive overview of recent work related to the selection and use of materialized aggregates in the context of OLAP and data warehousing. The third section formally introduces the one- and multidimensional structures of the considered multidimensional model. Based on the notion of multidimensional objects, section 4 defines the concept of set-derivability using 'patch-working' and outlines the four influence factors used to compute the overall benefit of a single cached multidimensional object. Finally, results of various experiments are presented in section 5. The paper concludes with a summary.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>The general idea of precomputing summary data is rooted in the application area of 'Statistical and Scientific Databases'. Several proposals like [ChMM88] were made, but the idea became popular with the emergence of 'Data Warehousing' together with 'Online Analytical Processing' and the resulting need for efficient and mostly read-only access to aggregates in a multidimensional context <ref type="bibr">([ChSh95]</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Static versus Adaptive Precomputation and Caching Schemes</head><p>A first classification of existing work may be made according to the flexibility of the scheme with regard to user access behavior. Given a set of relational queries, the algorithm of [YaKL97] computes the global query execution plan and selects the node of the resulting plan with the highest number of references for precomputation. [Gupt97] extends this procedure in the context of AND/OR-graphs by considering alternative local query execution plans.</p><p>Based on the concept of an aggregation lattice, the algorithm specified in [HaRU96] ([GHRU97] additionally considers existing index structures) statically selects those nodes of the lattice with the best ratio of cost savings to the additional storage required by that node. Since this approach has a complexity of O(n^3), with n as the number of possible nodes of the aggregation lattice, the work of [BaPT97] introduces heuristics to reduce this number and make the approach more applicable. In contrast to these static algorithms, dynamic or adaptive strategies try to incrementally optimize the set of precomputed aggregates. On the relational level, [ShSV99] materializes the results of arbitrarily complex queries. Unfortunately, by using hash codes they are able to reuse materialized queries only in the case of an exact match. This is not acceptable in a real OLAP scenario. For the case of an exact match with regard to the group-by attributes, the recent work of [DRSN98] proposes a chunk-based caching strategy. The materialization of chunks (see below) is decided according to a classical CLOCK scheme, thus directly reflecting the user access behavior. In [KoRo99] a similar approach is presented. Again, a single query can only be answered from a single materialized fragment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table-, Query-, Chunk-based Partitioning Schemes</head><p>The second classification criterion for existing strategies is the unit of precomputation. Basically, there are three levels, namely table-, query-, and chunk-based strategies.</p><p>Algorithms like [HaRU96], [GHRU97], and [BaPT97] do not support partitioning and are therefore classified as table-based strategies. Compatibility of selection predicates does not have to be checked, since all summary tables are defined on an unrestricted scope (the second condition of derivability holds trivially). However, this implies that analysis hot spots like "the current period" are not reflected appropriately, because the algorithms materialize only summary tables containing all periods.</p><p>In contrast, chunk-based partitioning schemes like [DRSN98] define relatively small aggregate partitions of a fixed size (chunks). Queried chunks are cached and later reused to answer new queries. On the one hand, [DRSN98] reports very high cost saving ratios in their experiments. On the other hand, however, the implementation of the special bitwise index structures needed to identify single chunks is extremely expensive. Moreover, they use a special chunk-based file system, preventing a seamless integration into relational OLAP servers. Finally, the chunk size must be statically determined and is crucial for the efficiency of the whole system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Query-based Partitioning Schemes</head><p>The third class of partitioning schemes is made up of query-based partitioning schemes ([SDJL96], [ShSV99], [YaKL97], [GuHQ95]). Single user queries form the unit of caching and make the selection of the right partition size obsolete. Recent work reuses the results of former queries either by query containment or by exact match ([SDJL96], [ShSV99]), where the incoming query must be completely derivable from the materialized query. Moreover, it is further argued against query-based partitioning <ref type="bibr">([DRSN98]</ref>) that several materialized queries may address a common part of a data cube. This would result in a redundant storage of that part and decrease the utilization of the additional storage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The Multidimensional Model</head><p>The following section introduces the dimensional and multidimensional structures which form the formal representation of the multidimensional model. The description is divided into the presentation of dimensions and of multidimensional objects, both of which are necessary to define a multidimensional database ([LeAW98]).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dimensional Structures</head><p>Dimensional structures are defined in the conceptual schema design phase of a database. The semantic information coded into these structures is heavily used for cache replacement.</p><p>Definition 1: A dimension D of a multidimensional database is a set of attributes D = { PA } ∪ { CA 1 , ..., CA p } ∪ { DA 1 , ..., DA q } together with the following set of functional dependencies between these attributes:</p><p>• ∀i (1≤i≤p): PA→CA i and ∀i (1≤i≤q): PA→DA i The primary attribute PA of a dimension functionally determines all other attributes of the dimension. The instances of the primary attribute are called dimensional elements.</p><formula xml:id="formula_0">• ∀i (1≤i≤p-1): CA i → CA i+1</formula><p>The classification attributes CA i of a dimension define a multi-level grouping of the dimensional elements and are ordered according to their functional dependencies. The aggregation level or granularity of a classification attribute CA i is i. By definition, the primary attribute has aggregation level 0 (PA = CA 0 ). • ∀i,j, i≠j (1≤i,j≤q): DA i ↛ DA j The dimensional attributes DA i of a dimension represent all attributes describing properties of the dimensional elements. Although the concept of dimensional attributes is more complex ([LeAW98], [ABD + 99]), dimensional attributes may be seen as additional single-level hierarchies on a dimension, which implies that they are functionally independent of each other.</p><p>Example 1: The set of dimensions of a typical OLAP sample scenario consists of a product, time, and shop dimension. Within the shop dimension, for example, the primary attribute corresponds to a unique shop identifier. A regional classification may be defined by the sequence city → state → region → country. The set of dimensional attributes may be defined as { shop-type, purchase-class, selling-area, shop-window-size }. In real OLAP scenarios, a single dimension (usually a product or a customer dimension) may easily have 20 different attributes.</p></div>
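Definition 1 can be illustrated with a small dimension table in which the functional dependencies PA → CA_i and PA → DA_i are materialized as one lookup row per dimensional element; the `Dimension` class and its attribute names are assumptions for illustration:

```python
class Dimension:
    def __init__(self, name, ca_names, da_names, rows):
        self.name = name
        self.ca_names = ca_names   # CA_1..CA_p, coarser with rising level
        self.da_names = da_names   # DA_1..DA_q, mutually independent
        self.rows = rows           # PA instance -> {attribute: value}

    def classify(self, element, level):
        """Class of a dimensional element at a given aggregation level
        (level 0 is the primary attribute itself)."""
        if level == 0:
            return element
        return self.rows[element][self.ca_names[level - 1]]

shop = Dimension(
    "shop",
    ca_names=["city", "state", "region", "country"],
    da_names=["shop-type", "selling-area"],
    rows={"S1": {"city": "Erlangen", "state": "Bavaria",
                 "region": "South", "country": "Germany",
                 "shop-type": "retail", "selling-area": 120}},
)
print(shop.classify("S1", 2))  # -> Bavaria
```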
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Multidimensional Structures</head><p>The basic data structure in the multidimensional context, the notion of a multidimensional object, is of fundamental importance. With regard to the caching algorithms described in the following section, all incoming queries as well as materialized results of former queries are internally represented as multidimensional objects.</p><p>Example 2: The formal representation of a sales data cube for video products in Germany would be as follows: M = ( [ SALES, FLOW, SUM ], ( p.group = 'Video', s.country = 'Germany' ), [ (p.family, s.region), { p.brand, s.shoptype } ] )</p><p>The aggregation type determines the set of applicable operations on the numeric values to ensure correct summarizability ([LeSh97]). For example, all aggregation operations are applicable to sales figures (A = FLOW), whereas the sum operation would not be allowed on figures denoting the price of single articles (A = VALUEperUNIT). The aggregation type STOCK prohibits summation in the temporal dimension. Finally, the size of a multidimensional object M at the instance level is denoted by |M|.</p><p>Note that this definition does not explicitly consider multiple hierarchies with more than one level. Multiple hierarchies would introduce a partial order on the classification attributes (instead of a total order for the single classification hierarchy). This extension would not affect any of the presented derivability or caching algorithms but would complicate the presentation. Therefore, we only consider a single classification hierarchy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Relations and Operations on Multidimensional Objects</head><p>To formally define 'derivability' on multidimensional objects, we need to introduce the '≤'-relation for the granularity specification and the intersection of scopes of multidimensional objects. Moreover, we need intersection and difference operators on multidimensional objects themselves. The intersection of two multidimensional objects M = (M, S, G) and M' = (M', S', G') is defined if and only if • both objects refer to the same measure, i.e. M = M' • the intersection of the scopes is not empty, i.e. S ∩ S' ≠ ∅ • the granularity specification of M is finer than or equal to the granularity specification of M', i.e. G ≤ G'. As seen in this definition, the coarser multidimensional object M' is implicitly refined to the granularity level G. The intersection of the scopes is empty if two classes s i and s' i do not show a parent-child relationship within the corresponding classification hierarchy of dimension i.</p><p>It is important to note that the problem of intersecting classes in a classification hierarchy may be solved by checking simple parent-child relationships. This concept is fundamental for the applicability of the set-derivability mechanism.</p><p>To be able to introduce the concept of set-derivability by substitution ('patch-working'; section 4), we need to define the difference of two multidimensional objects. As illustrated in figure <ref type="figure">1</ref> for the two-dimensional case, the difference of two n-dimensional cubes would in the worst case result in a set of 3^n - 1 convex residual cubes - assuming exactly 3 classification nodes partitioning each dimension.</p><p>It can be shown that intelligent slicing can reduce this number to 2n residual cubes ([ChMM88], proof of proposition 4). Informally, the cube is cut iteratively for each dimension into at most three slices; two slices are residual cubes and are not considered any longer; the third slice contains the operand and is sliced further in the remaining dimensions. For n dimensions we thus obtain 2n residual cubes.</p><p>In the general case, with an arbitrary number of classification nodes in each dimension i (1≤i≤n), the minimum number p i of partitioning classification nodes including the operand to be subtracted has to be determined for each dimension. The number of residual cubes is then Σ(p i - 1), 1≤i≤n, by the same argument as above. For more detailed information refer to [AlGL99].</p></div>
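The slicing argument can be sketched by subtracting one axis-aligned cube from another dimension by dimension, producing at most two residual cubes per dimension. Intervals stand in here for sets of classification nodes, and `cube_difference` is a hypothetical helper, not the paper's operator:

```python
def cube_difference(a, b):
    """a, b: lists of (lo, hi) half-open intervals, one per dimension.
    Returns the residual cubes of a minus b, assuming b overlaps a."""
    residuals = []
    core = list(a)                      # shrinks toward the intersection
    for i, (blo, bhi) in enumerate(b):
        lo = max(core[i][0], blo)
        hi = min(core[i][1], bhi)
        if core[i][0] < lo:             # slice of a below b in dimension i
            residuals.append(core[:i] + [(core[i][0], lo)] + core[i+1:])
        if hi < core[i][1]:             # slice of a above b in dimension i
            residuals.append(core[:i] + [(hi, core[i][1])] + core[i+1:])
        core[i] = (lo, hi)              # middle slice is sliced further
    return residuals

# 2-D example: an inner square cut out of an outer one leaves 2n = 4 pieces
pieces = cube_difference([(0, 10), (0, 10)], [(3, 7), (3, 7)])
print(len(pieces))  # -> 4
```

Each residual cube spans the already-narrowed middle slice in the dimensions processed so far and the full extent of the operand in the remaining ones, which is exactly the iterative slicing described above.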
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Derivability of Multidimensional Objects</head><p>To reuse the results of former queries, derivability of multidimensional objects as the units of caching is a necessary prerequisite. The following definition of total derivability will be relaxed in the next section to partial derivability, which is sufficient to define set-derivability. Total derivability corresponds to query containment as used in previous approaches ([SDJL96], [ShSV99]), where a materialized aggregate can only be accessed if the complete query is derivable from this single cached object. This is because query containment can be checked with a relatively simple algorithm, whereas the determination of the overlapping regions for arbitrary predicates is NP-complete ([SuKN89]). However, in the context of multidimensional objects, where the predicates consist of scope definitions corresponding to classes in a classification hierarchy, this problem can be reduced to the determination of ancestor and sibling relationships.</p><p>Definition 9: A multidimensional object M = (M, S, G) is partially derivable from a multidimensional object M' = (M', S', G'), i.e. M' " M, if and only if</p><p>• the conditions (1) and (3) of the total derivability hold for M and M' • M and M' are overlapping, i.e. S ∩ S' ≠ ∅ In this case, M' is said to support M, and M ∩ M' is called the support of M' for M.</p><p>From the set of supporters, i.e. the potential candidates, an appropriate set of partitions, or patches, has to be selected in order to answer a query. Such a set is called a substitution for the query.</p><formula xml:id="formula_3">Definition 10: A set S = { 〈M 1 , M 1 ' 〉, ..., 〈M k , M k ' 〉 } with M i = (M i , S i , G i ) and M i ' = (M i ', S i ', G i ') is called a substitution for a multidimensional object M Q if and only if • ∀i (1≤i≤k): M i = M Q and G i = G Q , • ∀i (1≤i≤k): M i ' " M i , • S Q = ∪ i S i , • ∀i, j (1≤i,j≤k) with i≠j: S i ∩ S j = ∅.</formula><p>In this case, M Q is said to be set-derivable from { M 1 ', ..., M k ' }. Each patch 〈M i , M i ' 〉 in S denotes that partition M i of M Q is to be computed from M i '. All partitions must be disjoint and their union must result in M Q .</p></div>
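The structural conditions of Definition 10 (pairwise disjoint patches whose union is the query scope) can be checked directly when scopes are modelled as plain sets of cells; this simplified model and the `is_substitution` helper are illustrative assumptions:

```python
def is_substitution(patches, query_scope):
    """patches: list of supports S_i; they must be pairwise disjoint and
    their union must equal the query scope S_Q."""
    union = set()
    for s in patches:
        if union & s:              # overlapping patches are not allowed
            return False
        union |= s
    return union == set(query_scope)

print(is_substitution([{1, 2}, {3}], {1, 2, 3}))     # -> True
print(is_substitution([{1, 2}, {2, 3}], {1, 2, 3}))  # -> False
```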
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Query Optimization Problem</head><p>In general, the cache contains several possibly overlapping aggregates with different granularities which support a query. Therefore, the problem of the multidimensional query optimizer is to choose a substitution that allows the computation of the query at minimal cost. This problem is very similar to the well-known knapsack problem <ref type="bibr">([CoLR90]</ref>), where a thief tries to select the most valuable set of items with different values and weights. A common method is to use a greedy strategy to approximate the optimal solution for the NP-complete 0/1 knapsack problem. In the case of the fractional knapsack problem, where the thief may take arbitrary fractions of objects, the greedy approach actually yields the optimum. The problem of finding an optimal substitution can be reduced to the fractional knapsack problem if the cost to compute a fraction of one multidimensional object from another one is monotonic in the size of the support:</p><formula xml:id="formula_4">| M ∩ M' | &lt; | M ∩ M'' | ⇒ Cost(M, M') &lt; Cost(M, M'').</formula><p>In the case of a monotonic cost function, an algorithm selecting the cheapest substitution for each data cell in the multidimensional object M automatically finds the optimal substitution ('greedy-choice property'; [CoLR90]). However, especially in the context of relational databases, monotonicity of the cost function may not be given. In a relational backend, the computation of a multidimensional object results in range queries on the affected tables representing the supporting multidimensional objects. In the absence of indexes, range queries on relational tables usually result in full table scans, destroying the property of monotonicity.</p><p>In order to find a substitution for a query we use the greedy strategy outlined in algorithm 1. The algorithm gets the query M Q and the set C of all materialized multidimensional objects as input parameters. It returns a substitution S and the estimated cost to compute M Q . As long as there are still uncomputed fragments remaining (line 5), the candidate with the least access cost per cell for each fragment is selected (lines 9-21). After the selected patch is added to the solution S (line 24), it must be removed from the remainder R using the difference operator (line 28). Since M may only be partially derivable from M best , the uncovered difference M - M best must remain in R. It is worth mentioning that a solution is always possible if C also contains the raw data objects.</p></div>
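The greedy patch-working loop can be sketched as follows, with scopes simplified to flat sets of cell identifiers and a fixed access cost per candidate. In the paper the cost estimate comes from the relational backend, so the cost model and all names here are illustrative assumptions:

```python
def greedy_substitution(query_scope, candidates):
    """candidates: dict name -> (scope set, access cost).
    Returns (substitution, estimated cost); the substitution is a list of
    (candidate name, covered cells) patches partitioning the query scope."""
    remainder = set(query_scope)
    substitution, total_cost = [], 0.0
    while remainder:
        best = None
        for name, (scope, cost) in candidates.items():
            support = scope & remainder          # partial derivability
            if support:
                per_cell = cost / len(support)
                if best is None or per_cell < best[0]:
                    best = (per_cell, name, support, cost)
        if best is None:                         # raw data object missing
            raise ValueError("no substitution possible")
        _, name, support, cost = best
        substitution.append((name, support))     # add patch to solution
        total_cost += cost
        remainder -= support                     # difference operator
    return substitution, total_cost

cands = {"raw": ({1, 2, 3, 4}, 100.0), "agg": ({1, 2}, 10.0)}
plan, cost = greedy_substitution({1, 2, 3}, cands)
print(cost)  # -> 110.0
```

In the example, the cheap aggregate covers two of the three queried cells and the raw data object patches the remaining cell, mirroring the partial-derivability case discussed above.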
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cache Replacement</head><p>The knowledge of the OLAP application domain provides a lot of semantic information not only about the data model but also about the users' behavior. In order to incorporate reference density as well as semantic knowledge, the benefit in our approach is defined by four factors: the weighted relative reference density, the absolute benefit, the degree of relationship, and the reconstruction cost ([ADB + 99]):</p><p>• Weighted Relative Reference Density D M : The weighted relative reference density is used to approximate the traditional least reference density (LRD) in the multidimensional context. A reference counter is increased only by the fraction of M that was actually referenced.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>• Absolute Benefit A M :</head><p>The absolute benefit is based on the size, the granularity, and the scope of the multidimensional object and therefore generalizes the benefit definitions from [HaRU96] as well as [DRSN98].</p><p>• Degree of Relationship R M (M Q ): In the OLAP context, an object in the cache that can be reached within a few navigational operations will very likely be used again. Therefore, the number of navigation operations necessary to transform the last query M Q into a materialized, i.e. cached, multidimensional object M defines the degree of relationship for M.</p><p>• Reconstruction Cost C M (M Q , C): Since cached multidimensional objects may be computed from the set of multidimensional objects currently in the cache, it may prove beneficial to replace those multidimensional objects which can be reconstructed most easily.</p><p>Since the factors have very different orders of magnitude, they are normalized by their average values before the overall benefit B M (M Q , C) of a multidimensional object M for a cache configuration C after query M Q is computed by a linear combination. The overall benefit is used as the indicator of whether the set of cached multidimensional objects changes or not.</p></div>
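The normalization and linear combination can be sketched as follows; equal weights and the `overall_benefit` helper are assumptions, since the combination coefficients are left open here:

```python
def overall_benefit(factors, weights=(1.0, 1.0, 1.0, 1.0)):
    """factors: dict mapping cached object -> (D, A, R, C) factor values.
    Returns dict mapping object -> its overall benefit."""
    n = len(factors)
    # average value of each factor over the current cache configuration
    avgs = [max(sum(f[i] for f in factors.values()) / n, 1e-9)
            for i in range(4)]
    # linear combination of the average-normalized factors
    return {obj: sum(w * f[i] / avgs[i] for i, w in enumerate(weights))
            for obj, f in factors.items()}

factors = {"M1": (2.0, 2.0, 2.0, 2.0), "M2": (0.0, 0.0, 0.0, 0.0)}
print(overall_benefit(factors)["M1"])  # -> 8.0
```

Objects with the lowest overall benefit would be the first candidates for replacement.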
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experimental Results</head><p>The following section illustrates the optimization potential of the presented algorithms for caching multidimensional objects. Our proposed dynamic caching strategy is analyzed according to different cache sizes and influence factors for the computation of the overall benefit. Furthermore, we compare our strategy with the static precomputation approach of <ref type="bibr">[HaRU96]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experimental Setup</head><p>The proposed data structures and algorithms are implemented in the multidimensional OLAP server CUBESTAR.</p><p>All query processing and cache management is based on algebraic operations on multidimensional objects. However, multidimensional objects are physically stored in a relational backend (Oracle 8.0.4; DEC Alpha 3000/800, 1 CPU, 192 MByte RAM) in a star-schema-like manner (ROLAP system). For each patch in the optimal substitution for the query (algorithm 1: substitution of a multidimensional object, 'patch-working'), the query processor generates an SQL statement; these statements are put together by UNION ALL operations to yield the final result. We could not completely eliminate side-effects of the system, especially of the database buffer, but multiple runs confirmed the following results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Multidimensional Database Schema and Query Generation</head><p>The schema of the multidimensional database consists of three dimensions with five, four, and six levels, respectively, and a set of up to 14 dimensional attributes. The corresponding raw data cube was randomly filled with a sparsity of 3%, resulting in about 250,000 tuples in the relational representation. To simulate the cache replacement behavior, a new query is constructed from the previous query by applying a single operation in a certain dimension with a predefined probability. Since a typical OLAP user does not switch very often between dimensions, in 80% of the cases the operation was executed on the same dimension as before, and in 20% of the cases the operation involved a different dimension than the previous one. The set of possible operations included slice (scope changes to a child node), unslice (scope changes to a parent node), drill-down (granularity changes to the next lower level), roll-up (granularity changes to the next higher level), split (add a dimensional attribute), and merge (remove a dimensional attribute).</p></div>
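The trace generation can be sketched as a random walk over queries. The state representation (one granularity level per dimension) is a simplification, with scope and attribute changes only indicated; all names here are illustrative:

```python
import random

OPS = ["slice", "unslice", "drill-down", "roll-up", "split", "merge"]

def next_query(query, n_dims=3, p_same_dim=0.8, rng=random):
    """Apply one random OLAP operation to the previous query."""
    q = {"dim": query["dim"], "level": list(query["level"])}
    if rng.random() >= p_same_dim:              # 20%: switch dimension
        q["dim"] = rng.choice([d for d in range(n_dims) if d != q["dim"]])
    op = rng.choice(OPS)
    if op == "drill-down":                      # next finer level
        q["level"][q["dim"]] = max(0, q["level"][q["dim"]] - 1)
    elif op == "roll-up":                       # next coarser level
        q["level"][q["dim"]] += 1
    # slice/unslice/split/merge would adjust scope and dimensional
    # attributes analogously; omitted in this sketch
    return q

trace = [{"dim": 0, "level": [2, 2, 2]}]
for _ in range(5):
    trace.append(next_query(trace[-1]))
```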
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Performance Metric</head><p>Since in general the cache management aims at maximizing the cost saving ratio ([ShSV99], [DRSN98]), we defined this metric in the context of multidimensional objects. The cost saving ratio CSR on cached multidimensional objects is defined as the ratio of the cost saved by answering the queries from the cache content to the total cost of computing the queries from the non-aggregated raw data.</p><p>In the experimental scenario, two user traces, one simulating 1000 operations of a single user and another simulating 1500 steps of five independent users, were used to investigate the behavior of the cache for varying cache sizes and configurations.</p></div>
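Read as the fraction of raw-data query cost saved by using the cache, as in the CSR metric of [ShSV99] and [DRSN98], the computation is straightforward; the cost figures below are invented for illustration:

```python
def cost_saving_ratio(cached_costs, raw_costs):
    """cached_costs[i]: cost of query i when using the cache content;
    raw_costs[i]: cost of computing query i from the raw data."""
    saved = sum(r - c for c, r in zip(cached_costs, raw_costs))
    return saved / sum(raw_costs)

print(cost_saving_ratio([40, 10], [100, 100]))  # -> 0.75
```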
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Influence of the Cache Parameters</head><p>The first set of graphs in figure <ref type="figure" target="#fig_8">3</ref> illustrates the impact of the four cache parameters using a cache size of 20% of the raw data volume. The figures show the development of the cost saving ratios for the single and the multi user trace. If all parameters are enabled, cost savings of about 56% in the single user case and 50% in the multi user case were obtained. Trying all possible combinations of parameters showed that for both the single and the multi user traces the combination of the relative reference density D and the relationship degree R yields the best results of 60% and 52%, respectively. On the contrary, the worst results of 35% and 28%, respectively, were achieved using the absolute benefit A and the reconstruction cost C.</p><p>It is interesting to note that the number of potential users of a materialized aggregate, reflected by the absolute benefit A, is not suitable for determining the overall benefit. Algorithms like [HaRU96] are based solely on this factor. Furthermore, it turns out that the general criticism of query-based caching schemes, namely the possibly redundant storage of the same part of the data cube in different objects, is of minor importance, because the influence factor of the reconstruction cost also performs very badly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Influence of the Cache Size</head><p>The diagram in figure <ref type="figure">4</ref> shows the simulation results for cache sizes of 10,000, 25,000, 50,000, 100,000, and 500,000 tuples. Compared to the raw data volume, these values correspond to an additional storage overhead of 4%, 10%, 20%, 40% and 200%, respectively. For these simulations the best combination of cache parameters, i.e. D and R, was used. As can be seen, even a relatively small cache of 4% of the raw data size makes cost reductions of over 50% in the single user mode and over 40% in the multi user mode possible. However, the cost savings in the single user mode do not increase much beyond 60% for any cache size above 50,000 tuples. The figure illustrates that even for multiple users a fairly small cache yields cost reductions not much worse than for a single user, i.e. above 50%.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Comparison to Static Precomputation of Aggregates</head><p>In order to compare our method to other materialization methods for aggregates, we implemented a static precomputation algorithm based on [HaRU96] ‡ . This algorithm computes a set of aggregates with an unrestricted scope based on a benefit definition similar to the absolute benefit. Figure <ref type="figure">4</ref> shows the cost saving ratio for different cache sizes. If only 4% of additional storage is provided, the static algorithm achieves almost no cost reduction (and is therefore omitted from the figure), because almost all useful aggregates with an unrestricted scope are larger than the cache. Although the cost saving ratio of the static approach rises with increasing cache size, the values for the dynamic strategies remain unrivaled. Even for a cache size twice as large as the raw data volume, the cost saving ratios for the single and multi user cases are only about 40% and 30%, respectively; note that the dynamic strategy yields over 50% with 1/50 of that cache size. The reason for the poor results of the static approach is that the size of the aggregation lattice grows exponentially in the number of possible group-by combinations. In most application scenarios this number is high, because several evaluation criteria are used on a single dimension, like dimensional attributes or multiple hierarchies. Using a reasonable amount of additional storage, static methods can only supply an insufficient number of aggregates. Moreover, because general static approaches do not take specific partitions into account, i.e. regions of interest or hot spots, much of the additional storage space is wasted on regions of aggregates which are never or seldom requested.</p></div>
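A static precomputation scheme in the style of [HaRU96] greedily picks aggregates under a space budget. The following is a simplified sketch, not the authors' implementation; the candidate names, sizes, and benefit values are illustrative:

```python
def greedy_select(candidates, budget):
    """Greedy static selection of aggregates under a space budget.

    candidates -- dict mapping aggregate name to (size, benefit);
                  all names and numbers are illustrative.
    budget     -- available additional storage (same unit as size)
    Returns the set of chosen aggregate names.
    """
    chosen, used = set(), 0
    remaining = dict(candidates)
    while remaining:
        # pick the aggregate with the highest benefit per unit of space
        name = max(remaining, key=lambda n: remaining[n][1] / remaining[n][0])
        size, _ = remaining.pop(name)
        if used + size <= budget:
            chosen.add(name)
            used += size
    return chosen
```

The key limitation discussed above is visible here: with unrestricted scopes, every candidate covers the whole cube region, so under a small budget few (or no) candidates fit at all.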
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Summary</head><p>In this paper, we presented a novel method for the dynamic selection of materialized aggregates in the context of multidimensional database systems. Our proposed solution is based on the notion of multidimensional objects, which provide a uniform view of queries as well as materialized aggregates. The use of multidimensional objects drastically simplifies the determination of intersections and differences of queries. Therefore, we are able to present a query-based caching strategy that overcomes the limitations of query containment through the concept of set-derivability. Set-derivability is achieved by a patch-working mechanism, which (in the case of a monotonic cost function) computes the cheapest substitution of a query by a set of materialized aggregates. The cache replacement strategy is based on an overall benefit computed for each element in the aggregate cache as a linear combination of four different influence parameters. Finally, we gave the results of various experiments: average cost reductions of 50% to 60% can be obtained with only 10% additional storage capacity.</p><p>The proposed optimization technique should not be seen as an alternative to traditional physical optimization methods like indexes or fragmentation, but as a complement. Our method not only explicitly supports fragmentation of the fact table but makes it even more beneficial. If aggregates are materialized only for hot spots, a single query may utilize both fragments of materialized aggregates and fragments of raw data in one operation. Our proposed strategy may easily be integrated into most relational OLAP servers, because all optimizations involve only algebraic operations on common multidimensional structures. ‡. Unfortunately, we could not compare our approach with the work of [DRSN98], since it requires a very special 'chunk-based' file system in the context of the Paradise database system. 
</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Definition 2 :</head><label>2</label><figDesc>A multidimensional object (MO) is a triple M = (M, S, G) consisting of a measure M, an n-ary vector of classes S = (s 1 , ..., s n ) denoting the scope, and the granularity specification G. A measure is a tuple M = [N, A, O] where N is a reserved name for the corresponding fact, A ∈ { FLOW, STOCK, VALUEperUNIT } is the aggregation type, and O ∈ { NONE, SUM, AVG, COUNT } is the applied aggregation function on that specific fact. A granularity specification is a tuple G = [L, P] where L ('aggregation level') is an n-ary vector holding one classification attribute of each dimension, and P ('properties') is a set of dimensional attributes drawn from the attribute sets of the dimensions.</figDesc></figure>
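Definition 2 can be mirrored by a small data structure. A minimal sketch; the class and field names, and the sample values, are illustrative and not the paper's notation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measure:
    name: str          # N: reserved name of the corresponding fact
    agg_type: str      # A: FLOW, STOCK or VALUEperUNIT
    agg_func: str      # O: NONE, SUM, AVG or COUNT

@dataclass(frozen=True)
class MDObject:
    measure: Measure      # M: the measure
    scope: tuple          # S: one class per dimension
    level: tuple          # G.L: one classification attribute per dimension
    properties: frozenset # G.P: set of dimensional attributes

# Hypothetical example: total sales for Germany in 1999.
sales = MDObject(Measure("sales", "FLOW", "SUM"),
                 scope=("Germany", "1999"),
                 level=("Country", "Year"),
                 properties=frozenset())
```

Freezing the dataclasses makes multidimensional objects hashable, which is convenient when they are used as keys in a cache directory.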
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Definition 5 :</head><label>5</label><figDesc>The intersection of multidimensional objects M = (M, S, G) and M' = (M',S',G'), i.e. M ∩ M' is defined as M ∩ M' = (M, S ∩ S', G) if and only if</figDesc></figure>
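The scope intersection of Definition 5 can be sketched by intersecting the scopes dimension by dimension. Representing each scope component as a set of leaf instances is a simplification of the paper's class-based scopes:

```python
def intersect_scopes(s1, s2):
    """Per-dimension intersection of two scopes.

    Each scope is a tuple with one set of leaf instances per
    dimension (an assumed simplification of the paper's classes).
    Returns None if the two objects are disjoint in any dimension.
    """
    result = tuple(a & b for a, b in zip(s1, s2))
    if any(not dim for dim in result):
        return None
    return result
```

The measure and granularity conditions of Definition 5 (derivability of M from M', compatible G) would be checked separately before intersecting scopes.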
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Definition 6 :</head><label>6</label><figDesc>The difference of two multidimensional objects M = (M, S, G) and M' = (M',S',G'), i.e. M − M' is defined as a set {(M, S 1 , G) 1 , ..., (M, S k , G) k } where k is the number of residual cubes and S i are the scopes of residual objects obeying the following conditions:</figDesc></figure>
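The residual cubes of Definition 6 correspond to the standard disjoint decomposition of a hyperrectangle difference, sketched here on set-valued scopes (an assumed simplification; the paper works on classification-based scopes, and `minus` is assumed to lie inside `scope`):

```python
def residual_cubes(scope, minus):
    """Decompose scope − minus into k disjoint residual cubes.

    scope, minus -- tuples with one set of leaf instances per
    dimension. One residual cube is produced per dimension that
    still contains uncovered instances.
    """
    residuals = []
    for i in range(len(scope)):
        diff = scope[i] - minus[i]
        if not diff:
            continue  # dimension i fully covered, no residual here
        # dimensions before i are clipped to minus, dimension i takes
        # the uncovered part, later dimensions keep the full scope
        cube = tuple(minus[j] if j < i else (diff if j == i else scope[j])
                     for j in range(len(scope)))
        residuals.append(cube)
    return residuals
```

The residuals are pairwise disjoint and together with `minus` exactly cover `scope`, which is what makes the patch-working substitution sound.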
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Definition 7 :</head><label>7</label><figDesc>A measure M = (N, A, O) is derivable from a measure M' = (N', A', O') if and only if • the measures refer to the same real world entity (N = N' and A = A') • the operations are compatible, i.e., O = O' or O' = NONE. Note that semi-additive functions like AVG are split into their additive counterparts ({SUM, COUNT}) combined by division. The additive operations are a regular subject of the caching mechanism. Thus, by materializing multidimensional objects with the applied operations SUM and COUNT, an AVG query is never cached directly but only indirectly.</figDesc></figure>
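The derivability test of Definition 7 is a simple predicate. A self-contained sketch with an illustrative tuple representation of measures:

```python
from collections import namedtuple

Measure = namedtuple("Measure", "name agg_type agg_func")

def derivable(m, m_cached):
    """Definition 7: can measure m be computed from m_cached?"""
    # same real-world entity: same fact name and aggregation type
    same_entity = m.name == m_cached.name and m.agg_type == m_cached.agg_type
    # compatible operations: identical, or the cached object holds
    # unaggregated values (NONE) that can still be aggregated
    compatible = m.agg_func == m_cached.agg_func or m_cached.agg_func == "NONE"
    return same_entity and compatible
```

Following the note in the definition, an AVG query would be rewritten as SUM/COUNT and each part checked with this predicate, rather than looking for a cached AVG directly.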
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure</head><label></label><figDesc>Figure 1: Difference of multidimensional objects</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Definition 11 :</head><label>11</label><figDesc>A cost function Cost(M, M') determining the cost to compute the support of a multidimensional object M from M' is monotonic, if for any two arbitrary multidimensional objects M' and M'' with |M'| ≤ |M''| it holds that Cost(M, M') ≤ Cost(M, M'').</figDesc></figure>
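With a monotonic cost function, a smaller source object is never more expensive to aggregate from, so the cheapest sufficient source can be picked directly. A sketch under the assumption of a size-proportional cost model (the names and tuple counts are illustrative):

```python
def cheapest_source(candidates):
    """Pick the cached object that is cheapest to answer the query from.

    candidates -- dict mapping object name to the number of tuples
    that must be scanned in that object; all values illustrative.
    """
    def cost(n_tuples):
        # illustrative monotonic cost model: cost grows with the
        # number of tuples scanned, so smaller sources stay cheaper
        return n_tuples
    return min(candidates, key=lambda name: cost(candidates[name]))
```

This property is what lets the patch-working mechanism claim the cheapest substitution: preferring smaller (more aggregated) objects is always safe under a monotonic cost function.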
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Patch-Working</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head></head><label></label><figDesc>Input: set of existing multidimensional objects C = {M 1 , ..., M n }, multidimensional object M Q representing the query. Output: substitution S for M Q .
1 // initialize remainder, patch-work and cost
2 R := { M Q }; S := ∅; cost total := 0
3
4 // iterate over</figDesc></figure>
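The patch-working initialization above can be continued into a simplified greedy sketch. Scopes are modeled here as flat sets of cells and costs as per-cell scan costs, which is an assumed simplification of the paper's multidimensional scopes and general cost function:

```python
def patch_work(query_cells, cache):
    """Greedy patch-working sketch: cover a query from cached objects.

    query_cells -- set of cells (the query's scope); illustrative
    cache       -- dict name -> (cells, cost_per_cell); illustrative
    Returns (patches, remainder): patches maps object names to the
    cells they contribute; the remainder must come from raw data.
    """
    remainder = set(query_cells)   # R := { M_Q }
    patches = {}                   # S := empty substitution
    # iterate over cached objects, cheapest per cell first
    for name in sorted(cache, key=lambda n: cache[n][1]):
        cells, _ = cache[name]
        overlap = remainder & cells
        if overlap:
            patches[name] = overlap   # add the patch to the substitution
            remainder -= overlap      # shrink the remainder
    return patches, remainder
```

With a monotonic cost function, preferring the cheapest overlapping object at each step mirrors the cheapest-substitution property claimed for the paper's algorithm.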
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Cost saving ratio for different combinations of cache parameters a) single user, 1000 steps b) multiple users, 1500 steps</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>Figure 4: Query trace configurations and cost savings (CSR) with varying cache sizes (4%, 10%, 20%, 40%, 200% of the raw data volume), for the single user trace (1000 steps) and the five user trace (1500 steps). Operation mix: 14% drill-down / 30% roll-up, 20% split / 15% merge, 15% slice / 6% unslice.</figDesc></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Summary</head><p>Currently, we are exploring the integration of indexes on the multidimensional objects and the benefit of using special linearizations. Another line of research is the use of the fragment-based patch-working algorithm in a parallel server environment. </p></div>
			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Management of multidimensional Aggregates for efficient Online Analytical Processing</title>
		<author>
			<persName><forename type="first">J</forename><surname>Albrecht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Deyerling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Günzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Hümmer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schlesinger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Database Engineering and Applications Symposium (IDEAS&apos;99</title>
				<meeting><address><addrLine>Montreal, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1999">August 1-3), 1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Set-Derivability of Multidimensional Aggregates</title>
		<author>
			<persName><forename type="first">J</forename><surname>Albrecht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Günzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">First International Conference on Data Warehousing and Knowledge Discovery (DaWaK&apos;99</title>
				<meeting><address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1999-09-01">August 30 -September 1), 1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Materialized Views Selection in a Multidimensional Database</title>
		<author>
			<persName><forename type="first">E</forename><surname>Baralis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Paraboschi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Teniente</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">23rd International Conference on Very Large Data Bases (VLDB&apos;97</title>
				<meeting><address><addrLine>Athens, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997">Aug 25-29), 1997</date>
			<biblScope unit="page" from="156" to="165" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A Model of Summary Data and its Applications in Statistical Databases</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>McNamee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Melkanoff</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Working Conference on Statistical and Scientific Database Management (4SSDBM</title>
				<meeting>the 4th International Working Conference on Statistical and Scientific Database Management (4SSDBM<address><addrLine>Rome, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1988">June 21-23. 1988</date>
			<biblScope unit="page" from="356" to="372" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">An Overview of Costbased Optimization of Queries with Aggregates</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chaudhuri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Shim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Data Engineering Bulletin</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="3" to="9" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Cormen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Leiserson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Rivest</surname></persName>
		</author>
		<title level="m">Introduction to Algorithms</title>
				<meeting><address><addrLine>Cambridge (MA); London; New York</addrLine></address></meeting>
		<imprint>
			<publisher>McGraw-Hill Book Company</publisher>
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Caching Multidimensional Queries Using Chunks</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Deshpande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ramasamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shukla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">F</forename><surname>Naughton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Conference on Management of Data (SIGMOD&apos;98</title>
				<meeting>the 27th International Conference on Management of Data (SIGMOD&apos;98<address><addrLine>Seattle (WA), USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998-06">June 2-4, 1998</date>
			<biblScope unit="page" from="259" to="270" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Common Expression Analysis in Database Applications</title>
		<author>
			<persName><forename type="first">S</forename><surname>Finkelstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on the Management of Data (SIGMOD&apos;82</title>
				<meeting>the International Conference on the Management of Data (SIGMOD&apos;82<address><addrLine>Orlando (FL), USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1982-06">June 2-4, 1982</date>
			<biblScope unit="page" from="235" to="245" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Index Selection for OLAP</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Harinarayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Ullman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th International Conference on Data Engineering (ICDE&apos;97</title>
				<meeting>the 13th International Conference on Data Engineering (ICDE&apos;97<address><addrLine>Birmingham, Great Britain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997-04">April 7-11, 1997</date>
			<biblScope unit="page" from="208" to="219" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Aggregate-Query Processing in Data Warehousing Environments</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Harinarayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Quass</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st International Conference on Very Large Data Bases (VLDB&apos;95</title>
				<meeting>the 21st International Conference on Very Large Data Bases (VLDB&apos;95<address><addrLine>Zürich, Switzerland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1995">Sept. 11-15. 1995</date>
			<biblScope unit="page" from="358" to="369" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Selection of Views to Materialize in a Data Warehouse</title>
		<author>
			<persName><forename type="first">H</forename><surname>Gupta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th International Conference on Database Theory (ICDT&apos;97</title>
				<meeting>the 6th International Conference on Database Theory (ICDT&apos;97<address><addrLine>Delphi, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997-01">January 8-10, 1997</date>
			<biblScope unit="page" from="98" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Implementing Data Cubes Efficiently</title>
		<author>
			<persName><forename type="first">V</forename><surname>Harinarayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajaraman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Ullman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Management of Data</title>
				<meeting>the 25th International Conference on Management of Data<address><addrLine>Montreal, Quebec, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1996-06">June 4-6), 1996</date>
			<biblScope unit="page" from="205" to="216" />
		</imprint>
	</monogr>
	<note>SIGMOD&apos;96</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions</title>
		<author>
			<persName><forename type="first">A</forename><surname>Klug</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the ACM</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="699" to="717" />
			<date type="published" when="1982">1982</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">DynaMat: A Dynamic View Management System for Data Warehouses</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kotidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Roussopoulos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Management of Data (SIGMOD&apos;99</title>
				<meeting><address><addrLine>Philadelphia (PA), U.S.A.</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1999">June 1-3), 1999</date>
			<biblScope unit="page" from="371" to="382" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Normal Forms for Multidimensional Databases</title>
		<author>
			<persName><forename type="first">W</forename><surname>Lehner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Albrecht</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wedekind</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Conference on Scientific and Statistical Data Management (SSDBM&apos;98</title>
				<meeting>the 10th International Conference on Scientific and Statistical Data Management (SSDBM&apos;98<address><addrLine>Capri, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">July 1-3), 1998</date>
			<biblScope unit="page" from="63" to="72" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Summarizability in OLAP and Statistical Data Bases</title>
		<author>
			<persName><forename type="first">H.-J</forename><surname>Lenz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shoshani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Conference on Statistical and Scientific Database Management (9SSDBM</title>
				<meeting>the 9th International Conference on Statistical and Scientific Database Management (9SSDBM<address><addrLine>Olympia (WA), USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1997">Aug. 11-13. 1997</date>
			<biblScope unit="page" from="132" to="143" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Answering Queries with Aggregation Using Views</title>
		<author>
			<persName><forename type="first">D</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">V</forename><surname>Jagadish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Levy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB&apos;96</title>
				<meeting>the 22nd International Conference on Very Large Data Bases (VLDB&apos;96<address><addrLine>Bombay, India</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1996-09">September 3-6, 1996</date>
			<biblScope unit="page" from="318" to="329" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
