<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Algorithms for Rewriting Aggregate Queries Using Views</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sara Cohen</string-name>
          <email>sarina@cs.huji.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Werner Nutt</string-name>
          <email>Werner.Nutt@dfki.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Serebrenik</string-name>
          <email>alicser@cs.huji.ac.il</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Dept., The Hebrew University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>German Research Center for</institution>
          ,
          <addr-line>Artificial Intelligence GmbH</addr-line>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Typical queries over data warehouses perform aggregation. One of the main ideas to optimize the execution of an aggregate query is to reuse results of previously answered queries. This leads to the problem of rewriting aggregate queries using views. More precisely, given a set of queries, called “views,” and a new query, the task is to reformulate the new query with the help of the views in such a way that executing the reformulated query over the views yields the same result as executing the original query over the base relations. Due to a lack of theory, so far algorithms for this problem were rather ad-hoc. They were sound, but were not proven to be complete. In earlier work we have given syntactic characterizations for the equivalence of aggregate queries, and applied them decide when there exist rewritings. However, these decision procedures are highly nondeterministic and do not lend themselves immediately to an implementation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In the current paper, we refine those procedures by
eliminating the nondeterminism as much as
possible, thus obtaining practical algorithms for
rewriting queries with the operators count and sum. It
can be proved that our algorithms are complete
for queries where each relation occurs only once
and for queries without comparisons. We also
show how algorithms for rewriting nonaggregate
The copyright of this paper belongs to the paper’s authors. Permission to
copy without fee all or part of this material is granted provided that the
copies are not made or distributed for direct commercial advantage.</p>
    </sec>
    <sec id="sec-2">
      <title>Proceedings of the International Workshop on Design and</title>
    </sec>
    <sec id="sec-3">
      <title>Management of Data Warehouses (DMDW’99)</title>
      <p>Heidelberg, Germany, 14. - 15.6. 1999
(S. Gatziu, M. Jeusfeld, M. Staudt, Y. Vassiliou, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/
queries can be modified to obtain rewriting
algorithms for queries with min and max.</p>
      <sec id="sec-3-1">
        <title>These algorithms are a basis for realizing optimizers that rewrite aggregate queries using views.</title>
        <p>1</p>
        <sec id="sec-3-1-1">
          <title>Introduction</title>
          <p>Typically, queries over data warehouses are aggregate
queries. Aggregate queries occur in different places in a
data warehouse (DW). The ultimate purpose of a DW is to
support queries by end users that want to analyze a
business. They want to have a comprehensive picture of their
company and ask for the number of customers, maximum
sales, total profits, etc. However, aggregate queries are not
only processed at the data warehouse back end. But also
loading a data warehouse often requires computation of
aggregates. Aggregation is a means of reducing the
granularity of data. Data in a data warehouse is often coarser than
the data in the operational systems that it is based on. For
instance, while the operational data of a supermarket may
contain each single item of a customers purchase, the DW
may contain only the total sales of each item by day and
store. In such a case, loading can be viewed as executing
an aggregate query.</p>
          <p>The execution of aggregate queries tends to be time
consuming and costly. Computing one aggregate value often
requires to scan many data items. This makes query
optimization a necessity. A promising technique to speed up
the execution of aggregate queries is to reuse the answers
to previous queries to answer new queries. We call a
reformulation of a query that uses other queries a rewriting.</p>
          <p>Finding such rewritings is known in the literature as the
problem of rewriting queries using views. In this phrasing
of the problem, it is assumed that there is a set of views,
whose answers have been stored, or materialized. Then,
given a query, the problem is to find a rewriting, which
is formulated in terms of the views and some database
relations, such that evaluating the original query yields the
same answers as evaluating first the views and then, based
on that result, evaluating the rewriting.</p>
          <p>Rewriting techniques are not only applicable to query</p>
          <p>In the following query we calculate the total amount of
money spent on each job position:</p>
          <p>An intelligent query optimizer could now reason that for
each type of job we can calculate the total amount of money
spent on it if we multiply the salary that one TA receives for
such a job by the number of positions of that type. The two
materialized views contain information that can be
combined to yield an answer to our query. The optimizer can
formulate a new query that only accesses the views and
does not touch the tables in the database:</p>
          <p>Motivation
sum ators min, max, count, and [NSS98]. We have
apoptimization, but also to data warehouse design [TS97],
which in a sense is the inverse of the optimization
problem. Here, one assumes that a set of queries over base
relations is given, together with some additional information
about the frequency with which the queries are posed and
the frequency with which changes to the base relations
occur. One is then interested in a set of intermediate views
from which the queries can be computed such that the total
cost of query execution and view maintenance is minimal.</p>
          <p>In order to find good views, one has to understand how
queries can be rewritten using views.</p>
          <p>Rewriting queries using views has been studied
extensively for non-aggregate queries [LMSS95], and
algorithms have been devised and implemented [LSK95,
Qia96]. For aggregate queries, the problem has been
investigated mainly in the special case of datacubes (see e.g.,
[HRU96, Dyr96]. However, there is little theory for general
aggregate queries, and the rewriting algorithms that appear
in the literature are by and large ad hoc. These algorithms
are sound, that is, the reformulated queries they produce
are in fact rewritings, but there is neither a guarantee that
they output a rewriting whenever one exists, nor that they
generate all existing rewritings [SDJL96, GHQ95].</p>
          <p>Recently, we have given syntactic characterizations for
the equivalence of SQL queries with the aggregate
operplied them to decide, given an aggregate query and a set of
views, whether there exists a rewriting, and whether a new
query over views and base relations is a rewriting [CNS99].</p>
          <p>These characterizations, however, do not immediately yield
practical algorithms.</p>
          <p>In this paper, we show how to derive such algorithms.</p>
          <p>The algorithms are sound, i.e., they output rewritings. We
can also show that they are complete for important cases,
which are relevant in practice. In Section 2 we present a
motivating example. A formal framework for rewritings of
aggregate queries is presented in Section 3. In Section 4
we give algorithms for rewriting aggregate queries and in</p>
          <p>Section 5 we conclude.
job type sistants in a course. Each TA has a in the course
At the Hebrew University, there may be many teaching
ashe assists. For example, he may give lectures or grade
exercises. Teaching assistants are financed by different sources,
like science foundations and the university itself. For each
2
We discuss an example that illustrates the rewriting
problem for aggregate queries. This example models
exactly the payment policy for teaching assistants at the
Hebrew University in Jerusalem. There are two tables with
relations pertaining to salaries of teaching assistants:
In order to evaluate the new query, we no more need to look
up all the teaching assistants nor all the financing sources.</p>
          <p>Thus, probably, the new query can be executed more
efficiently.</p>
          <p>In this example, we used our common sense in two
occasions. First, we gave an argument why evaluating the
original query yields the same result as evaluating the new query
that uses the views. Second, because we understood the
semantics of the original query and the views, we were able to
come up with a reformulation of the query over the views.</p>
          <p>Since we cannot expect a query optimizer to have common
sense, we will only be able to build an optimizer that can
rewrite aggregate queries, if we can provide answers to the
3.1</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Non-Aggregate Queries</title>
      <p>following two questions. How can we prove that a new
query, which uses views, produces the same results as the
original query? How can we devise an algorithm that
systematically and efficiently finds all rewritings? If efficiency
and completeness cannot be achieved at the same time, we
may have to find a good trade-off between the two
requirements.</p>
      <sec id="sec-4-1">
        <title>A Formal Framework 3</title>
        <p>UNION alization to queries with the constructor is
possiHAVING nonnested queries without a clause and with the</p>
        <p>In this section we define the formal framework in which
we study rewritings of aggregate queries. First, we extend
the well-known Datalog syntax for non-aggregate queries
[Ull89] so that it covers also aggregates. This syntax is
more abstract and concise than SQL. It is not only better
suited for a theoretical investigation, but it is also a
better basis for implementing algorithms that reason about
queries, in particular for implementations in a logic
programming language. Based on this syntax, we define the
semantics of queries and give a precise formulation of the
rewriting problem.</p>
        <p>Through the syntax we implicitly define the set of
SQLqueries to which our techniques apply. They are essentially
aggregate operators min, max, count, and sum. A
generble, but beyond the scope of this paper. For queries with
arbitrary nesting and negation no rewriting algorithms are
feasible, since equivalence of such queries is undecidable.
q(x) predicate symbol. We abbreviate a query as B(s),
q textbook like [Ull89]). Under bag-set semantics, defines
D produces over (for a precise definition consult a standard
q(x) R &amp; the body, or, shortly, as C. The variables
B(s) s where stands for the body and for the terms
occurq(x) or simply by the predicate of its head q.
q a new relation qD which consists of all the answers that
q(x) R(s) &amp; query as C(t), in case we want to
distinq tion. Under set semantics, a conjunctive query defines
D q with (for a precise definition see [CV93]).
q Under set-semantics, two queries and q0 are
equivaD A database contains for each predicate a finite
relarelational or comparisons. If the body contains no
comparisons, then the query is relational. A query is linear
if it does not contain two relational atoms with the same
ring in the body. Similarly, we may write a conjunctive
guish between the relational atoms and the comparisons in
appearing in the head are called distinguished variables,
while those appearing only in the body are called
nondistinguished variables. Atoms containing at least one
nondistinguished variable are called nondistinguished atoms. By
abuse of notation, we will often refer to a query by its head
a multiset or bag ffqggD of tuples for each database D. The
bag ffqggD contains the same tuples as the relation qD, but
each tuple occurs as many times as it can be derived over
lent if for every database, they return the same set as a
result. Analogously, we define equivalence under
bag-setsemantics.</p>
        <p>In the example below, we show how to transform a query
in SQL notation into one in Datalog notation.
Example 3.1 Consider a query that finds the teaching
assistants who have a job for which they receive more then
$500 from the government:
M fsum(z1 z2) that maps any multiset of pairs of numbers
(m1; m2) to P(m1; m2). We call such a function an
aggregation function.</p>
        <p>An aggregate query is a conjunctive query augmented
by an aggregate term in its head. Thus, it has the form
SELECT clause. We simply add it to the terms in the head
SELECT attributes are identical to the grouping attributes,</p>
        <p>Example 3.2 Recall the query in Section 2 that calculates
the total amount of money spent on each job type. The
extension of the Datalog syntax is straightforward. Since the
there is no need to single them out by a special notation.</p>
        <p>Hence, the only new feature is the aggregate term in the
of the query, after replacing the attributes with
corresponding variables. Thus, the above SQL query is transformed
into the following Datalog query:
q It can be easily checked that the Datalog query above
is equivalent to our SQL query. Obviously, one notation
can be transformed into the other, back and forth,
completely automatically.</p>
        <p>GROUP BY queries with and aggregation. We restrict
GROUP BY in the clause. Also, we assume that queries
SELECT identical to those in the statement, although SQL</p>
        <p>We now extend the Datalog syntax so as to capture also
ourselves to SQL queries where the group by attributes are
only requires that the latter be a subset of those appearing
have only one aggregate term. The general case can easily
be reduced to this one.
q(x; (y)) gregate queries. Consider the query B(s).
q(x; y) core of q, which is defined as B(s). The core is
d x of constants of the same length as let
x We call the grouping variables of the query. Queries with
q We associate to a non-aggregate query, q, called the
D (d; core returns over a bag ffqggD of tuples e). For a tuple
elementary aggregate terms are elementary queries. If the
aggregation term in the head of a query has the form (y),
we call the query an -query (e.g., a max-query).</p>
        <p>In this paper we are interested in rewriting elementary
queries using elementary views. However, as the example
in Section 2 shows, even under this restriction the
rewritings may not be elementary.</p>
        <p>We now give a formal definition of the semantics of
agFor a database D, the query yields a new relation qD. To
define the relation qD, we proceed in two steps.
the query that returns all the values that are amalgamated
in the aggregate. Recall that under bag-set-semantics, the</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Equivalence Modulo a Set of Views</title>
      <p>2This definition blurs the distinction between the function as a
mathematical object and the symbol denoting the function. However, a notation
that takes this difference into account would be cumbersome.
9-4
&amp; : : : &amp; v1( 1x1) vn( nxn);
q(x) for a rewriting of has the form
only considered conjunctive rewritings.3 Thus, a candidate
where the i’s are substitutions that instantiate the view
predicates vi(xi).</p>
      <p>The second question is whether we can reduce
reasoning about the query r, which contains view predicates, to
reasoning about a query that has only base predicates. To
this end, we unfold r. That is, we replace each view atom
vi( ixi), with the instantiation iBi of the body of vi,
where vi is defined as vi(xi) Bi. We assume that the
nondistinguished variables in different bodies are distinct.</p>
      <p>We thus obtain the unfolding ru of r, for which the
Unfolding Theorem holds:
q queries using predicates from R[V, we define that and q0
D v 2 V by interpreting every view predicate as the
relaq V = are equivalent modulo V, written q0, if qDV q0DV
q tion vD . If is a query that contains also predicates from
q V, then qDV is the relation that results from evaluating</p>
      <p>We consider aggregate queries that use predicates both
from R, a set of base relations, and V, a set of view
definitions. We want to define the result of evaluating such a
query over a database D. We assume that a database
contains only facts about the base relations.</p>
      <p>For a database D, let DV be the database that extends
over the extended database DV . If q, q0 are two aggregate
for all databases D.
q want to know whether there is a rewriting of using the</p>
      <p>We review the questions related to rewriting relational
conjunctive queries. Suppose, we are given a set of conjunctive
queries V, the views, and another conjunctive query q. We
views in V.</p>
      <p>The first question that arises is, what is the language for
expressing rewritings? Do we consider arbitrary first order
formulas over the view predicates as candidates, or even
recursive queries, or do we restrict ourselves to
conjunctive queries over the views? Since reasoning about queries
in the first two languages is undecidable, researchers have
We now present techniques for rewriting aggregate queries.</p>
      <p>Our approach will be a to generalize the known techniques
for conjunctive queries. Therefore, we first give a short
review of the conjunctive case and then discuss in how far
aggregates give rise to more complications.</p>
      <p>4.2.1</p>
    </sec>
    <sec id="sec-6">
      <title>Rewritings of Relational Count-queries</title>
      <p>When rewriting count-queries, we must deal with the same
questions that arose when rewriting conjunctive queries.
Thus, we first define the language for expressing
rewritings. Even if we restrict the language to conjunctive
aggregate queries over the views, we still must decide on two
additional issues. First, which types of aggregate views are
useful for a rewriting? Second, what will be the
aggregation term in the head of the rewriting? A count-query
is sensitive to multiplicities, and count-views are the only
3It is an interesting theoretical question, which as yet has not been
resolved, whether more expressive languages give more possibilities for
rewritings. It is easy to show, at least, that in the case at hand allowing
also disjunctions of conjunctive queries as candidates does not give more
possibilities than allowing only conjunctive queries.
(1)
4.2</p>
    </sec>
    <sec id="sec-7">
      <title>Rewritings of Count-queries</title>
      <p>We consider the problem of rewriting count-queries. As
a first step, we consider rewriting relational count-queries.
We then extend our technique in order to rewrite
countqueries with comparisons.
3.4</p>
    </sec>
    <sec id="sec-8">
      <title>General Definition of Rewriting</title>
      <p>ru(x)
nBn:
4</p>
      <sec id="sec-8-1">
        <title>Rewritings of Aggregate Queries</title>
        <p>4.1</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Reminder: Rewritings of Relational Conjunctive</title>
    </sec>
    <sec id="sec-10">
      <title>Queries</title>
      <p>n ing candidates, rn, for 1:</p>
      <p>Example 4.2 Consider aggregate queries q, v, and
rewrittype of aggregate views that do not lose multiplicities4.</p>
      <p>Thus, the natural answer to the first question is to use only
count-views when rewriting count-queries. We show in
the following example that there are an infinite number of
aggregate terms that can be usable in rewriting a
countquery.
q fvg Then it is easy to see that rn is a rewriting of modulo
for all n. It is natural to create only r1 as a rewriting of
q. In fact, only for r1 will the Unfolding Theorem hold.</p>
      <p>However, when creating rewritings automatically we must
restrict the aggregate term in the head of the rewriting in
order to prevent deriving infinitely many rewritings.
count aggregate term in the rewriting with the term. This
r r. We call a count-rewriting candidate. Note that in some
r Thus, we obtain the unfolding ru of defined as
count) where vic are count-views defined as vic (xi; Bi
count is a natural choice and is also necessary since is the
and zi are variables not appearing elsewhere in the body of
cases, it is possible to omit the summation. This is true if
the values of zi are functionally dependent on the value of
x. In such a case, the summation is always over a singleton.</p>
      <p>After presenting our rewriting candidates we now show
how we can reduce reasoning about rewriting candidates,
to reasoning about conjunctive aggregate queries. We use
a similar technique to that shown in Subsection 4.1. We
replace view atoms with the appropriate instantiations of
their bodies. As in the case for relational queries, unfolding
should preserve equivalence. Otherwise reduction about
the reasoning is not possible. We choose to replace the
only aggregate term which will preserve this characteristic.
r tic. Now, instead of checking whether is a rewriting
v replacing R0 with , variables that appeared in R0 and do
’ can assume, w.l.o.g. that is the identity mapping on the
v q with in the body of in order to derive a partial
rewrit’ R-usable(v, , w.r.t. to rewrite q. We denote this fact as
v(u; count) We start by discussing when a view, Rv,
v v replaced by , using ’, we say that is R-usable under
u not appear in are not accessible anymore. Thus, we can
v q R. Therefore, is usable for rewriting only if there
exv R v order for , to be usable, must “cover” some part of
q(x; count) q r R. Recall that a rewriting of is a query
defined is the only candidate preserving this
characterisof q, we can verify if ru is equivalent to r. It has been
shown [CV93, NSS98] that two relational count-queries
are equivalent if and only if they are isomorphic.</p>
      <p>We present an algorithm that finds a rewriting for a
query using views. Our approach can be thought of as
reverse engineering. We have characterized the “product”
that we must create, i.e., a rewriting, and we now present
an automatic technique for producing it.
instantiated by , is usable in order to rewrite a query,
that when unfolded yields a query isomorphic to q. Thus, in
ists an isomorphism, ’, from Rv to R0 R. Note that we
distinguished variables of v. We would like to replace R0
ing of q. This cannot always be done. Observe that after
only perform the replacement if these variables do not
appear anywhere else in q, in q’s head or body. If R0 can be
’).</p>
      <sec id="sec-10-1">
        <title>Example 4.3 Consider the following</title>
        <p>p1(x0; u0; y0)
4Although sum-views are sensitive to multiplicities (i.e., are
calculated under bag-set-semantics), they lose these values. For example,
sumviews ignore occurrences of zero values.
9-6
q rewrite the updated version of (line 6). Thus, in each
iteration of the loop, additional atoms are covered. In line 10
the algorithm checks if a nondistinguished atom is already
covered. If so, then the algorithm must fail, i.e., backtrack,
as explained above. The algorithm is sound and complete
as stated below.</p>
        <p>R &amp; C and a set of views, V, a rewriting
We extend the technique presented in the previous section
in order to rewrite queries with comparisons. Thus, we
consider the problem of rewriting queries having comparisons
with views having comparisons. We augment the
rewriting candidate form with comparisons. Thus given a query</p>
        <p>Our algorithm runs in nondeterministic polynomial
time. The algorithm guesses views and instantiations and
then verifies that the obtained result is a rewriting in a
polynomial time. This is an optimal algorithm, since the view
usability problem for relational conjunctive count-queries
is NP-complete [CNS99].
4.2.2</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Rewritings of Count-Queries with Comparisons</title>
      <p>Relational Count Rewriting
q(x; count) R V A query and a set of views
r A rewriting of q.</p>
    </sec>
    <sec id="sec-12">
      <title>Algorithm</title>
    </sec>
    <sec id="sec-13">
      <title>Input Output</title>
      <p>V tional rewriting. As above, it holds that ru r. Thus,
r We can unfold in the same fashion as unfolding a
relar q In order to verify that is a rewriting of , we have to
once again we can reduce reasoning about queries with
views to reasoning about equivalent queries without views.
verify that ru q.</p>
      <p>In [NSS98], we gave a sound and complete
characterization of equivalence of conjunctive count-queries. The
only known algorithm that checks equivalence of
conjunctive count-queries creates an exponential blowup of the
queries. Thus, it is difficult to present a tractable
algorithm for computing rewritings. However, equivalence of
linear count-queries with comparisons is isomorphism of
the queries [NSS98]. Thus, we will give a sound,
complete, and tractable algorithm for computing rewritings of
linear count-queries. This algorithm is also sound and
tractable for the general case, but is not complete. In the
sequel, in this section we assume that the comparisons in
the queries we are rewriting are deductively closed.
Computing a deductive closure is a well-known polynomial
procedure [Klu88].
that are sensitive to multiplicities, they are useful for
rewritings. However, sum-views may lose multiplicities
and make the aggregation variable inaccessible. Thus, at
most one sum-view should be used in the rewriting of a
query. The following are rewriting candidates of the query
(2)
(3)
C C modify in line 12. We remove from its comparisons
C-usable(v, , this fact as ’).</p>
      <p>R the appropriate subset of by the appropriate instantiation</p>
      <p>We present an algorithm for computing rewritings of
conjunctive count-queries in Figure 4.2.2. Note that we
containing a variable that is not accessible after replacing
of v. Thus, this step is necessary in order for the resulting
query to be safe [Ull88].</p>
      <p>The given algorithm runs in nondeterministic
polynomial time. The following theorem states that it is both
sound and complete for linear queries and is sound, but not
complete, for arbitrary queries.
sum queries. Since and count-views are the only views</p>
      <p>Rewriting sum-queries is similar to rewriting
countqueries. When rewriting sum-queries we must also take
the aggregation variable into consideration. We present an
algorithm for rewriting sum-queries that is similar to the
algorithm for count-queries.</p>
      <p>We define the form of rewriting candidates for
sumcount) where vic is a count-view of the form vic (xi; Bi
sum(y)) and vs is a sum-view of the form vs (xs; Bs .
y Note that the variable in the head of the query in
Equation 3 must appear among ixi for some i. In [CNS99] we
showed that if a rewriting candidate is equivalent to its
unfolding then it must be one of the above forms. As in the
case of count-query rewritings, in some cases the rewriting
may be optimized by dropping the summation.</p>
      <p>Once again, we reduce reasoning about rewriting
candidates to reasoning about conjunctive aggregate queries. For
this purpose we extend the unfolding technique introduced
in Subsection 4.2. Thus, the unfoldings of the candidates
presented are:
Com- may be chosen as well. We call this algorithm
pute Rewriting. We derive an algorithm for rewriting
r q Now, instead of checking whether is a rewriting of
we can verify if ru is equivalent to r. However, the only
known algorithm for checking equivalence of sum-queries,
presented in [NSS98], requires an exponential blowup of
the queries. Thus, it might be very difficult to provide a
tractable algorithm that is both sound and complete for
arbitrary sum-queries. However, relational sum-queries and
linear sum-queries are equivalent if and only if they are
isomorphic. Thus, we can extend the algorithm presented in
the Figure 4.2.2 for sum-queries.</p>
      <p>As a preliminary step for our algorithm we extend the
algorithm in Figure 4.2.2, such that in line 5 sum-views
sum-queries, presented in Figure 4.3. The algorithm runs
in nondeterministic polynomial time and the following
holds:</p>
    </sec>
    <sec id="sec-14">
      <title>Rewritings of Sum-Queries</title>
      <p>Example 4.6 This example shows the incompleteness of
the algorithm for the general case. Consider the query q,
view v, rewriting r, and unfolding ru</p>
    </sec>
    <sec id="sec-15">
      <title>Algorithm</title>
    </sec>
    <sec id="sec-16">
      <title>Input Output</title>
      <p>Count Rewriting
q(x; count) A query
r A rewriting of q.
http://www.cs.huji.ac.il/ versity and it is located at
HAVING clauses, negation and functional dependencies,
Aggregate queries are increasingly prevalent due to the
widespread use of data warehousing for decision support.</p>
      <p>They are generally computationally expensive since they
scan many data items, while retrieving few. Thus, the
computation time of aggregate queries is generally
orders of magnitude larger than the result size of the query.</p>
      <p>This makes query optimization a necessity.
Optimizing aggregate queries was studied in the context of
datacubes [HRU96, Dyr96]. However, there was little
theory for general aggregate queries, beyond this context. In
this paper, based on previous results in [NSS98, CNS99],
we presented algorithms that enable reuse of precomputed
queries in answering new ones.</p>
      <p>The algorithms presented were implemented in
SICStus Prolog. The system was developed at Hebrew
Uni~alicser/aggrq/.</p>
      <p>Topics for future research include rewriting queries with
and enriching the class of aggregate functions with
statistical functions.
(4)
(5)
queries can be modified to find rewritings of max-queries.</p>
      <p>Rewriting nonaggregate queries is a well known
problem [LMSS95]. Thus, we do not present algorithms for
finding rewritings of max-queries in this paper.
5 Conclusion
y Note that the variable in the head of the query in
Equawhere vi is a nonaggregate view and vm is a max-view.
tion 5 must appear among ixi for some i. In [CNS99]
we showed that if a rewriting candidate is equivalent to its
unfolding then it must have one of the above forms.</p>
      <p>Once again, reasoning about rewriting candidates can
be reduced to reasoning about max-queries, using an
appropriate extension of the unfolding technique. We have
shown [NSS98] that equivalence of relational max-queries
is equivalence of their cores. There is a similar
reduction for the general case. Thus, algorithms developed for
checking set-equivalence of queries can easily be converted
to algorithms for checking equivalence of max-queries.</p>
      <p>Similarly, algorithms that find rewritings of nonaggregate
4.4</p>
    </sec>
    <sec id="sec-17">
      <title>Rewritings of Max-Queries</title>
      <p>We consider the problem of rewriting max-queries. Note
that max-queries are insensitive to multiplicities. Thus, it
is natural to use nonaggregate views and max-views when
rewriting a max-query. When using a max-view the
aggregation variable becomes inaccessible. Thus, we use at most
one max-view. The following are rewriting candidates of
the query q:</p>
    </sec>
    <sec id="sec-18">
      <title>Algorithm</title>
    </sec>
    <sec id="sec-19">
      <title>Input</title>
    </sec>
    <sec id="sec-20">
      <title>Output</title>
      <p>count) Let q0 be the query q0(x;</p>
      <p>Let r0=Compute Rewriting(q0; V).</p>
      <p>If r0 is of the form
sum(y r0(x;
&amp; : : : &amp; &amp; r0(x; sum(Qin=1 zi)) v1c( 1x1; z1) vnc( nxn; zn) C0
y and appears among ixi</p>
    </sec>
    <sec id="sec-21">
      <title>Then return r0</title>
      <p>If r0 is of the form</p>
    </sec>
    <sec id="sec-22">
      <title>Then return</title>
      <p>[CM77]</p>
      <p>S. Cohen, W. Nutt, and A. Serebrenik.
Rewriting aggregate queries using views. In Ch.
Papadimitriou, editor, Proc. 18th Symposium on
Principles of Database Systems, Philadelphia
(Pennsylvania, USA), May 1999. ACM Press.</p>
      <p>To appear.</p>
      <p>S. Chaudhuri and M. Vardi. Optimization of
real conjunctive queries. In Proc. 12th
Symposium on Principles of Database Systems,
Washington (D.C., USA), May 1993. ACM Press.</p>
      <p>
        C. Dyreson. Information retrieval from an
incomplete datacube. In Proc. 22nd
International Conference on Very Large Data Bases,
Bombay (In
        <xref ref-type="bibr" rid="ref1">dia), September 1996</xref>
        . Morgan
      </p>
      <p>Kaufmann Publishers.
[GHQ95] A. Gupta, V. Harinarayan, and D. Quass.
Aggregate query processing in data warehouses.</p>
      <p>In Proc. 21st International Conference on Very
Large Data Bases. Morgan Kaufmann
Publishers, August 1995.</p>
      <p>V. Harinarayan, A. Rajaraman, and J. Ullman.</p>
      <p>Implementing data cubes efficiently. In Proc.
1996 ACM SIGMOD International Conference
on Management of Data, pages 205–227,
Montreal (Canada), June 1996.
[JK83]
[Klu88]</p>
      <sec id="sec-22-1">
        <title>D.S. Johnson and A. Klug. Optimizing conjunctive queries that contain untyped variables.</title>
        <p>SIAM Journal on Computing, 12(4):616–640,
1983.</p>
      </sec>
      <sec id="sec-22-2">
        <title>A. Klug. On conjunctive queries containing inequalities. J. ACM, 35(1):146–160, 1988. [LMSS95] A.Y. Levy, A.O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using views.</title>
        <p>In Proc. 14th Symposium on Principles of
Database Systems, pages 95–104, San Jose
(California, USA), May 1995. ACM Press.
[LSK95]
[NSS98]
[Qia96]</p>
      </sec>
      <sec id="sec-22-3">
        <title>A.Y. Levy, D. Srivastava, and T. Kirk. Data</title>
        <p>model and query evaluation in global
information systems. Journal of Intelligent Information
Systems, 5(2):121–143, 1995.</p>
      </sec>
      <sec id="sec-22-4">
        <title>W. Nutt, Y. Sagiv, and S. Shurin. Deciding equivalences among aggregate queries. In</title>
        <p>J. Paredaens, editor, Proc. 17th Symposium on
Principles of Database Systems, pages 214–
223, Seattle (Washington, USA), June 1998.
ACM Press. Long version as Report of Esprit
LTR DWQ.</p>
      </sec>
      <sec id="sec-22-5">
        <title>X. Qian. Query folding. In Stanley Y. W. Su,</title>
        <p>editor, Proc. 12th International Conference on
Data Engineering, pages 48–55, New Orleans
(Louisiana, USA), March 1996. IEEE
Computer Society.
[SDJL96] D. Srivastava, Sh. Dar, H.V. Jagadish, and A.Y.</p>
        <p>Levy. Answering queries with aggregation
using views. In Proc. 22nd International
Conference on Very Large Data Bases, Bombay
(In9-10
[TS97]
[Ull88]
[Ull89]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>dia)</source>
          ,
          <year>September 1996</year>
          . Morgan Kaufmann Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Theodoratos</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.K.</given-names>
            <surname>Sellis</surname>
          </string-name>
          .
          <article-title>Data warehouse configuration</article-title>
          .
          <source>In Proc. 23nd International Conference on Very Large Data Bases</source>
          , pages
          <fpage>126</fpage>
          -
          <lpage>135</lpage>
          , Athens (Greece),
          <year>August 1997</year>
          . Morgan Kaufmann Publishers.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>Principles of Database and Knowledge-Base Systems</article-title>
          , Vol. I:
          <article-title>Classical Database Systems</article-title>
          . Computer Science Press, New York (New York, USA),
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          .
          <article-title>Principles of Database and Knowledge-Base Systems</article-title>
          , Vol. II:
          <article-title>The New Technologies</article-title>
          . Computer Science Press, New York (New York, USA),
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>