<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tractable Counting of the Answers to Conjunctive Queries?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Reinhard Pichler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Skritek</string-name>
          <email>skritekg@dbai.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universitat Wien</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Conjunctive queries (CQs) are one of the most fundamental forms of database queries. In general, the evaluation of CQs is NPcomplete. Consequently, there has been an intensive search for tractable fragments. In this paper, we want to initiate a systematic search for tractable fragments of the counting problem of CQs, i.e., the problem of counting the answers to a CQ. We prove several new tractability and intractability results by starting with acyclic conjunctive queries and generalising these results to CQs of bounded hypertree-width.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        other tractable fragments of CQs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] like CQs of bounded tree-width [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], CQs of
bounded query-width [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and also ACQs. Indeed, ACQs are precisely the CQs
whose hypertree-width is equal to 1.
      </p>
      <p>
        In this paper, we concentrate on the counting problem related to CQ
evaluation, i.e., the problem of counting the number of answers to a CQ. We shall
refer to this problem as the #CQ problem. Of course, the #CQ problem is
intractable. Indeed, it is #P-complete for CQs with free variables only (i.e., no
existential quanti ers) and even slightly harder (namely # NP-complete) in the
general case [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Interestingly, the search for tractable fragments of #CQ has
received very little attention so far even though the count operator is an integral
part of the SQL language and is supported by virtually any relational database
management system. Very recently, we have shown by an ad hoc algorithm that
the query complexity of #CQ becomes tractable for CQs of bounded tree-width
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. However, a systematic search for tractable fragments of #CQ is completely
missing to date. In this paper, we want to close this gap. Our goal is to arrive at a
situation comparable to the well-studied decision problem of CQ evaluation, i.e.,
(i) we want to establish tractability for ACQs, (ii) the corresponding algorithms
should be based on semi-joins, and (iii) it should be possible to generalise the
tractability results to CQs of bounded hypertree-width.
      </p>
      <p>Organisation of the paper and summary of results. In Section 2, we
recall some basic notions and results. A conclusion is given in Section 6. The
main results of the paper are detailed in Sections 3 { 5, namely:</p>
      <p>ACQs with free variables only. In Section 3, we restrict ourselves to ACQs
with free variables only. We present our basic algorithm for counting the answers
to ACQs of this speci c form. Our algorithm only requires semi-joins and simple
arithmetic. We thus establish the tractability of the combined complexity of
#CQ for ACQs without existential quanti ers.</p>
      <p>Arbitrary ACQs. In Section 4, we consider #CQ for arbitrary ACQs. The
tractability of the data complexity is easily shown. For the query complexity,
a signi cant extension of the basic algorithm from Section 3 is required. In
contrast, for the combined complexity of the #CQ problem for arbitrary ACQs
we establish the intractability by proving its #P-completeness.</p>
      <p>Extensions. All tractability and intractability results discussed above
immediately carry over to CQs of bounded hypertree-width. In Section 5, we
discuss further extensions and applications of our results. For instance, the
#Pcompleteness of the combined complexity of the counting problem of ACQs
allows us an easy #P-completeness proof for the counting problem of unions of
ACQs even if these ACQs have no existentially quanti ed variables at all.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        Schemas and instances. A relational schema R = fR1; : : : ; Rng is a set of
relation symbols Ri each of a xed arity k and with an assigned sequence of k
attributes (A1; : : : ; Ak). An instance I over a schema R consists of a relation RI
i
for each relation symbol Ri 2 R, s.t. both have the same arity. We only consider
nite instances here, and denote with dom(I) the active domain of I. We write
x for a tuple (x1; : : : ; xn). By slight abuse of notation, we also refer to the set
X, etc. For
CQs. A conjunctive query (CQ) Q on a database schema R is of the form
Q : ans(x) 9y (x; y), where (x; y) = Vin=1 ri is a conjunction of atoms ri =
Rj (z), s.t. Rj 2 R with arity k, and z x [ y with jzj = k. We do not consider
constants in the query, because a simple preprocessing step allows us to get rid of
them. For the same reason, w.l.o.g. we assume no variable to occur twice in any
atom [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The free variables x are also referred to as distinguished variables,
while the existentially quanti ed variables y are often called nondistinguished
variables. We denote with atoms(Q) the set fr1; : : : ; rng of atoms appearing
in Q, and for a set A of atoms, var (A) denotes the set of variables occurring
in A. By slight abuse of notation, for a tuple x = (x1; : : : ; xn) of variables
and a mapping : x ! dom(I), we write (x) for ( (x1); : : : ; (xn)). A tuple
s is called an answer or solution to a CQ Q on an instance I if s = (x),
where : x [ y ! dom(I) is a variable assignment on x; y, s.t. for every atom
Ri(z) 2 atoms(Q), Ri( (z)) 2 I. For such a mapping , we also write (Q) I
to emphasise that every atom Ri(z) 2 atoms(Q) is sent to an atom in I. The
set Q(I) of answers to Q on I is a relation ansQ(I) containing all answers.
      </p>
      <p>
        In this paper, we study the counting problem #CQ, which is de ned as
follows: Given a conjunctive query Q over some relational schema R and an
instance I for R, how many tuples are contained in ansQ(I)?
ACQs. There exist several notions of acyclic CQs (ACQs). We restrict ourselves
here to the so-called -acyclicity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. One way to characterise ACQs is via join
trees, i.e., a CQ is acyclic, if it has a join tree [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A join tree T for a query Q
is a tree T = (V; E), where V is identi ed with the set of atoms in Q, and E is
such that for every two atoms ri; rj 2 V having variables in common, all atoms
on the unique path from ri to rj contain all variables shared by ri and rj . Given
a CQ it can be e ciently checked if it is acyclic, and if so, also a join tree can
be computed e ciently [
        <xref ref-type="bibr" rid="ref12 ref20">12, 20</xref>
        ]. We often identify a tree T with its vertex set
V , and write t 2 T for T = (V; E); t 2 V . If we want to stress the di erence
between the graph structure and the query atoms, for some t 2 T for a join tree
T , we write Qt to denote the query atom identi ed with node t. Further, given
an instance I to evaluate Q on, for every t 2 T we denote with Rt the relation
RiI assigned to the relation symbol of Qt.
      </p>
      <p>
        Hypertree decompositions. A hypertree decomposition [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] of a CQ Q is
a triple hT; ; i, where T = (V; E) is a tree and ; are mappings : V !
P(var (atoms(Q))) and : V ! P(atoms(Q)) s.t. (1) for each atom ri 2 atoms(Q),
there exists a v 2 V s.t. var (frig) (v); (2) for each z 2 var (atoms(Q)), the
set T 0 = fv 2 V j z 2 (v)g of T induces a connected subtree of T ; (3) for
each v 2 V , (v) var ( (v)); (4) for each v 2 V , var ( (v)) \ (Tv) (v),
where Tv T is the complete subtree of T rooted at v. The width of a hypertree
decomposition is maxv2V j (v)j, and the hypertree-width of Q is the minimum
width over all hypertree decompositions of Q. For a given width !, the
existence of a hypertree decomposition of width ! can be e ciently decided, and a
decomposition can be e ciently computed, if one exists [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Counting complexity. While decision problems ask if for a given problem
instance, at least one solution exists, counting problems ask how many di erent
solutions exist, see e.g. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], Chap. 18. The most intensively studied counting
complexity class is #P, which contains those function problems which consist
in counting the number of accepting computation paths of a non-deterministic
polynomial-time Turing machine. In other words, #P captures the counting
problems corresponding to decision problems in NP. However, there exist also
#P-complete problems for which the corresponding decision problem is easy.
Classical examples of this \easy to decide, hard to count" phenomenon are
#PERFECT MATCHINGS and 2-SAT [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]: checking if a bipartite graph has a
perfect matching or if a 2-CNF formula has a model is easy. But counting the
number of perfect matchings or the number of models of a 2-CNF-formula is
hard. Note that by #P-completeness we mean completeness w.r.t. Cook
reductions, i.e., polynomial-time Turing reductions [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>ACQs without Nondistinguished Variables</title>
      <p>
        We rst consider CQs of the form ans(x) (x), i.e. only queries that contain
no existentially quanti ed variables. Without further restrictions, #CQ is well
known to be #P-complete for CQs of this form [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, just as for the
decision problem, for ACQs, we can make use of the existence of a join tree to
guide the computation. We provide an algorithm, mainly working with
semijoins along the join tree, that shows that the problem is indeed tractable for
ACQs. Note that we consider counting w.r.t. set semantics here.
      </p>
      <p>Before describing the algorithm, we have to x some notation. Given an
acyclic CQ Q : ans(x) (x) over database schema R and instance I, let T be
a join tree of Q. Recall that for t 2 T , Qt denotes the query atom identi ed with
t, and Rt the relation RiI assigned to the relational symbol of Qt. Further let
Tt be the complete subtree of T rooted at t, and let Xt = var (fQtg). For Qt =
Ri(x1; : : : ; xk), if clear from the context, it is convenient to also write Qt to refer
to (x1; : : : ; xk) only. For a subtree T 0 of T , XT 0 = St2T 0 Xt and QT 0 = St2T 0 Qt.
Given QT 0 , the subquery de ned by QT 0 is the query ans(XT 0 ) V
the following, we always assume an acyclic CQ Q : ans(x) (x) witth2Tj o0iQntt.rIene
T to be evaluated over some database instance I.</p>
      <p>Obviously, every row in Rt encodes a variable assignment on Xt, hence
a partial assignment on x. The idea of the algorithm is to compute, for every
t 2 T , in a bottom-up traversal of the tree the number of possible extensions of
to XTt that are valid solutions to the subquery de ned by QTt . Formally, for a
mapping : XTt ! dom(I) and some t 2 T , we de ne the footprint of on t as
fpr ( ; t) = , where is a variable assignment on Xt s.t. (xi) = (xi) for all
xi 2 Xt. For a subtree T 0 T with root t and : Xt ! dom(I), we de ne the set
N (T 0; t; ) = f : XT 0 ! dom(I) j fpr ( ; t) = and (QT 0 ) Ig of all possible
extensions of the variable assignment to an assignment on the variables XT 0
that are solutions to the subquery QT 0 . The intuition behind this set is that for
every row 2 Rt (which can be considered as a variable assignment on Xt),
N (Tt; t; ) contains all possible extensions of to XTt that are solutions to QTt .</p>
      <p>As we cannot a ord to compute this set explicitly (which would be the naive
algorithm), we just compute and store its cardinality, i.e. jN (Tt; t; )j. Hence,
at each node t 2 T we maintain a table R^t, having jXtj + 1 columns. Thereby
the rst jXtj columns (denoted as R^t; (Xt)) store , while the last column stores
jN (Tt; t; )j. To better distinguish between jN (Tt; t; )j, i.e. the value intended
to be stored in this column, and the actual value present there, we refer to this
column (or value) as c . Note that we are only interested in such assignments
s.t. N (Tt; t; ) 6= ;, because otherwise can never be extended to an answer to
Q. Hence, for every assignment s.t. (Qt) 2= Rt, we need no entry in R^t, and
therefore R^t contains exactly one row for every 2 Rt s.t. jN (Tt; t; )j &gt; 0.
Thereby denotes the variable assignment de ned by . If clear from the
context, we drop the from and use and interchangeably.</p>
      <p>It is easy to see that N (Tt; t; ) \ N (Tt; t; 0) = ; for 6= 0. Hence, the
number of answers to the ACQ Q can be retrieved from the root node r of
T , by just summing up the counter of all entries in R^r. Indeed, the union
S 2Rr N (Tr; r; ) gives all possible variable assignments on XTr = x s.t.</p>
      <p>(QTr ) = (Q) I. By the disjointness of N (Tr; r; ) for di erent values of
nu, mjSber2Rofr dNi (Terre;nrt; so)jlu=tioPns.2IRtrrjeNm(aTirn;srt;o )sjh, oawndhothwerteofocroemPput2eRtrhce vgailvueess tfhoer
each R^t. The algorithm for this consists of two phases: The initialisation phase,
and a bottom-up traversal of the tree.</p>
      <p>Initialisation Phase: For every leaf node t 2 T , R^t is computed as follows: For
every distinct tuple 2 Rt, add a tuple ( ; 1) to R^t. Note that for leaf nodes
we have Xt = XTt . Hence, the only extension for each is itself, and R^t
therefore contains the correct values.</p>
      <p>Bottom-Up Traversal: Let t 2 T be a node with children t1; : : : ; tk (k 1),
such that R^tj has already been computed during the bottom-up traversal for
every j 2 f1; : : : ; kg. The idea is now to compute R^t by starting from a relation
considering t only and then, step by step, for every j 2 f1; : : : ; kg, to include
the subtree Tj (where we use Tj to abbreviate Ttj ) by computing the
semijoin between the output of the previous step and R^tj . Formally, let R^t0 contain
a tuple ( ; 1) for every distinct tuple 2 Rt, i.e. R^t0 is de ned analogously
to R^t0 for a leaf node t0 2 T . Then, for every child j 2 f1; : : : ; kg, compute
Rt; (Xt) = R^tj; 1(Xt) n R^tj; (Xtj ). To compute cj for every row in R^tj, we rst
^j
de ne for every 2 R^tj; (Xt) the set B R^tj; (Xtj ) containing exactly those
2 R^tj; (Xt) that can be joined with , i.e. B
(xi) for every xi 2 Xt \ Xtj g. Then cj for every row
= f 2 R^tj; (Xtj ) j (xi) =</p>
      <p>^j
2 Rt; (Xt) is set to
cj = cj 1 P 2B c where cj 1 is the value of c in R^tj 1. Finally, R^t = R^tk.</p>
      <p>The following example will help to illustrate this algorithm.</p>
      <p>Example 1. Consider an ACQ Q : ans(x1; x2; x3; x4; x5; x6) R3(x3) ^ R4(x2,
x4; x3) ^ R1(x1; x2; x3) ^ R2(x2; x3) ^ R2(x5; x6) and an instance I = fR1(s1; c1,
b1); R1(s1; c1; b2); R1(s3; c3; b1); R1(s3; c1; b4); R1(s2; c2; b3); R2(c1; b2);
R2(c1; b1); R2(c4; b6); R3(b1); R3(b2); R4(c1; a1; b1); R4(c1; a1; b2); R4(c1; a2; b2)g
over the schema R = fR1=3; R2=2; R3=1; R4=3g. A possible join tree for Q is
shown in Fig. 1, where each of the nodes t 2 ft1; : : : ; t5g is annotated with Qt.
In addition, the pattern of the relations shown at the leaf nodes is \Rt ) R^t". At
the two non-leaf nodes, the pattern is \Rt ) R^t0 ) R^t1 ) R^t". The last column
in each table contains the counter. The content of the relations is computed in
a bottom-up manner as described above. The nal result jQ(I)j = 9 can be read
o at the root node.</p>
      <p>Theorem 1. Suppose that we only consider ACQs Q : ans(x) (x) without
nondistinguished variables. Then for query Q with join tree T over schema R
and instance I, #CQ can be solved in time O(jQj (maxt2T jRtj)2), assuming
unit cost for arithmetic operations.</p>
      <p>Proof (sketch). First of all, it must be shown that the algorithm is indeed correct.
This will be done implicitly in Section 4, where we present an algorithm for
counting the number of solutions to an ACQ with nondistinguished variables
and prove its correctness. As the algorithm presented above will turn out to be
a special case of the one in Section 4, for the correctness we refer to the proof
of Theorem 3.</p>
      <p>It therefore remains to show the upper bound on the runtime of the algorithm:
First of all, note that at each t 2 T , R^t; (Xt) Rt. Hence, at each node t 2 T ,
the space required to store R^t is obviously in O(jRtj). On the other hand, by
de nition, the size of a join tree of Q is linear in the size of Q. Hence the space
required to store R^t for all t 2 T is trivially O(jQj maxt2T jRtj). Therefore, as
even the naive implementation (with nested loops) of the semi-join needs only
quadratic time, by assuming unit cost for the arithmetic operations, the runtime
for the algorithm is in O(jQj (maxt2T jRtj)2). tu</p>
    </sec>
    <sec id="sec-4">
      <title>ACQs with Nondistinguished Variables</title>
      <p>
        For deciding whether a CQ has some solution, it makes no di erence whether it
contains only free variables, or additional nondistinguished ones. As the goal is
just to nd one assignment on all variables that maps the query into the instance,
all variables can be treated in the same way. In contrast, for the counting
problem, this is no longer true, as only variable assignments that di er on the free
variables also give di erent solutions. On the other hand, variable assignments
that only di er on the existentially quanti ed variables must not be counted
twice. As already noted in the introduction, under reasonable complexity
theoretical assumptions, in the general case, counting becomes harder for queries
with nondistinguished variables [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this section, we show that a similar
behaviour can be observed for ACQs: First we will show that the problem remains
tractable for data- and query complexity, and then we show #P-completeness
for the combined complexity.
      </p>
      <p>
        Given an ACQ Q : ans(x) 9y (x; y), the naive approach for counting
the number of solutions over some instance I is to check for each possible
variable assignment : x ! dom(I), whether the (Boolean) ACQ Q0 : ans()
9y ( (x); y) is satis ed over I. As Q0 is acyclic, the check can be done in time
O(jQj jIj) [
        <xref ref-type="bibr" rid="ref19 ref8">19, 8</xref>
        ] which immediately leads to tractable data complexity.
Theorem 2. Suppose that we only consider ACQs Q : ans(x) 9y (x; y) with
nondistinguished variables y. Then for query Q over schema R and instance I,
#CQ can be solved in time O(jdom(I)jjxj jQj jIj).
      </p>
      <p>When considering query complexity, tractability can be reached by some major
extensions of the algorithm from the previous section. In the following, we will
sketch the necessary changes. Recall that in the last section, given a CQ Q
with join tree T on some schema R with instance I, the idea was, at each
t 2 T , to de ne a footprint (small enough to be stored) for every assignment
: XTt ! dom(I), and to keep track of the number of partial solutions that
have this footprint. Now we have to deal with assignments = x [ y with
: XTt [YTt ! dom(I), where x and y are the restrictions of to XTt and YTt
resp., and Yt and YTt are de ned on the nondistinguished variables analogously to
Xt and XTt . Therefore we have fpr ( x; y; t) = ( x; y) again as the restriction
of x (resp. y) to Xt (resp. Yt). However, this time the problem is that the
same partial solution x may, with di erent y, have di erent footprints. Hence,
if we de ne the sets N (Tt; t; x; y) of partial assignments (on the distinguished
variables x) analogously as before, these sets are not necessarily disjoint. We
therefore have to group footprints by x, and identify sets of partial solutions
by a combined footprint, which is a set of footprints f( x; y1); : : : ; ( x; yn)g. For
convenience, we may also write ( x; I) with I = f y1; : : : ; yng to denote the
combined footprint.</p>
      <p>For a subtree Tt at node t 2 T and a combined footprint ( x; I) at node t,
we de ne N (Tt; t; x; I) as the set of mappings x : XTt ! dom(I) such that
(1) for every i 2 f1; : : : ; jIjg, there exists a y : YTt ! dom(I) s.t.</p>
      <p>(a) fpr ( x; y; t) = ( x; yi), and (b) (QTt ) I (where = x [ y)
(2) for all y : YTt ! dom(I) s.t. (QTt ) I (where = x [ y) holds, there
exists a j 2 f1; : : : ; jIjg s.t. fpr ( x; y; t) = ( x; yj)
As mentioned above, we cannot a ord to store the set N (Tt; t; x; I) at node
t 2 T . Hence, we only store ( x; I) together with jN (Tt; t; x; I)j, since we are
only interested in the number of such assignments. Note that ( x; I) ts into a
table with jXtj + jYtj columns (i.e. with schema (Xt; Yt)), containing one row i
for every yi 2 I, storing x in the rst jXtj columns, and yi in the last jYtj
columns. Hence at every node t 2 T , we maintain a set R^ Tt = fS^1; : : : ; S^ng (for
some n 2 N; an upper bound on n will be given below) of relations of the above
form. In addition, we maintain for each relation S^j 2 RTt a counter ctj storing
^
jN (Tt; t; x; I)j. Actually, it is convenient to write R^ t short for R^ Tt . Only for
the correctness proof (in the full version) it will be advantageous to explicitly
index R^ by a subtree Tt rather than just by a node t. Intuitively, every relation
S^j 2 R^ t encodes a set of mappings x : XTt ! dom(I), while each row in S^j
encodes a di erent set of mappings y : YTt ! dom(I) for this x, s.t. for each
= x [ y, (QTt ) I holds. Note that within each relation, each row has the
same value for Xt, and several relations may contain the same x.</p>
      <p>The content of R^ t (for t 2 T ) is again computed in two phases. In an
initialisation phase, rst for every leaf node t 2 T , R^ t contains exactly one relation
for every di erent value x for Xt in Rt, containing one row ( x; y) for every
value y for Yt s.t. ( x; y) 2 Rt. The counter for each such relation is set to
one. Then, in a bottom-up traversal of the tree, for every t 2 T with children
t1; : : : ; tk s.t. R^ ti has already been computed, we again start with some R^ t0 that
is de ned like R^ t0 for leaf nodes t0 2 T , and then include one child after the other
as follows: For i 2 f1; : : : ; kg, let R^ it contain all nonempty results S^1 n S^2 for all
pairs (S^1 2 Rt ^</p>
      <p>^ i 1; S^2 2 Rti ). If S^1 n S^2 is nonempty, the value of the counter
for the resulting table is the product of the counters for S^1 and S^2. Whenever
several such pairs give the same result, only one copy of the relations is stored,
and their counters are summed up. The number of solutions to Q is retrieved by
summing up the counters for all S^j 2 Rr at the root node r of T .</p>
      <p>^
Example 2. Recall the instance I and query Q from Example 1. We will use the
same instance I, and change Q to Q0 only by removing x3 from the free variables.
(For convenience, we rename it to y.) I.e., Q0 : ans(x1; x2; x4; x5; x6) 9yR3(y)^
R4(x2; x4; y) ^ R1(x1; x2; y) ^ R2(x2; y) ^ R2(x5; x6). We use the same join tree as
before. It is shown in Fig. 2, where again each node t 2 ft1; : : : ; t5g is annotated
with Qt. Recall that to deal with nondistinguished variables, we have to maintain
sets R^ t of relations. At each node t, R^ t is depicted together with its derivation
sequence: The pattern shown at the leaf nodes is \Rt ) R^ t", and for the inner
nodes it is \Rt ) R^ t0 ) R^ t1 ) R^ t". Thereby, the rst row in each table shows the
schema, followed by the relations S^j 2 R^ it separated by double horizontal lines.
In the rst row for each relation S^j , the last column shows the counter for S^j .
For instance, at node t3, we have two relations S^1 = f(s1; c1; b1); (s1; c1; b2)g and
S^2 = f(s1; c1; b2)g in R^ t. Note that S^1 and S^2 refer to two di erent extensions
of x(x1; x2) = (s1; c1) to x(x1; x2; x4). Indeed, S^1 corresponds to x(x4) = a1
while S^2 corresponds to x(x4) = a2. Again, the nal result jQ0(I)j = 6 can be
read o at the root node.</p>
      <p>t1 : R2(x2; y)
x2 y c x2 y c
c1 b2 1 c1 b2 3
) c1 b1 ) c1 b1
c4 b6 1 c4 b6 3</p>
      <p>Theorem 3. Suppose that we only consider ACQs Q : ans(x) 9y (x; y) with
nondistinguished variables y. Then for query Q with join tree T over schema R
and instance I, #CQ can be solved in time O(jQj m2 4m) (assuming unit cost
for arithmetic operations), where m = maxt2T jRtj.</p>
      <p>Proof (sketch). The details of the algorithm and the correctness proof will be
given in the full version. We only discuss the runtime here. Note that at some
node t 2 T , the values for Xt need not be di erent between two di erent relations
S^i; S^j 2 R^ t. Hence for every t 2 T , there might be up to 2m di erent elements
in R^ t, each of size O(m) (obviously, S^i Rt for every S^i 2 R^ t), where m =
maxt2T jRtj. Hence there are at most O((2m)2) semi-joins between two nodes,
each requiring time at most O(m2) even for a naive implementation, which gives
an overall runtime estimation of O(jQj m2 (2m)2). tu</p>
      <p>Theorem 2 and Theorem 3 immediately imply that #CQ can be solved
efciently for data- and query complexity. The natural next question is, whether
there exists also an algorithm that solves the problem e ciently when considering
combined complexity. However, the next theorem shows that such an algorithm
is very unlikely to exist.</p>
      <p>Theorem 4. Suppose that we only consider ACQs Q : ans(x) 9y (x; y)
with nondistinguished variables y. Then the combined complexity of #CQ is
#P-complete. The problem remains #P-complete, even if Q contains a single
nondistinguished variable only.</p>
      <p>
        Proof (sketch). Membership is easy to show. The hardness is shown by reduction
from the #P-complete problem #PERFECT MATCHINGS [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], i.e. from the
problem of counting the number of perfect matchings in a bipartite graph. A
perfect matching for a bipartite graph G = (V; E) is a subset M E s.t.
every v 2 V is contained in exactly one e 2 M . Let an arbitrary instance of
#PERFECT MATCHINGS be given by a bipartite graph G = (V; E), where
V = A [ B with A = fa1; : : : ; ang, B = fb1; : : : ; bng, and E A B. From this,
we create a database instance I over the schema S = fe=2; t=2g as follows:
I = fe(ai; bj ) j for each (ai; bj ) 2 Eg [ Sn
i=1ft(i; x) j x 2 B n fbigg,
where, by slight abuse of notation, we use ai and bi to denote both, vertices in
V and constants in I. We further de ne two conjunctive queries Q1, Q2 as
Q1 : ans(x1; : : : ; xn) Vyin=V1ne(ai; xi)
Q2 : ans(x1; : : : ; xn) 9 i=1 e(ai; xi) ^ Vin=1 t(y; xi)
Note that a perfect matching is equivalent to a bijective mapping from A to B
along the edges in E. The idea of the reduction is that Q1 (over I) selects all
possible mappings of nodes from A to nodes from B, while Q2 selects all those
mappings where at least one element from B is not assigned to an element from
A. Hence, Q1(I)nQ2(I) gives exactly the set of perfect matchings, and because of
Q2(I) Q1(I), jQ1(I)j jQ2(I)j returns the number of perfect matchings. Note
that this is a special case of a Turing reduction, called subtractive reduction [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Obviously, this reduction is feasible in polynomial time. Its correctness as
well as the membership will be proved in the full version.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Extensions</title>
      <p>
        Queries with bounded hypertree-width. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] it was shown that testing
if the result of a CQ is nonempty (or whether the result contains a certain tuple)
is tractable for queries with bounded hypertree-width. Below we also generalise
the tractability of #CQ on ACQs to queries with bounded hypertree-width.
In this section we discuss some extensions and applications of our main results.
Unions of conjunctive queries. For unions of conjunctive queries (UCQs), it
is well known that the decision problem is not harder to solve than for CQs. In
contrast, for the counting problem, even if we consider UCQs with distinguished
variables only, we encounter a similar problem as for CQs with nondistinguished
variables: the same answer could be produced in di erent ways. For UCQs this
means that the same answer could appear in several disjuncts. In fact, a slight
modi cation of the proof of Theorem 4 yields the following intractability result.
Theorem 5. Suppose that we only consider unions of acyclic conjunctive queries
ans(x) Win=1 i(x) without nondistinguished variables. Then the combined
complexity of the counting problem of such UCQs is #P-complete.
Proof (idea). For the hardness, recall the hardness part of the proof of
Theorem 4. We use the same reduction as described there, only that instead of a single
relation t=2, we use n relations ti=1, and we de ne Q2 as Q2 : ans(x1; : : : ; xn)
Wjn=1 Vin=1 e(ai; xi) ^ Vin=1 tj (xi) . We de ne instance I also as before, but
instead of the content of t, we have Sn
      </p>
      <p>j=1ftj (x) j x 2 B n fbj gg.</p>
      <p>Intuitively, we expand the query by instantiating the existentially quanti ed
variable in Q2, and create one disjunct for every instantiation. Hence, for the
correctness, analogous arguments as in the proof of Theorem 4 apply.
tu
tu
Theorem 6. Suppose that we only consider CQs Q whose hypertree-width is
bounded by some constant. Then the following holds: If Q contains no
nondistinguished variables, then #CQ is polynomial time solvable (combined complexity).</p>
      <p>On the other hand, if Q contains nondistinguished variables, then solving
#CQ is feasible in polynomial time (data complexity and query complexity),
while the problem is #P-complete (combined complexity).</p>
      <p>
        Proof (idea). In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] it was shown that, given a Boolean CQ Q with
hypertreewidth ! (for some constant !) over some instance I, an equivalent ACQ Q0 over
an instance I0 can be e ciently constructed, such that the combined size of Q0
and I0 is in O(jQj m!), where m = maxSi2I jSij, and s.t. Q(I) returns true
i Q0(I0) returns true. It is easy to verify that this construction preserves the
number of solutions. Actually, even Q(I) = Q0(I0) holds. Hence applying the
results from the previous sections to Q0 and I0 proves the case.
tu
Counting under bag semantics. So far, we have only considered counting
under set semantics. However, by a slight change of the initialisation step, the
algorithm presented in Section 3 works for counting under bag semantics as
well. Further, note that the proof of Theorem 4 heavily depends on assuming
set semantics for the query. In fact, we can show the following result.
Theorem 7. Suppose that we only consider ACQs. Then, for query Q with join
tree T over schema R and instance I, #CQ w.r.t. bag semantics can be solved
in time O(jQj maxt2T jRtj2) (assuming unit cost for arithmetic operations), no
matter whether Q contains nondistinguished variables or not.
      </p>
      <p>Proof (sketch). First, note that under bag semantics, there is no need to make a
distinction between distinguished- and nondistinguished variables: In Section 4,
the problem was to avoid multiple counting of the same solution if it can be
derived in several ways. However, under bag semantics, this is exactly what we
want to do. Hence, we can just consider all variables in Q as free.</p>
      <p>It therefore remains to show that the algorithm from Section 3 can be adapted
to bag semantics. In fact, the only thing that needs to be changed is in the
initialisation phase: Recall that for every distinct tuple 2 Rt, we add a tuple
( ; 1) to R^t. All that needs to be changed is to replace 1 by the number of
occurrences of 2 Rt. It can be easily checked that the remaining algorithm can
be left unchanged.
tu
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], we started to look for a tractable fragment of the #CQ problem via
the tree-width of the query. In the current paper, we initiated a systematic
study of tractable fragments of #CQ, considering ACQs and CQs with bounded
hypertree-width (which is a more general measure than tree-width). We have
presented new algorithms based on semi-joins which allowed us to prove several
new tractability results. We also identi ed limits to this search for tractable
fragments by proving the #P-completeness of the #CQ problem for arbitrary
acyclic conjunctive queries. Our results apply to the count operator, which is an
integral part of the SQL language. In the future, we want to extend the search
for tractable fragments to other aggregate functions (like minimum, maximum,
sum, average) and to the group by construct in SQL.
      </p>
      <p>
        Another interesting extension of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] would be the search for a dichotomy
result in the style of [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Further, extending the results from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] concerning
the expressiveness of counting the answers of CQs to ACQs, and addressing the
complexity related questions raised there is another direction for future work.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Bauland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chapdelaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Creignou</surname>
          </string-name>
          , M. Hermann, and
          <string-name>
            <given-names>H.</given-names>
            <surname>Vollmer</surname>
          </string-name>
          .
          <article-title>An algebraic approach to the complexity of generalized conjunctive queries</article-title>
          .
          <source>In SAT 2004 - Revised Selected Papers</source>
          , volume
          <volume>3542</volume>
          <source>of LNCS</source>
          , pages
          <volume>30</volume>
          {
          <fpage>45</fpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Bielecki</surname>
          </string-name>
          and
          <string-name>
            <surname>J. V.</surname>
          </string-name>
          den Bussche.
          <article-title>Database interrogation using conjunctive queries</article-title>
          .
          <source>In Proc. ICDT</source>
          <year>2003</year>
          , pages
          <fpage>259</fpage>
          {
          <fpage>269</fpage>
          . Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Chandra</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Merlin</surname>
          </string-name>
          .
          <article-title>Optimal implementation of conjunctive queries in relational data bases</article-title>
          .
          <source>In Proc. STOC</source>
          <year>1977</year>
          , pages
          <fpage>77</fpage>
          {
          <fpage>90</fpage>
          . ACM,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Chekuri</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajaraman</surname>
          </string-name>
          .
          <article-title>Conjunctive query containment revisited</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>239</volume>
          (
          <issue>2</issue>
          ):
          <volume>211</volume>
          {
          <fpage>229</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>N.</given-names>
            <surname>Creignou</surname>
          </string-name>
          and
          <string-name>
            <surname>M. Hermann.</surname>
          </string-name>
          <article-title>Complexity of generalized satis ability counting problems</article-title>
          .
          <source>Inf. Comput.</source>
          ,
          <volume>125</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>12</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Durand</surname>
          </string-name>
          , M. Hermann, and
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          .
          <article-title>Subtractive reductions and complete problems for counting complexity classes</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>340</volume>
          (
          <issue>3</issue>
          ):
          <volume>496</volume>
          {
          <fpage>513</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          .
          <article-title>Degrees of acyclicity for hypergraphs and relational database schemes</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>30</volume>
          (
          <issue>3</issue>
          ):
          <volume>514</volume>
          {
          <fpage>550</fpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Flum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Frick</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grohe</surname>
          </string-name>
          .
          <article-title>Query evaluation via tree-decompositions</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>49</volume>
          (
          <issue>6</issue>
          ):
          <volume>716</volume>
          {
          <fpage>752</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>G.</given-names>
            <surname>Gottlob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarcello</surname>
          </string-name>
          .
          <article-title>A comparison of structural CSP decomposition methods</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>124</volume>
          (
          <issue>2</issue>
          ):
          <volume>243</volume>
          {
          <fpage>282</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. G. Gottlob,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarcello</surname>
          </string-name>
          .
          <article-title>The complexity of acyclic conjunctive queries</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>48</volume>
          (
          <issue>3</issue>
          ):
          <volume>431</volume>
          {
          <fpage>498</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. G. Gottlob,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leone</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Scarcello</surname>
          </string-name>
          .
          <article-title>Hypertree decompositions and tractable queries</article-title>
          .
          <source>J. Comput. Syst. Sci.</source>
          ,
          <volume>64</volume>
          (
          <issue>3</issue>
          ):
          <volume>579</volume>
          {
          <fpage>627</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Graham</surname>
          </string-name>
          .
          <article-title>On the universal relation</article-title>
          .
          <source>University of Toronto technical report</source>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>M. Grohe</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Schwentick</surname>
            , and
            <given-names>L.</given-names>
          </string-name>
          <article-title>Segou n. When is the evaluation of conjunctive queries tractable?</article-title>
          <source>In Proc. STOC</source>
          <year>2001</year>
          , pages
          <fpage>657</fpage>
          {
          <fpage>666</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. P. G. Kolaitis and
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Vardi</surname>
          </string-name>
          .
          <article-title>Conjunctive-query containment and constraint satisfaction</article-title>
          .
          <source>J. Comput. Syst. Sci.</source>
          ,
          <volume>61</volume>
          (
          <issue>2</issue>
          ):
          <volume>302</volume>
          {
          <fpage>332</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          .
          <article-title>Computational complexity</article-title>
          .
          <source>Addison-Wesley</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>The complexity of evaluating tuple generating dependencies</article-title>
          .
          <source>In Proc. ICDT</source>
          <year>2011</year>
          , pages
          <fpage>244</fpage>
          {
          <fpage>255</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>N.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Seymour</surname>
          </string-name>
          .
          <article-title>Graph minors. ii. algorithmic aspects of treewidth</article-title>
          .
          <source>J. Algorithms</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <volume>309</volume>
          {
          <fpage>322</fpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>L. G. Valiant.</surname>
          </string-name>
          <article-title>The complexity of enumeration and reliability problems</article-title>
          .
          <source>SIAM J. Comput.</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <volume>410</volume>
          {
          <fpage>421</fpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>M.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          .
          <article-title>Algorithms for acyclic database schemes</article-title>
          .
          <source>In Proc. VLDB</source>
          <year>1981</year>
          , pages
          <fpage>82</fpage>
          {
          <fpage>94</fpage>
          . IEEE Computer Society,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ozsoyoglu</surname>
          </string-name>
          .
          <article-title>An algorithm for tree-query membership of a distributed query</article-title>
          .
          <source>In Proc. COMPSAC 1979w</source>
          , pages
          <fpage>306</fpage>
          {
          <fpage>312</fpage>
          . IEEE,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>