Bag Equivalence of Bounded Symmetry-Degree
     Conjunctive Queries with Inequalities

                       Mingmin Chen and Todd J. Green

                         Department of Computer Science
                          University of California, Davis
                              Davis, CA 95616 USA
                         {chenmi,green}@cs.ucdavis.edu


      Abstract. We consider the problem of checking equivalence of conjunc-
      tive queries with inequalities under bag (multiset) semantics. The prob-
      lem is known to be decidable in pspace and as hard as graph isomor-
      phism, but its exact complexity remains open. We introduce a natural
      restriction based on what we call the symmetry degree of queries, and
      show that when the symmetry degree is bounded by a ﬁxed polyno-
      mial in the size of the query, the equivalence problem is in Π2p . We also
      show that for asymmetric queries (those of symmetry degree 1), check-
      ing bag-equivalence is polynomial-time Turing equivalent to the graph
      automorphism problem. These results can be interpreted as ﬁrst steps
      in a more general program of ﬁnding “islands of tractability” for testing
      equivalence of conjunctive queries with inequalities under bag (rather
      than set) semantics.


1   Introduction
We study the problem of checking equivalence of conjunctive queries (CQs) with
built-in inequality predicates over a densely-ordered domain under bag (multiset)
semantics. This problem is well-understood assuming the classical set semantics,
where it is known that the complexity jumps from NP-complete for ordinary
CQs [5] to Π2p -complete when inequalities are allowed [23, 29]. For bag semantics
(implemented by most DBMSs, and hence of practical as well as theoretical
interest), the problem is comparatively less-well understood: it is known to be
as hard as graph isomorphism [6] and decidable in pspace [27], but the exact
complexity is unknown (and seems to be difficult to resolve).
    In this paper, we introduce a natural restriction on the structure of CQs called
bounded symmetry degree (a bound on the number of query automorphisms)
and use this notion to study the complexity of the bag equivalence problem.
The notion is adapted from the well-known concept of graph symmetry studied
extensively in the graph theory literature [17, 21, 25, 18]. We show that if the
symmetry degree of CQs with inequalities is bounded, then bag equivalence can
be decided in Π2p (and we also show that checking for bounded symmetry degree
can be done in Π2p ). In fact, we show this holds even when we relax the restriction
to allow queries of polynomially bounded symmetry degree (see Section 3.1). We
2

also consider several stronger forms of symmetry degree bounds, and establish
that for asymmetric CQs, bag equivalence of CQs with inequalities reduces to
checking graph automorphism, while a bound on the number of self-joins in the
queries yields a coNP upper bound. We also compare the notion of bounded
symmetry degree with the standard notion of bounded hypertree width, which
turns out to be orthogonal and not obviously applicable to the setting of bag
semantics. These results may be interpreted as first steps in a more general
program of finding “islands of (relative) tractability” for testing equivalence of
CQs with inequalities under bag (rather than set) semantics.

Outline. Section 2 contains preliminaries and a discussion of related work. Sec-
tion 3 defines the notion of bounded symmetry degree, establishes some basic
results, and compares it with bounded hypertree width. Section 4 presents the
main results on equivalence, for bag and bag-set semantics. We conclude in Sec-
tion 5. The Appendix contains formal proofs of our results.


2    Preliminaries and Related Work

Conjunctive queries with inequalities. Following the standard terminology [1],
a conjunctive query (CQ) is a safe, nonrecursive datalog rule whose body is a
conjunction of relational atoms. A conjunctive query with inequalities (CQ< ) is
a conjunctive query extended with a conjunction of inequality atoms using <, ≤,
and 6=. We use uppercase letters for predicate symbols, x, y, z, . . . for variables,
a, b, c, . . . for constants, and t, u, v, . . . for variable or constant terms, with over-
bars x̄, ā, t̄, etc. indicating lists. We write a CQ< Q as Q(x̄) :– R, C where R,
the relational part, is a conjunction of relational atoms and C, the conditions, is
a conjunction of inequalities x θ t, θ ∈ {<, ≤, 6=}. We denote by QR the CQ ob-
tained from Q by deleting C. For example, if Q is Q(x, y) :– T (x, y), S(x, y), x >
y, y 6= 2, then QR is QR (x, y) :– T (x, y), S(x, y).

Semantics of queries. We fix a densely-ordered domain of constants which, for
concreteness, we will assume is that of the rationals Q. A set relation of arity k is
a finite set R of tuples from Qk . A bag (multiset) relation of arity k is a mapping
R : Qk → N of tuples to their associated multiplicities, where the mapping is
non-zero on only finitely many tuples. A set (resp., bag) database instance I is
an assignment of set (resp., bag) relations RI to predicate names R.
    The semantics of CQ< s on set and bag relations is based on the notion of
valuations. A valuation is mapping ν of variables to constants, extended to map
constants to themselves. Valuations operate pointwise on lists/tuples and atoms
in the expected way. Let Q be a CQ Q(t̄) :– R1 (t¯1 ), . . . , Rn (t¯n ), C and let I be
a set instance. The result of evaluating Q on I under set semantics is the set
relation JQKI defined

            JQKI = {ν(t̄) | ν |= C and ν(t¯1 ) ∈ R1I ∧ · · · ∧ ν(t¯n ) ∈ RnI }
                                                                                        3

Now suppose I is a bag instance. The result of evaluating Q on I under bag
semantics is the bag relation JQKIb defined
                               X
                  JQKIb (ā) =   (R1I (ν(t¯1 )) × · · · × RnI (ν(t¯n )))
                                 ν

where the sum is over all valuations ν such that ν(t̄) = ā and ν |= C. Finally,
suppose I is a set instance. The result of evaluating Q on I under bag-set se-
mantics is the bag relation JQKIbs obtained by viewing I as a (duplicate-free) bag
relation, and evaluating Q on I under bag semantics.
    A CQ< Q is satisfiable (or consistent [23]) if there exists a set (or, equiva-
lently, a bag) database instance I such that JQKI 6= ∅. Every CQ is satisfiable,
and a CQ< is satisfiable iff its comparison part is satisfiable. The latter can be
checked in linear time using an algorithm in [4]. Here, we focus only on satisfiable
queries, and we do not allow equality atoms x = t (since, for satisfiable queries,
such atoms can always be eliminated by replacing each occurrence of x with t).

Containment mappings and isomorphisms. Let Q(x̄) :– R, C and Q′ (x̄′ ) :– R′ , C ′
be two CQ< s, and denote by V(Q) and V(Q′ ) the sets of variables and constants
that occur in Q and Q′ , respectively. A containment mapping [5] (or homomor-
phism) from Q′ to Q is a mapping h : V(Q′ ) → V(Q), lifted to operate on atoms
and conjunctions of atoms in the obvious way, such that:
 1. h(t̄′ ) = t̄.
 2. h is the identity mapping on constants.
 3. h(R′ ) ⊆ R.
 4. C |= h(C ′ ), i.e. C |= h(a′ ) θ h(b′ ) if a′ θ b′ is in C ′ where θ ∈ {<, ≤, 6=}
A homomorphism h is called an isomorphism if h is bijective and its inverse is
also a homomorphism. Note that Q, Q′ are isomorphic via mapping h iff QR and
Q′R are isomorphic and C and C ′ are logically equivalent modulo h. Given a can-
didate isomorphism mapping, the latter condition can be checked by comparing
the transitive closures of C and C ′ (see Appendix for details).
    An endomorphism (resp., automorphism) is a homomorphism (resp., isomor-
phism) from a CQ< to itself. For a CQ< Q, the set of automorphisms of QR ,
denoted by Aut(QR ), forms a subgroup of the symmetric group on V(Q). The
identity element in this group is the identity mapping. Similarly, the set of en-
domorphisms of QR , denoted by End(QR ), forms a monoid.

Linearizations and linear expansions. We use the notions of linearizations and
linear expansions of CQ< s due to Nutt et al. [27], which are based on the stan-
dard notion of linear extensions for partially ordered sets. Let C be a set of
inequalities (<, ≤, 6=) over a set T of variables and constants. A linearization of
C is a total order L over T that is compatible with C. (Note that L may equate
distinct variables, when this is allowed by C.)
    If L is a linearization of C, and C is the condition of a CQ< Q, then we
denote QL the CQ< obtained from Q by replacing C with L, and we say that
4

QL is a linearization of Q. By default, we do not eliminate equalities in lineariza-
tions unless explicitly pointed out. (Thus, the condition for a linearization is a
conjunction of inequalities of the form x θ t, θ ∈ {<, =}.) The linear expansion
of Q is the bag lin(Q) of all linearizations of Q (it can be viewed as a union of
CQ< s). We say that linear expansions lin(Q) and lin(Q′ ) are isomorphic if there
is a bijection f : lin(Q) → lin(Q′ ) such that QL is isomorphic to f (QL ) for every
QL ∈ lin(Q).

Known results on equivalence. The equivalence problem for conjunctive queries
with or without inequalities under set semantics, bag semantics, bag-set seman-
tics have been studied extensively, beginning with the seminal paper by Chan-
dra and Merlin [5] which established the NP-completeness of checking contain-
ment/equivalence of CQs under set semantics using a characterization in terms
of containment mappings. Containment/equivalence of CQ< queries was shown
to be in Π2P by Klug [23] and Π2P -hard by van der Meyden [29].
    The papers by Chaudhuri and Vardi [6] and Ioannidis and Ramakrishnan [20]
initiated the study of query containment and equivalence under bag/bag-set se-
mantics. The former paper showed that two CQs are bag-equivalent iff they are
isomorphic, which yields a logspace equivalence of the problem with the graph
isomorphism problem (GI). The latter paper showed that containment of unions
of CQs is undecidable via a reduction from Hilbert’s Tenth Problem. The decid-
ability of bag containment of CQs remains an open problem. (A recent paper
by Afrati et al. [2] gives positive decidability results for certain classes of CQs.)
Bag containment of CQ< s, on the other hand, was shown to be undecidable by
Jayram et al. [22].
    Checking bag-set equivalence of CQ< s was shown to be decidable by Nutt et
al. [27] (Cohen [8] extends it to bag semantics), a result we shall refer to again
in the sequel:
Theorem 1 ([27, 8]). Two CQ< s Q, Q′ are bag (bag-set) equivalent if and only
if they have isomorphic linear expansions, which can be checked in pspace.
This theorem also holds for bag-set equivalence for CQ< s. The pspace upper
bound seems unlikely to be tight: in fact, using their characterization of equiva-
lence in terms of linear expansions, it is not hard to show that the upper bound
can be sharpened somewhat to coNP#P. (coNP#P is contained in pspace, but it
is not known whether the containment is strict.) However, the exact complexity
of the problem remains open.
    The following simple variation on an example given in [27] shows that non-
isomorphic CQ< s may indeed have isomorphic linear expansions, hence isomor-
phism is not a necessary condition for bag equivalence of CQ< s:

Example 1. Let ϕ(x, y, z) be the conjunction of relational atoms R(x, y, z), R(x, z, y),
R(y, x, z), R(y, z, x), R(z, x, y), R(z, y, x). Now consider two Boolean CQs<
Q, Q′ defined
                        Q() :– ϕ(x, y, z), x < z, y < z, x 6= y
                       Q′ () :– ϕ(x, y, z), x < y, x < z, y 6= z
                                                                                     5

The linear expansions of Q and Q′ are
                       Q1 () :– ϕ(x, y, z), x < z, y < z, x > y
                       Q2 () :– ϕ(x, y, z), x < z, y < z, x < y

                       Q′1 () :– ϕ(x, y, z), x < y, x < z, y > z
                       Q′2 () :– ϕ(x, y, z), x < y, x < z, y < z
Observe that Q1 is isomorphic to Q′1 and Q2 is isomorphic to Q′2 . By Theorem
1, Q and Q′ are bag-equivalent since they have isomorphic linear expansions.
However, Q and Q′ are not isomorphic.

3     Symmetry
3.1   Motivation and basic definitions
A CQ< is linear [9, 27] if its body contains no two occurrences of the same
relational predicate symbol (in other words, self-joins are not allowed). For linear
CQ< , an easy corollary of Theorem 1 (pointed out in [27]) is that bag equivalence
can be checked in polynomial time.
    In this section, we introduce a generalization of this restriction based on
what we term the symmetry degree of a query. This is a natural idea derived
from the well-studied notion of the symmetry degree of a graph (see, e.g., [17,
21, 25, 18]), but it does seem to have been applied in database theory before. An
automorphism of a directed graph G = (V, E) is a permutation σ of the vertex
set V, such that for any edge e = (u, v), σ(e) = (σ(u), σ(v)) is also an edge.
The automorphism group of G contains all these automorphisms. The order of
the automorphism group of G is called its symmetry degree, denoted sym(G). For
queries, the notion is exactly analogous: we define the symmetry degree of a CQ<
Q, sym(Q), to be the order of its automorphism group Aut(QR ).
Example 2. To illustrate, consider the Boolean CQs below:
                  Q1 () :– P (x1 , x2 ), P (x2 , x3 ), . . . , P (xn−1 , xn )
                  Q2 () :– S(u, v), S(u, w)
                  Q3 () :– R(y1 ), . . . , R(yn )
Q1 has symmetry degree 1 (the variables must all stay put), Q2 has symmetry
degree 2 (v and w can be swapped), and Q3 has symmetry degree n! (any
permutation of the variables is an automorphism).
  We will examine the complexity of computing sym(Q) in Section 3.2. In the
meantime, we derive an easy bound based on counting self-joins.
Proposition 1. Suppose predicates p1 , . . . , pk each occur n1 , . . . , nk times, re-
spectively, in the body of a CQ Q. Define the number of self-joins of Q to be
n = n1 + · · · + nk − k. Then
                                          Y
                                          k
                            sym(Q) ≤            (ni !) ≤ (n + 1)!.
                                          i=1
6

In particular, the proposition implies that linear queries have symmetry degree
one. On the other hand, queries may have low symmetry degree but many self-
joins, e.g., Q1 in Example 2.
    We introduce some additional terminology to use in the remainder of the pa-
per. Q is called symmetric if Aut(QR ) is a non-trivial subgroup of the symmetric
group on V(Q), i.e. sym(Q) > 1. Q is called asymmetric if Aut(QR ) is trivial,
which only contains the identity, i.e. sym(Q) = 1. Q is called rigid if End(QR ) is
trivial.

Incidence graphs. A CQ has a natural interpretation as an edge-labeled directed
hypergraph. However, to apply results of interest from graph theory, we may
need to transform this hypergraph into an ordinary directed graph. Given a
CQ Q, we construct its incidence graph [7] G(Q) = (V, E, λ), a labeled directed
graph, as follows: V has a vertex labeled x for each variable x in Q, a vertex
labeled c for each constant c in Q, and a vertex labeled P for each occurrence of
predicate P in Q; and E contains an edge labeled n from vertex t to predicate
vertex P whenever t occurs as the nth argument in P . For example, the incidence
graph of CQ Q(x) :– T (x, y), S(x, z), S(z, 2) is shown below:

                                           1
                           x   v               Q   Q(x)
                                           1
                                       1
                           y   v   2           T   T(x,y)

                                           2
                           z   v               S    S(x,z)
                                           1
                                           2
                           2   2               S    S(z,2)


We introduce labels in the definition of incidence graph to ensure that the con-
struction preserves the symmetry degree1 of Q:

Proposition 2. For any CQ Q, we have sym(Q) = sym(G(Q)).


3.2   Complexity of Computing Symmetry Degrees

We next turn to the question of computing the symmetry degree of a CQ, starting
with the simpler problem of deciding whether a query is symmetric. Not sur-
prisingly, the latter problem reduces easily to the graph automorphism problem
(GA), which is the problem of checking whether a given graph has a non-trivial
automorphism.2
1
  The automorphism group of a labeled directed graph G = (V, E , λ) is the subgroup
  of the automorphism group of G ′ = (V, E ) which respects the labeling λ.
2
  As with GI, GA is in NP but not known or believed to be either NP-complete or
  in ptime. It is no harder than GI in the sense that the problem is polynomial-time
  many-one reducible to GI; however the converse relationship is at present unknown.
                                                                                  7

Proposition 3. Checking whether a CQ< (CQ) is symmetric is polynomial-
time many-one equivalent to the graph automorphism problem.
This leads, using results in [24], to a characterization of the complexity of com-
puting the symmetry degree:
Corollary 1. Computing the symmetry degree of a CQ< (or CQ) is polynomial-
time Turing equivalent to the graph isomorphism problem.
      The following is a direct corollary of a result in [3].

Corollary 2 (of Theorem 15 in [3]). If the number of self-joins in a CQ<
Q is bounded by some constant k, then there is an nO(k) time algorithm that
computes Aut(QR ) as a generating set in the symmetric group on V(Q).

   Finally, we consider testing for rigidity. Again our strategy is to reduce the
problem on CQ< queries to the analogous problem on graphs, where a result of
Goralcik et al. [11] shows as a special case that graph rigidity is coNP-complete.
Proposition 4. The problem of checking whether a CQ< (CQ) is rigid is coNP-
complete.

3.3     Symmetry Degree versus Hypertree Width
Gottlob et al. [12, 13] introduced the notion of hypertree width as a generalization
of the classical notion of acyclic queries, and proved that Boolean conjunctive
queries of bounded hypertree width can be evaluated efficiently (and that, as a
consequence, query containment under set semantics is also tractable).
    Symmetry degree and hypertree width seem to be essentially orthogonal con-
cepts: intuitively, hypertree width is in some sense a measure of cyclicity, rather
than symmetry. Moreover, a query with high symmetry degree may have small
hypertree width, such as the following CQ that computes a “star self-join” on
the first column of a relation R:

                        Q1 () :– R(x, y1 ), R(x, y2 ), . . . , R(x, yk )

Q1 has symmetry degree k! but is acyclic hence has hypertree width 1. At the
same time, a query with a high hypertree width may have a small symmetry
degree, such as the CQ below:

                                     ^
                                     k     ^
                                           k
                          Q2 () :–                  Rik−k+j (xi , xj )
                                     i=1 j=1,j6=i

(Q2 intuitively corresponds to a complete directed graph without self-loops on
k vertices where each edge is labeled with a distinct predicate name.) Q2 has
symmetry degree 1, but we can show that it has hypertree width ⌈k/2⌉.
   We are not aware of any work studying the interaction of hypertree width and
bag equivalence of queries. We also do not know of any work studying whether
8

bounded hypertree width leads to tractable query containment or equivalence
checks for CQ< s under the classical set semantics.
   A recent paper by Pichler et al. [28] showed that the the query complexity of
evaluating CQs under bag-set semantics becomes tractable when the CQs have
bounded treewidth. A natural question (which we leave open here) is whether
bounded symmetry degree also makes query evaluation under bag or bag-set
semantics tractable.


4      Main Results
We are now ready to present our results on the complexity of bag equivalence
of CQ< s having bounded symmetry degree (Section 4.1). We also discuss how
these bounds transfer to bag-set semantics (Section 4.2), and present an initial
result that applies these ideas to the query equivalence problem under classical
set semantics (Section 4.3).

4.1     Bag Equivalence
As already mentioned, Nutt et al. [27] showed that checking bag-equivalence of
CQ< s can be done in pspace, but the exact complexity of the problem remains
unclear. Here, we show that if the symmetry degree of the queries is bounded,
then the upper bound can be lowered to Π2p . In fact, we use a rather liberal
notion of “bounded” here where we fix a univariate polynomial P ahead of time
and say that a CQ< Q of length n has polynomially bounded symmetry degree if
sym(Q) ≤ P (n).
Theorem 2. Checking bag equivalence of CQ< s of polynomially bounded sym-
metry degree is in coNPGI , hence in Π2p .
      The proof of the theorem relies on the following technical lemma:
Lemma 1. Let Q be a CQ< of polynomially bounded symmetry degree, and let
QL be one of its linearizations. Then the number of linearizations in lin(Q) that
are isomorphic to QL can be computed in polynomial time with access to a GI
oracle.

Proof. The basic idea of the proof is as follows: first, we use the GI oracle to com-
pute Aut(QR ); second, we use Aut(QR ) to compute the number of linearizations
in lin(Q) that are isomorphic to QL .
    To compute Aut(QR ), we first compute the generating set of Aut(QR ). This
can be done in polynomial time using the GI oracle, since it is well-known that
the problem of computing the generating set of the automorphism group of a
labeled graph is polynomial-time Turing equivalent to GI [24, 19]. Now, since we
have assumed sym(Q) ≤ P (n), and we have the generating set for Aut(QR ), it
is clear that we can enumerate all the elements of Aut(QR ) in polynomial time.
    Next, let S denote the set of all linearizations in lin(Q) that are isomorphic
to QL . Our goal is to compute |S|. We observe that a linearization QL′ ∈ S iff
                                                                               9

h(L) = L′ for some h ∈ Aut(QR ) and L′ |= C, where C denotes the condition
part of Q. In other words, S = {h(QL ) | h ∈ Aut(QR ) and h(L) |= C}. Since
we have already computed Aut(QR ), we can therefore use it to compute S in
polynomial time, and return |S| as the final answer.                      ⊓
                                                                          ⊔
   Now we are ready to prove Theorem 2.
Proof. By Lemma 1, given a GI oracle and a linearization QL of Q, we can com-
pute in polynomial time the numbers of linearizations of Q and Q′ , respectively,
which are isomorphic to QL . If these two numbers are different, then it follows
by Theorem 1 that Q and Q′ are not bag-equivalent. Thus we can guess a non-
membership witness and verify it in polynomial time given a GI oracle, which
means this problem is in coNPGI . Since GI≤pT NP, it follows that the problem
is also in Π2p .                                                               ⊓
                                                                               ⊔
    We note that the pre-condition of Theorem 2 can certainly be checked in
Π2p , by Corollary 1. Also, we note that it is not hard to show that Lemma 1
and Theorem 2 can be generalized to show a Π2p upper bound for checking bag
equivalence of unions of CQ< s (UCQ< s). In Section 4.2, we will also generalize
it to apply to bag-set equivalence.
    Next we show some other (relatively) tractable subcases of checking bag
equivalence of CQ< s. Here, we show that if the number of self-joins is bounded
by some constant, we can push the upper bound down to coNP.
Theorem 3. If the numbers of self-joins in CQ< Q, Q′ are bounded by some
constant k, then the problem of checking whether Q and Q′ are bag equivalent is
in coNP.
Proof. Here wlog we assume that Q and Q′ have the same relational part, since
their relational parts are isomorphic by Theorem 1. As stated in Proposition 1,
the symmetry degree is bounded by (k + 1)! and we can enumerate the auto-
morphism group in constant time. For checking bag equivalence of Q and Q′ ,
we can guess a linearization QL . Initialize two sets SQ and SQ′ to be empty. By
applying the automorphism to QL one by one and see if the comparison part is
compatible with the comparison parts of Q and Q′ respectively and if yes add
it to the corresponding SQ or SQ′ . This can be done in polynomial time. If the
orders of these two sets are different, then Q and Q′ are not bag-equivalent,
since they have different numbers of linearizations that are isomorphic to QL .
In other words, we can guess a non-member witness and verify it in polynomial
time. Hence, this problem is in coNP.                                          ⊓
                                                                               ⊔
    By imposing various restrictions on the symmetry degree of queries, we can
establish other upper bounds on the complexity of the problem. The following
result is one example.
Theorem 4. Asymmetric CQ< s Q and Q′ are bag equivalent iff there is an iso-
morphism h from Q′R to QR such that h(C ′ ) |==| C. Moreover, bag equivalence
of asymmetric CQ< queries is polynomial-time Turing equivalent to the graph
automorphism problem.
10

Proof. By Theorem 1, Q, Q′ have isomorphic linear expansions. In other words,
there is a bijective mapping ψ : lin(Q) 7→ lin(Q′ ) such that QL is isomorphic to
ψ(QL ) for all QL ∈ lin(Q). Denote by ϕi the isomorphism mapping from QLi
to ψ(QLi ). Since Q, Q′ are asymmetric, there is only one isomorphism mapping
from QR to Q′R (Suppose there are two ϕ1 , ϕ2 , then ϕ1 ◦ ϕ−1  2   is a nontrivial
automorphism of QR , which is a contradiction). So, ϕi are all the same for any
QLi ∈ lin(Q). The isomorphism of Q and Q′ follows. The only problem is to
find out this only isomorphism mapping of their relational parts and check if
their comparisons are equivalent under this isomorphism. By [24], asymmetric
graph isomorphism is polynomial-time Turing equivalent to the graph automor-
phism problem. Hence, finding this isomorphism mapping from QR to Q′R is
polynomial-time Turing equivalent to the graph automorphism problem, so is
the bag equivalence problem.                                                    ⊓
                                                                                ⊔

4.2   Bag-Set Equivalence
Next we examine bag-set equivalence of CQ< s. Theorem 4 holds not just for bag
semantics, but also for bag-set semantics. Let us now look at extending Theorem
2 as well.
    We have the following lemma which is analogous to Lemma 1, but uses an
NP (rather than GI) oracle. Intuitively, we use this extra power to deal with the
fact that for bag-set semantics, when manipulating linearizations, we are forced
to reduce equalities (since this can have the effect revealing that an atom in the
body is actually a duplicate copy and should be removed).
Lemma 2. Let Q be a CQ< of polynomially bounded symmetry degree, and
let QL be one of its linearizations (with equalities reduced). The number of the
linearizations in lin(Q) that are isomorphic to QL can be computed in polynomial
time with access to a NP oracle.
   Using this lemma, we can establish the analog of Theorem 2 for bag-set
semantics.
Theorem 5. Checking bag-set equivalence of CQ< s of polynomially bounded
symmetry degree is in coNPN P = Π2p .

4.3   Set Equivalence
We have already seen in the preceding sections that bounded symmetry degree
makes bag (bag-set) equivalence problem of CQ< easier to tackle. We show in this
section a preliminary application of bounded symmetry to checking equivalence
under the classical set semantics. As mentioned earlier, Klug [23] and Van der
Meyden [29] showed that the containment/equivalent problem is Π2p -complete for
CQ< s, hence equivalence is in Π2p . Here we show that the complexity is lower
than Π2p if the CQ< s are rigid.
Theorem 6. Checking set equivalence of rigid CQ< s is polynomial-time Turing
reducible to the graph automorphism problem.
                                                                                    11

5   Conclusions and Future Work

We note that the equivalence results on bag semantics of CQ< s presented here
have wider applicability to any semantics in which CQ< equivalence is character-
ized by isomorphism of linearizations. This includes, for example, equivalence of
CQ< on databases annotated with certain kinds of provenance information [15],
as well as equivalence under the recently proposed Z-semantics [16].
    We conjecture that if only <, ≤ are allowed as inequalities, two CQ< s Q, Q′
are bag equivalent if and only if there is an isomorphism from Q′R to QR under
which the condition of Q′ is equivalent to that of Q.
    One possible future direction involves applying bounded symmetry degree to
study query containment. Although the decidability of bag containment of CQs
remains a longstanding open problem (Chaudhuri and Vardi [6] showed Π2P -
hardness of this problem), bag containment of CQ< s or of conjunctive queries
is known to be undecidable [20, 22]. A natural question is whether these unde-
cidability results hold even for queries of bounded symmetry degree. In fact, the
reduction from Hilbert’s Tenth Problem in [20] constructs a pair of UCQs each
of which is composed of asymmetric CQs, hence bag containment remains unde-
cidable even for unions of asymmetric CQs. The reduction of [22], on the other
hand, uses CQ< s of arbitrary symmetry degree, hence it remains open whether
containment of CQ< s of bounded symmetry degree is decidable.
    Finally, we would like to pin down the exact complexity of the general case of
bag equivalence of CQ< queries, which seems stubborn to resolve. The problem
is reminiscent to another open issue in the area, that of identifying the exact
complexity of checking bag equivalence of positive relational algebra queries
with inequalities. This class of queries is expressively equivalence to UCQ< , but
exponentially more concise, and here again the known lower and upper bounds
on the complexity are GI and pspace, respectively.


Acknowledgements. We thank the anonymous reviewers for their helpful com-
ments. This work was supported by NSF CAREER award IIS-1055107.


References

 1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley,
    1995.
 2. F. N. Afrati, M. Damigos, and M. Gergatsoulis. Query containment under bag and
    bag-set semantics. Inf. Process. Lett., 110:360–369, April 2010.
 3. V. Arvind and J. Köbler. On Hypergraph and Graph Isomorphism with Bounded
    Color Classes. STACS 2006, pages 384–395, 2006.
 4. N. Brisaboa, H. Hernández, J. Paramá, and M. Penabad. Containment of conjunc-
    tive queries with built-in predicates with variables and constants over any ordered
    domain. In Advances in Databases and Information Systems. Springer, 1998.
 5. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries
    in relational data bases. In STOC, 1977.
12

 6. S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS
    ’93, pages 59–70, New York, NY, USA, 1993. ACM.
 7. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. Theor.
    Comput. Sci., 239:211–229, May 2000.
 8. S. Cohen. Equivalence of queries that are sensitive to multiplicities. The VLDB
    Journal, 18(3):765–785, 2009.
 9. M. P. Consens and A. O. Mendelzon. Graphlog: a visual formalism for real life
    recursion. In PODS ’90, pages 404–416, New York, NY, USA, 1990. ACM.
10. V. Dalmau and P. Jonsson. The complexity of counting homomorphisms seen from
    the other side. TCS, 329(1–3):315–323, 2004.
11. P. Goralcik and V. Koubek. Verifying nonrigidity. Information Processing Letters,
    22(2):91–95, 1986.
12. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable
    queries. In PODS ’99, pages 21–32, New York, NY, USA, 1999. ACM.
13. G. Gottlob, N. Leone, and F. Scarcello. Hypertree Decompositions: A Survey. In
    FOCS, 2001.
14. G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game
    theoretic and logical characterizations of hypertree width. JCSS, 66(4):775–808,
    2003.
15. T. J. Green. Containment of conjunctive queries on annotated relations. In ICDT,
    2009.
16. T. J. Green, Z. G. Ives, and V. Tannen. Reconcilable diﬀerences. In ICDT, 2009.
17. Z. Hedrlı́n and A. Pultr. Symmetric relations (undirected graphs) with given
    semigroups. Monatsh. Math., 69(4):318–322, 1965.
18. P. Hell and J. Nesšetřil. Graphs and homomorphisms. Oxford University Press,
    USA, 2004.
19. C. Hoﬀmann. Group-theoretic algorithms and graph isomorphism. Lecture notes
    in computer science. Springer, 1982.
20. Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: beyond
    relations as sets. ACM Trans. Database Syst., 20(3):288–324, 1995.
21. S. Janson, T. Luczak, and A. Ruciński. Random graphs. Wiley-Interscience Series
    in Discrete Mathematics and Optimization, 2000.
22. T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for real
    conjunctive queries with inequalities. In PODS, 2006.
23. A. Klug. On conjunctive queries containing inequalities. J. ACM, 35(1):146–160,
    1988.
24. J. Köbler, U. Schöning, and J. Torán. The graph isomorphism problem: its struc-
    tural complexity. Springer, 1993.
25. J. Lauri and R. Scapellato. Topics in graph automorphisms and reconstruction.
    Cambridge Univ Pr, 2003.
26. G. McNulty and C. Shallon. Inherently nonﬁnitely based ﬁnite algebras. In
    R. Freese and O. Garcia, editors, Universal Algebra and Lattice Theory. 1983.
27. W. Nutt, Y. Sagiv, and S. Shurin. Deciding equivalences among aggregate queries.
    In PODS ’98, pages 214–223, New York, NY, USA, 1998. ACM.
28. R. Pichler and S. Skritek. The complexity of evaluating tuple generating depen-
    dencies. In ICDT2011.
29. R. van der Meyden. The complexity of querying indeﬁnite data about linearly
    ordered domains. In PODS ’92, pages 331–345, New York, NY, USA, 1992. ACM.
                                                                                 13

A      Transitive closure of inequality set

Consider a satisfiable set C of inequalities, and denote by V(C) the set of vari-
ables and constants occurring in C. The transitive closure of C, denoted by C ∗ ,
is the conjunction of all inequalities over V(C) entailed by C. For example, if
C = {x < y, y ≤ 2}, its transitive closure C ∗ = {x < y, y ≤ 2, x < 2, x ≤ y, x ≤
2, x 6= y, y 6= 2, x 6= 2}. (Notice that C |= x < 3, but x < 3 is not contained in
C ∗ since 3 does not occur in C.) The transitive closure of a set of inequalities
can be computed in polynomial time.

Proposition 5. For sets C1 and C2 of inequalities, the following are equivalent:

1. C1 |==| C2 , i.e. C1 |= C2 and C2 |= C1 .
2. C1 and C2 have the same sets of linearizations.
3. C1 and C2 have the same transitive closures.


B      Proofs

Proof. (of Proposition 3) One direction is clear as the graph automorphism prob-
lem can be trivially reduced to the problem of checking symmetry of a Boolean
CQ Q with a single binary relation E such that E(x, y) and E(y, x) are both
subgoals in Q if vertices x and y are adjacent in graph G. In the other direction,
given a CQ Q, we construct its incidence graph G(Q). By Proposition 2, Q is
symmetric iff G(Q) is symmetric. G(Q) is a labeled graph, but it is easy to ver-
ify that labeled graph automorphism is many-one reducible to (ordinary) graph
automorphism, which completes the proof.                                        ⊓
                                                                                ⊔

Proof. (of Corollary 1) The reductions used in Proposition 3 are parsimonious, so
computing the symmetry degree of a CQ< is polynomial-time many-one equiva-
lent to #GA3 . By results in [24], #GA is polynomial-time Turing equivalent to
the graph isomorphism problem.                                                  ⊓
                                                                                ⊔

Proof. (of Corollary 2) Since the number of self-joins is bounded, the times
of labels that are used in its associated hypergraph are also bounded, which
corresponds exactly to bounded color classes. Theorem 15 in [3] states that
“Given a hypergraph X = (V, E) with color classes of size bounded by k, there
is an nO(k) time algorithm that computes Aut(X) as a generating set in Sym(V ).
In particular, HGI and HGA are in ptime.”, which completes the proof.        ⊓
                                                                             ⊔

Proof. (of Proposition 4) For CQ< (and hence CQ as well) it is easy to see this
problem is in coNP because we can guess a nontrivial endomorphism candidate
and check in polynomial time if it is an endomorphism. If so, the query is not
3
    #GA is the problem of computing the order of graph automorphism group of a given
    graph. It is polynomial-time Turing equivalent to #GI and GI [24], where #GI is
    short for the counting the number of graph isomorphisms from one graph to the
    other.
14

rigid. Goralcik et al. [11] showed that the problem of whether a finite algebra
A is nonrigid is NP-complete as soon as the type of A has either one binary or
two unary symbols. This result can be applied in particular to the graph algebra
associated with a directed graph [26] to show that checking rigidity of directed
graphs is also coNP-complete. Since a directed graph can be transformed to a
Boolean CQ using a single binary predicate, such that the graph is rigid iff the
associated query is rigid, it follows that checking rigidity of CQs (and hence also
CQ< s) is coNP-complete.

Proof. (of Lemma 2) We can find a homomorphism h from QR to QR       L by using
this NP oracle. We can also enumerate all the automorphisms of Q by this NP
oracle in polynomial time since it has polynomial symmetry degree. Using similar
argument as in Lemma 1, by composing h by the elements in Aut(QR ) one by
one and applying to L and then see if it is compatible with C, we can get the
number of the linearizations in lin(Q) that are isomorphic to QL .            ⊓
                                                                              ⊔

Proof. (of Theorem 5) Use the same argument as in Theorem 2, but instead of a
GI oracle, we use an NP oracle as Lemma 2 suggests. This yields a coNPNP = Π2p
upper bound.                                                                ⊓
                                                                            ⊔

Proof. (of Theorem 6) Assuming that two rigid queries Q and Q′ are set equiv-
alent, then there is only one homomorphism mapping ψ from Q′R to QR and
only one homomorphism mapping ϕ from QR to Q′R , i.e. QR and Q′R are ho-
momorphically equivalent. But Q and Q′ are rigid, hence cores, and if two cores
are homomorphically equivalent then they are isomorphic. Since Q′R and QR
are cores, ψ is a unique isomorphism from Q′R to QR . Now the only problem
is to find out this unique isomorphism mapping, and check the conditions of
these queries are compatible under this mapping which is in polynomial time.
By [24], asymmetric graph isomorphism is polynomial-time Turing equivalent to
the graph automorphism problem. Since rigid graphs are subclass of asymmetric
graphs, the set equivalence problem is polynomial-time Turing reducible to the
graph automorphism problem.                                                  ⊓
                                                                             ⊔