Bag Equivalence of Bounded Symmetry-Degree Conjunctive Queries with Inequalities Mingmin Chen and Todd J. Green Department of Computer Science University of California, Davis Davis, CA 95616 USA {chenmi,green}@cs.ucdavis.edu Abstract. We consider the problem of checking equivalence of conjunc- tive queries with inequalities under bag (multiset) semantics. The prob- lem is known to be decidable in pspace and as hard as graph isomor- phism, but its exact complexity remains open. We introduce a natural restriction based on what we call the symmetry degree of queries, and show that when the symmetry degree is bounded by a fixed polyno- mial in the size of the query, the equivalence problem is in Π2p . We also show that for asymmetric queries (those of symmetry degree 1), check- ing bag-equivalence is polynomial-time Turing equivalent to the graph automorphism problem. These results can be interpreted as first steps in a more general program of finding “islands of tractability” for testing equivalence of conjunctive queries with inequalities under bag (rather than set) semantics. 1 Introduction We study the problem of checking equivalence of conjunctive queries (CQs) with built-in inequality predicates over a densely-ordered domain under bag (multiset) semantics. This problem is well-understood assuming the classical set semantics, where it is known that the complexity jumps from NP-complete for ordinary CQs [5] to Π2p -complete when inequalities are allowed [23, 29]. For bag semantics (implemented by most DBMSs, and hence of practical as well as theoretical interest), the problem is comparatively less-well understood: it is known to be as hard as graph isomorphism [6] and decidable in pspace [27], but the exact complexity is unknown (and seems to be difficult to resolve). In this paper, we introduce a natural restriction on the structure of CQs called bounded symmetry degree (a bound on the number of query automorphisms) and use this notion to study the complexity of the bag equivalence problem. The notion is adapted from the well-known concept of graph symmetry studied extensively in the graph theory literature [17, 21, 25, 18]. We show that if the symmetry degree of CQs with inequalities is bounded, then bag equivalence can be decided in Π2p (and we also show that checking for bounded symmetry degree can be done in Π2p ). In fact, we show this holds even when we relax the restriction to allow queries of polynomially bounded symmetry degree (see Section 3.1). We 2 also consider several stronger forms of symmetry degree bounds, and establish that for asymmetric CQs, bag equivalence of CQs with inequalities reduces to checking graph automorphism, while a bound on the number of self-joins in the queries yields a coNP upper bound. We also compare the notion of bounded symmetry degree with the standard notion of bounded hypertree width, which turns out to be orthogonal and not obviously applicable to the setting of bag semantics. These results may be interpreted as first steps in a more general program of finding “islands of (relative) tractability” for testing equivalence of CQs with inequalities under bag (rather than set) semantics. Outline. Section 2 contains preliminaries and a discussion of related work. Sec- tion 3 defines the notion of bounded symmetry degree, establishes some basic results, and compares it with bounded hypertree width. Section 4 presents the main results on equivalence, for bag and bag-set semantics. We conclude in Sec- tion 5. The Appendix contains formal proofs of our results. 2 Preliminaries and Related Work Conjunctive queries with inequalities. Following the standard terminology [1], a conjunctive query (CQ) is a safe, nonrecursive datalog rule whose body is a conjunction of relational atoms. A conjunctive query with inequalities (CQ< ) is a conjunctive query extended with a conjunction of inequality atoms using <, ≤, and 6=. We use uppercase letters for predicate symbols, x, y, z, . . . for variables, a, b, c, . . . for constants, and t, u, v, . . . for variable or constant terms, with over- bars x̄, ā, t̄, etc. indicating lists. We write a CQ< Q as Q(x̄) :– R, C where R, the relational part, is a conjunction of relational atoms and C, the conditions, is a conjunction of inequalities x θ t, θ ∈ {<, ≤, 6=}. We denote by QR the CQ ob- tained from Q by deleting C. For example, if Q is Q(x, y) :– T (x, y), S(x, y), x > y, y 6= 2, then QR is QR (x, y) :– T (x, y), S(x, y). Semantics of queries. We fix a densely-ordered domain of constants which, for concreteness, we will assume is that of the rationals Q. A set relation of arity k is a finite set R of tuples from Qk . A bag (multiset) relation of arity k is a mapping R : Qk → N of tuples to their associated multiplicities, where the mapping is non-zero on only finitely many tuples. A set (resp., bag) database instance I is an assignment of set (resp., bag) relations RI to predicate names R. The semantics of CQ< s on set and bag relations is based on the notion of valuations. A valuation is mapping ν of variables to constants, extended to map constants to themselves. Valuations operate pointwise on lists/tuples and atoms in the expected way. Let Q be a CQ Q(t̄) :– R1 (t¯1 ), . . . , Rn (t¯n ), C and let I be a set instance. The result of evaluating Q on I under set semantics is the set relation JQKI defined JQKI = {ν(t̄) | ν |= C and ν(t¯1 ) ∈ R1I ∧ · · · ∧ ν(t¯n ) ∈ RnI } 3 Now suppose I is a bag instance. The result of evaluating Q on I under bag semantics is the bag relation JQKIb defined X JQKIb (ā) = (R1I (ν(t¯1 )) × · · · × RnI (ν(t¯n ))) ν where the sum is over all valuations ν such that ν(t̄) = ā and ν |= C. Finally, suppose I is a set instance. The result of evaluating Q on I under bag-set se- mantics is the bag relation JQKIbs obtained by viewing I as a (duplicate-free) bag relation, and evaluating Q on I under bag semantics. A CQ< Q is satisfiable (or consistent [23]) if there exists a set (or, equiva- lently, a bag) database instance I such that JQKI 6= ∅. Every CQ is satisfiable, and a CQ< is satisfiable iff its comparison part is satisfiable. The latter can be checked in linear time using an algorithm in [4]. Here, we focus only on satisfiable queries, and we do not allow equality atoms x = t (since, for satisfiable queries, such atoms can always be eliminated by replacing each occurrence of x with t). Containment mappings and isomorphisms. Let Q(x̄) :– R, C and Q′ (x̄′ ) :– R′ , C ′ be two CQ< s, and denote by V(Q) and V(Q′ ) the sets of variables and constants that occur in Q and Q′ , respectively. A containment mapping [5] (or homomor- phism) from Q′ to Q is a mapping h : V(Q′ ) → V(Q), lifted to operate on atoms and conjunctions of atoms in the obvious way, such that: 1. h(t̄′ ) = t̄. 2. h is the identity mapping on constants. 3. h(R′ ) ⊆ R. 4. C |= h(C ′ ), i.e. C |= h(a′ ) θ h(b′ ) if a′ θ b′ is in C ′ where θ ∈ {<, ≤, 6=} A homomorphism h is called an isomorphism if h is bijective and its inverse is also a homomorphism. Note that Q, Q′ are isomorphic via mapping h iff QR and Q′R are isomorphic and C and C ′ are logically equivalent modulo h. Given a can- didate isomorphism mapping, the latter condition can be checked by comparing the transitive closures of C and C ′ (see Appendix for details). An endomorphism (resp., automorphism) is a homomorphism (resp., isomor- phism) from a CQ< to itself. For a CQ< Q, the set of automorphisms of QR , denoted by Aut(QR ), forms a subgroup of the symmetric group on V(Q). The identity element in this group is the identity mapping. Similarly, the set of en- domorphisms of QR , denoted by End(QR ), forms a monoid. Linearizations and linear expansions. We use the notions of linearizations and linear expansions of CQ< s due to Nutt et al. [27], which are based on the stan- dard notion of linear extensions for partially ordered sets. Let C be a set of inequalities (<, ≤, 6=) over a set T of variables and constants. A linearization of C is a total order L over T that is compatible with C. (Note that L may equate distinct variables, when this is allowed by C.) If L is a linearization of C, and C is the condition of a CQ< Q, then we denote QL the CQ< obtained from Q by replacing C with L, and we say that 4 QL is a linearization of Q. By default, we do not eliminate equalities in lineariza- tions unless explicitly pointed out. (Thus, the condition for a linearization is a conjunction of inequalities of the form x θ t, θ ∈ {<, =}.) The linear expansion of Q is the bag lin(Q) of all linearizations of Q (it can be viewed as a union of CQ< s). We say that linear expansions lin(Q) and lin(Q′ ) are isomorphic if there is a bijection f : lin(Q) → lin(Q′ ) such that QL is isomorphic to f (QL ) for every QL ∈ lin(Q). Known results on equivalence. The equivalence problem for conjunctive queries with or without inequalities under set semantics, bag semantics, bag-set seman- tics have been studied extensively, beginning with the seminal paper by Chan- dra and Merlin [5] which established the NP-completeness of checking contain- ment/equivalence of CQs under set semantics using a characterization in terms of containment mappings. Containment/equivalence of CQ< queries was shown to be in Π2P by Klug [23] and Π2P -hard by van der Meyden [29]. The papers by Chaudhuri and Vardi [6] and Ioannidis and Ramakrishnan [20] initiated the study of query containment and equivalence under bag/bag-set se- mantics. The former paper showed that two CQs are bag-equivalent iff they are isomorphic, which yields a logspace equivalence of the problem with the graph isomorphism problem (GI). The latter paper showed that containment of unions of CQs is undecidable via a reduction from Hilbert’s Tenth Problem. The decid- ability of bag containment of CQs remains an open problem. (A recent paper by Afrati et al. [2] gives positive decidability results for certain classes of CQs.) Bag containment of CQ< s, on the other hand, was shown to be undecidable by Jayram et al. [22]. Checking bag-set equivalence of CQ< s was shown to be decidable by Nutt et al. [27] (Cohen [8] extends it to bag semantics), a result we shall refer to again in the sequel: Theorem 1 ([27, 8]). Two CQ< s Q, Q′ are bag (bag-set) equivalent if and only if they have isomorphic linear expansions, which can be checked in pspace. This theorem also holds for bag-set equivalence for CQ< s. The pspace upper bound seems unlikely to be tight: in fact, using their characterization of equiva- lence in terms of linear expansions, it is not hard to show that the upper bound can be sharpened somewhat to coNP#P. (coNP#P is contained in pspace, but it is not known whether the containment is strict.) However, the exact complexity of the problem remains open. The following simple variation on an example given in [27] shows that non- isomorphic CQ< s may indeed have isomorphic linear expansions, hence isomor- phism is not a necessary condition for bag equivalence of CQ< s: Example 1. Let ϕ(x, y, z) be the conjunction of relational atoms R(x, y, z), R(x, z, y), R(y, x, z), R(y, z, x), R(z, x, y), R(z, y, x). Now consider two Boolean CQs< Q, Q′ defined Q() :– ϕ(x, y, z), x < z, y < z, x 6= y Q′ () :– ϕ(x, y, z), x < y, x < z, y 6= z 5 The linear expansions of Q and Q′ are Q1 () :– ϕ(x, y, z), x < z, y < z, x > y Q2 () :– ϕ(x, y, z), x < z, y < z, x < y Q′1 () :– ϕ(x, y, z), x < y, x < z, y > z Q′2 () :– ϕ(x, y, z), x < y, x < z, y < z Observe that Q1 is isomorphic to Q′1 and Q2 is isomorphic to Q′2 . By Theorem 1, Q and Q′ are bag-equivalent since they have isomorphic linear expansions. However, Q and Q′ are not isomorphic. 3 Symmetry 3.1 Motivation and basic definitions A CQ< is linear [9, 27] if its body contains no two occurrences of the same relational predicate symbol (in other words, self-joins are not allowed). For linear CQ< , an easy corollary of Theorem 1 (pointed out in [27]) is that bag equivalence can be checked in polynomial time. In this section, we introduce a generalization of this restriction based on what we term the symmetry degree of a query. This is a natural idea derived from the well-studied notion of the symmetry degree of a graph (see, e.g., [17, 21, 25, 18]), but it does seem to have been applied in database theory before. An automorphism of a directed graph G = (V, E) is a permutation σ of the vertex set V, such that for any edge e = (u, v), σ(e) = (σ(u), σ(v)) is also an edge. The automorphism group of G contains all these automorphisms. The order of the automorphism group of G is called its symmetry degree, denoted sym(G). For queries, the notion is exactly analogous: we define the symmetry degree of a CQ< Q, sym(Q), to be the order of its automorphism group Aut(QR ). Example 2. To illustrate, consider the Boolean CQs below: Q1 () :– P (x1 , x2 ), P (x2 , x3 ), . . . , P (xn−1 , xn ) Q2 () :– S(u, v), S(u, w) Q3 () :– R(y1 ), . . . , R(yn ) Q1 has symmetry degree 1 (the variables must all stay put), Q2 has symmetry degree 2 (v and w can be swapped), and Q3 has symmetry degree n! (any permutation of the variables is an automorphism). We will examine the complexity of computing sym(Q) in Section 3.2. In the meantime, we derive an easy bound based on counting self-joins. Proposition 1. Suppose predicates p1 , . . . , pk each occur n1 , . . . , nk times, re- spectively, in the body of a CQ Q. Define the number of self-joins of Q to be n = n1 + · · · + nk − k. Then Y k sym(Q) ≤ (ni !) ≤ (n + 1)!. i=1 6 In particular, the proposition implies that linear queries have symmetry degree one. On the other hand, queries may have low symmetry degree but many self- joins, e.g., Q1 in Example 2. We introduce some additional terminology to use in the remainder of the pa- per. Q is called symmetric if Aut(QR ) is a non-trivial subgroup of the symmetric group on V(Q), i.e. sym(Q) > 1. Q is called asymmetric if Aut(QR ) is trivial, which only contains the identity, i.e. sym(Q) = 1. Q is called rigid if End(QR ) is trivial. Incidence graphs. A CQ has a natural interpretation as an edge-labeled directed hypergraph. However, to apply results of interest from graph theory, we may need to transform this hypergraph into an ordinary directed graph. Given a CQ Q, we construct its incidence graph [7] G(Q) = (V, E, λ), a labeled directed graph, as follows: V has a vertex labeled x for each variable x in Q, a vertex labeled c for each constant c in Q, and a vertex labeled P for each occurrence of predicate P in Q; and E contains an edge labeled n from vertex t to predicate vertex P whenever t occurs as the nth argument in P . For example, the incidence graph of CQ Q(x) :– T (x, y), S(x, z), S(z, 2) is shown below: 1 x v Q Q(x) 1 1 y v 2 T T(x,y) 2 z v S S(x,z) 1 2 2 2 S S(z,2) We introduce labels in the definition of incidence graph to ensure that the con- struction preserves the symmetry degree1 of Q: Proposition 2. For any CQ Q, we have sym(Q) = sym(G(Q)). 3.2 Complexity of Computing Symmetry Degrees We next turn to the question of computing the symmetry degree of a CQ, starting with the simpler problem of deciding whether a query is symmetric. Not sur- prisingly, the latter problem reduces easily to the graph automorphism problem (GA), which is the problem of checking whether a given graph has a non-trivial automorphism.2 1 The automorphism group of a labeled directed graph G = (V, E , λ) is the subgroup of the automorphism group of G ′ = (V, E ) which respects the labeling λ. 2 As with GI, GA is in NP but not known or believed to be either NP-complete or in ptime. It is no harder than GI in the sense that the problem is polynomial-time many-one reducible to GI; however the converse relationship is at present unknown. 7 Proposition 3. Checking whether a CQ< (CQ) is symmetric is polynomial- time many-one equivalent to the graph automorphism problem. This leads, using results in [24], to a characterization of the complexity of com- puting the symmetry degree: Corollary 1. Computing the symmetry degree of a CQ< (or CQ) is polynomial- time Turing equivalent to the graph isomorphism problem. The following is a direct corollary of a result in [3]. Corollary 2 (of Theorem 15 in [3]). If the number of self-joins in a CQ< Q is bounded by some constant k, then there is an nO(k) time algorithm that computes Aut(QR ) as a generating set in the symmetric group on V(Q). Finally, we consider testing for rigidity. Again our strategy is to reduce the problem on CQ< queries to the analogous problem on graphs, where a result of Goralcik et al. [11] shows as a special case that graph rigidity is coNP-complete. Proposition 4. The problem of checking whether a CQ< (CQ) is rigid is coNP- complete. 3.3 Symmetry Degree versus Hypertree Width Gottlob et al. [12, 13] introduced the notion of hypertree width as a generalization of the classical notion of acyclic queries, and proved that Boolean conjunctive queries of bounded hypertree width can be evaluated efficiently (and that, as a consequence, query containment under set semantics is also tractable). Symmetry degree and hypertree width seem to be essentially orthogonal con- cepts: intuitively, hypertree width is in some sense a measure of cyclicity, rather than symmetry. Moreover, a query with high symmetry degree may have small hypertree width, such as the following CQ that computes a “star self-join” on the first column of a relation R: Q1 () :– R(x, y1 ), R(x, y2 ), . . . , R(x, yk ) Q1 has symmetry degree k! but is acyclic hence has hypertree width 1. At the same time, a query with a high hypertree width may have a small symmetry degree, such as the CQ below: ^ k ^ k Q2 () :– Rik−k+j (xi , xj ) i=1 j=1,j6=i (Q2 intuitively corresponds to a complete directed graph without self-loops on k vertices where each edge is labeled with a distinct predicate name.) Q2 has symmetry degree 1, but we can show that it has hypertree width ⌈k/2⌉. We are not aware of any work studying the interaction of hypertree width and bag equivalence of queries. We also do not know of any work studying whether 8 bounded hypertree width leads to tractable query containment or equivalence checks for CQ< s under the classical set semantics. A recent paper by Pichler et al. [28] showed that the the query complexity of evaluating CQs under bag-set semantics becomes tractable when the CQs have bounded treewidth. A natural question (which we leave open here) is whether bounded symmetry degree also makes query evaluation under bag or bag-set semantics tractable. 4 Main Results We are now ready to present our results on the complexity of bag equivalence of CQ< s having bounded symmetry degree (Section 4.1). We also discuss how these bounds transfer to bag-set semantics (Section 4.2), and present an initial result that applies these ideas to the query equivalence problem under classical set semantics (Section 4.3). 4.1 Bag Equivalence As already mentioned, Nutt et al. [27] showed that checking bag-equivalence of CQ< s can be done in pspace, but the exact complexity of the problem remains unclear. Here, we show that if the symmetry degree of the queries is bounded, then the upper bound can be lowered to Π2p . In fact, we use a rather liberal notion of “bounded” here where we fix a univariate polynomial P ahead of time and say that a CQ< Q of length n has polynomially bounded symmetry degree if sym(Q) ≤ P (n). Theorem 2. Checking bag equivalence of CQ< s of polynomially bounded sym- metry degree is in coNPGI , hence in Π2p . The proof of the theorem relies on the following technical lemma: Lemma 1. Let Q be a CQ< of polynomially bounded symmetry degree, and let QL be one of its linearizations. Then the number of linearizations in lin(Q) that are isomorphic to QL can be computed in polynomial time with access to a GI oracle. Proof. The basic idea of the proof is as follows: first, we use the GI oracle to com- pute Aut(QR ); second, we use Aut(QR ) to compute the number of linearizations in lin(Q) that are isomorphic to QL . To compute Aut(QR ), we first compute the generating set of Aut(QR ). This can be done in polynomial time using the GI oracle, since it is well-known that the problem of computing the generating set of the automorphism group of a labeled graph is polynomial-time Turing equivalent to GI [24, 19]. Now, since we have assumed sym(Q) ≤ P (n), and we have the generating set for Aut(QR ), it is clear that we can enumerate all the elements of Aut(QR ) in polynomial time. Next, let S denote the set of all linearizations in lin(Q) that are isomorphic to QL . Our goal is to compute |S|. We observe that a linearization QL′ ∈ S iff 9 h(L) = L′ for some h ∈ Aut(QR ) and L′ |= C, where C denotes the condition part of Q. In other words, S = {h(QL ) | h ∈ Aut(QR ) and h(L) |= C}. Since we have already computed Aut(QR ), we can therefore use it to compute S in polynomial time, and return |S| as the final answer. ⊓ ⊔ Now we are ready to prove Theorem 2. Proof. By Lemma 1, given a GI oracle and a linearization QL of Q, we can com- pute in polynomial time the numbers of linearizations of Q and Q′ , respectively, which are isomorphic to QL . If these two numbers are different, then it follows by Theorem 1 that Q and Q′ are not bag-equivalent. Thus we can guess a non- membership witness and verify it in polynomial time given a GI oracle, which means this problem is in coNPGI . Since GI≤pT NP, it follows that the problem is also in Π2p . ⊓ ⊔ We note that the pre-condition of Theorem 2 can certainly be checked in Π2p , by Corollary 1. Also, we note that it is not hard to show that Lemma 1 and Theorem 2 can be generalized to show a Π2p upper bound for checking bag equivalence of unions of CQ< s (UCQ< s). In Section 4.2, we will also generalize it to apply to bag-set equivalence. Next we show some other (relatively) tractable subcases of checking bag equivalence of CQ< s. Here, we show that if the number of self-joins is bounded by some constant, we can push the upper bound down to coNP. Theorem 3. If the numbers of self-joins in CQ< Q, Q′ are bounded by some constant k, then the problem of checking whether Q and Q′ are bag equivalent is in coNP. Proof. Here wlog we assume that Q and Q′ have the same relational part, since their relational parts are isomorphic by Theorem 1. As stated in Proposition 1, the symmetry degree is bounded by (k + 1)! and we can enumerate the auto- morphism group in constant time. For checking bag equivalence of Q and Q′ , we can guess a linearization QL . Initialize two sets SQ and SQ′ to be empty. By applying the automorphism to QL one by one and see if the comparison part is compatible with the comparison parts of Q and Q′ respectively and if yes add it to the corresponding SQ or SQ′ . This can be done in polynomial time. If the orders of these two sets are different, then Q and Q′ are not bag-equivalent, since they have different numbers of linearizations that are isomorphic to QL . In other words, we can guess a non-member witness and verify it in polynomial time. Hence, this problem is in coNP. ⊓ ⊔ By imposing various restrictions on the symmetry degree of queries, we can establish other upper bounds on the complexity of the problem. The following result is one example. Theorem 4. Asymmetric CQ< s Q and Q′ are bag equivalent iff there is an iso- morphism h from Q′R to QR such that h(C ′ ) |==| C. Moreover, bag equivalence of asymmetric CQ< queries is polynomial-time Turing equivalent to the graph automorphism problem. 10 Proof. By Theorem 1, Q, Q′ have isomorphic linear expansions. In other words, there is a bijective mapping ψ : lin(Q) 7→ lin(Q′ ) such that QL is isomorphic to ψ(QL ) for all QL ∈ lin(Q). Denote by ϕi the isomorphism mapping from QLi to ψ(QLi ). Since Q, Q′ are asymmetric, there is only one isomorphism mapping from QR to Q′R (Suppose there are two ϕ1 , ϕ2 , then ϕ1 ◦ ϕ−1 2 is a nontrivial automorphism of QR , which is a contradiction). So, ϕi are all the same for any QLi ∈ lin(Q). The isomorphism of Q and Q′ follows. The only problem is to find out this only isomorphism mapping of their relational parts and check if their comparisons are equivalent under this isomorphism. By [24], asymmetric graph isomorphism is polynomial-time Turing equivalent to the graph automor- phism problem. Hence, finding this isomorphism mapping from QR to Q′R is polynomial-time Turing equivalent to the graph automorphism problem, so is the bag equivalence problem. ⊓ ⊔ 4.2 Bag-Set Equivalence Next we examine bag-set equivalence of CQ< s. Theorem 4 holds not just for bag semantics, but also for bag-set semantics. Let us now look at extending Theorem 2 as well. We have the following lemma which is analogous to Lemma 1, but uses an NP (rather than GI) oracle. Intuitively, we use this extra power to deal with the fact that for bag-set semantics, when manipulating linearizations, we are forced to reduce equalities (since this can have the effect revealing that an atom in the body is actually a duplicate copy and should be removed). Lemma 2. Let Q be a CQ< of polynomially bounded symmetry degree, and let QL be one of its linearizations (with equalities reduced). The number of the linearizations in lin(Q) that are isomorphic to QL can be computed in polynomial time with access to a NP oracle. Using this lemma, we can establish the analog of Theorem 2 for bag-set semantics. Theorem 5. Checking bag-set equivalence of CQ< s of polynomially bounded symmetry degree is in coNPN P = Π2p . 4.3 Set Equivalence We have already seen in the preceding sections that bounded symmetry degree makes bag (bag-set) equivalence problem of CQ< easier to tackle. We show in this section a preliminary application of bounded symmetry to checking equivalence under the classical set semantics. As mentioned earlier, Klug [23] and Van der Meyden [29] showed that the containment/equivalent problem is Π2p -complete for CQ< s, hence equivalence is in Π2p . Here we show that the complexity is lower than Π2p if the CQ< s are rigid. Theorem 6. Checking set equivalence of rigid CQ< s is polynomial-time Turing reducible to the graph automorphism problem. 11 5 Conclusions and Future Work We note that the equivalence results on bag semantics of CQ< s presented here have wider applicability to any semantics in which CQ< equivalence is character- ized by isomorphism of linearizations. This includes, for example, equivalence of CQ< on databases annotated with certain kinds of provenance information [15], as well as equivalence under the recently proposed Z-semantics [16]. We conjecture that if only <, ≤ are allowed as inequalities, two CQ< s Q, Q′ are bag equivalent if and only if there is an isomorphism from Q′R to QR under which the condition of Q′ is equivalent to that of Q. One possible future direction involves applying bounded symmetry degree to study query containment. Although the decidability of bag containment of CQs remains a longstanding open problem (Chaudhuri and Vardi [6] showed Π2P - hardness of this problem), bag containment of CQ< s or of conjunctive queries is known to be undecidable [20, 22]. A natural question is whether these unde- cidability results hold even for queries of bounded symmetry degree. In fact, the reduction from Hilbert’s Tenth Problem in [20] constructs a pair of UCQs each of which is composed of asymmetric CQs, hence bag containment remains unde- cidable even for unions of asymmetric CQs. The reduction of [22], on the other hand, uses CQ< s of arbitrary symmetry degree, hence it remains open whether containment of CQ< s of bounded symmetry degree is decidable. Finally, we would like to pin down the exact complexity of the general case of bag equivalence of CQ< queries, which seems stubborn to resolve. The problem is reminiscent to another open issue in the area, that of identifying the exact complexity of checking bag equivalence of positive relational algebra queries with inequalities. This class of queries is expressively equivalence to UCQ< , but exponentially more concise, and here again the known lower and upper bounds on the complexity are GI and pspace, respectively. Acknowledgements. We thank the anonymous reviewers for their helpful com- ments. This work was supported by NSF CAREER award IIS-1055107. References 1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. 2. F. N. Afrati, M. Damigos, and M. Gergatsoulis. Query containment under bag and bag-set semantics. Inf. Process. Lett., 110:360–369, April 2010. 3. V. Arvind and J. Köbler. On Hypergraph and Graph Isomorphism with Bounded Color Classes. STACS 2006, pages 384–395, 2006. 4. N. Brisaboa, H. Hernández, J. Paramá, and M. Penabad. Containment of conjunc- tive queries with built-in predicates with variables and constants over any ordered domain. In Advances in Databases and Information Systems. Springer, 1998. 5. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, 1977. 12 6. S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS ’93, pages 59–70, New York, NY, USA, 1993. ACM. 7. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. Theor. Comput. Sci., 239:211–229, May 2000. 8. S. Cohen. Equivalence of queries that are sensitive to multiplicities. The VLDB Journal, 18(3):765–785, 2009. 9. M. P. Consens and A. O. Mendelzon. Graphlog: a visual formalism for real life recursion. In PODS ’90, pages 404–416, New York, NY, USA, 1990. ACM. 10. V. Dalmau and P. Jonsson. The complexity of counting homomorphisms seen from the other side. TCS, 329(1–3):315–323, 2004. 11. P. Goralcik and V. Koubek. Verifying nonrigidity. Information Processing Letters, 22(2):91–95, 1986. 12. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries. In PODS ’99, pages 21–32, New York, NY, USA, 1999. ACM. 13. G. Gottlob, N. Leone, and F. Scarcello. Hypertree Decompositions: A Survey. In FOCS, 2001. 14. G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. JCSS, 66(4):775–808, 2003. 15. T. J. Green. Containment of conjunctive queries on annotated relations. In ICDT, 2009. 16. T. J. Green, Z. G. Ives, and V. Tannen. Reconcilable differences. In ICDT, 2009. 17. Z. Hedrlı́n and A. Pultr. Symmetric relations (undirected graphs) with given semigroups. Monatsh. Math., 69(4):318–322, 1965. 18. P. Hell and J. Nesšetřil. Graphs and homomorphisms. Oxford University Press, USA, 2004. 19. C. Hoffmann. Group-theoretic algorithms and graph isomorphism. Lecture notes in computer science. Springer, 1982. 20. Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: beyond relations as sets. ACM Trans. Database Syst., 20(3):288–324, 1995. 21. S. Janson, T. Luczak, and A. Ruciński. Random graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization, 2000. 22. T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for real conjunctive queries with inequalities. In PODS, 2006. 23. A. Klug. On conjunctive queries containing inequalities. J. ACM, 35(1):146–160, 1988. 24. J. Köbler, U. Schöning, and J. Torán. The graph isomorphism problem: its struc- tural complexity. Springer, 1993. 25. J. Lauri and R. Scapellato. Topics in graph automorphisms and reconstruction. Cambridge Univ Pr, 2003. 26. G. McNulty and C. Shallon. Inherently nonfinitely based finite algebras. In R. Freese and O. Garcia, editors, Universal Algebra and Lattice Theory. 1983. 27. W. Nutt, Y. Sagiv, and S. Shurin. Deciding equivalences among aggregate queries. In PODS ’98, pages 214–223, New York, NY, USA, 1998. ACM. 28. R. Pichler and S. Skritek. The complexity of evaluating tuple generating depen- dencies. In ICDT2011. 29. R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In PODS ’92, pages 331–345, New York, NY, USA, 1992. ACM. 13 A Transitive closure of inequality set Consider a satisfiable set C of inequalities, and denote by V(C) the set of vari- ables and constants occurring in C. The transitive closure of C, denoted by C ∗ , is the conjunction of all inequalities over V(C) entailed by C. For example, if C = {x < y, y ≤ 2}, its transitive closure C ∗ = {x < y, y ≤ 2, x < 2, x ≤ y, x ≤ 2, x 6= y, y 6= 2, x 6= 2}. (Notice that C |= x < 3, but x < 3 is not contained in C ∗ since 3 does not occur in C.) The transitive closure of a set of inequalities can be computed in polynomial time. Proposition 5. For sets C1 and C2 of inequalities, the following are equivalent: 1. C1 |==| C2 , i.e. C1 |= C2 and C2 |= C1 . 2. C1 and C2 have the same sets of linearizations. 3. C1 and C2 have the same transitive closures. B Proofs Proof. (of Proposition 3) One direction is clear as the graph automorphism prob- lem can be trivially reduced to the problem of checking symmetry of a Boolean CQ Q with a single binary relation E such that E(x, y) and E(y, x) are both subgoals in Q if vertices x and y are adjacent in graph G. In the other direction, given a CQ Q, we construct its incidence graph G(Q). By Proposition 2, Q is symmetric iff G(Q) is symmetric. G(Q) is a labeled graph, but it is easy to ver- ify that labeled graph automorphism is many-one reducible to (ordinary) graph automorphism, which completes the proof. ⊓ ⊔ Proof. (of Corollary 1) The reductions used in Proposition 3 are parsimonious, so computing the symmetry degree of a CQ< is polynomial-time many-one equiva- lent to #GA3 . By results in [24], #GA is polynomial-time Turing equivalent to the graph isomorphism problem. ⊓ ⊔ Proof. (of Corollary 2) Since the number of self-joins is bounded, the times of labels that are used in its associated hypergraph are also bounded, which corresponds exactly to bounded color classes. Theorem 15 in [3] states that “Given a hypergraph X = (V, E) with color classes of size bounded by k, there is an nO(k) time algorithm that computes Aut(X) as a generating set in Sym(V ). In particular, HGI and HGA are in ptime.”, which completes the proof. ⊓ ⊔ Proof. (of Proposition 4) For CQ< (and hence CQ as well) it is easy to see this problem is in coNP because we can guess a nontrivial endomorphism candidate and check in polynomial time if it is an endomorphism. If so, the query is not 3 #GA is the problem of computing the order of graph automorphism group of a given graph. It is polynomial-time Turing equivalent to #GI and GI [24], where #GI is short for the counting the number of graph isomorphisms from one graph to the other. 14 rigid. Goralcik et al. [11] showed that the problem of whether a finite algebra A is nonrigid is NP-complete as soon as the type of A has either one binary or two unary symbols. This result can be applied in particular to the graph algebra associated with a directed graph [26] to show that checking rigidity of directed graphs is also coNP-complete. Since a directed graph can be transformed to a Boolean CQ using a single binary predicate, such that the graph is rigid iff the associated query is rigid, it follows that checking rigidity of CQs (and hence also CQ< s) is coNP-complete. Proof. (of Lemma 2) We can find a homomorphism h from QR to QR L by using this NP oracle. We can also enumerate all the automorphisms of Q by this NP oracle in polynomial time since it has polynomial symmetry degree. Using similar argument as in Lemma 1, by composing h by the elements in Aut(QR ) one by one and applying to L and then see if it is compatible with C, we can get the number of the linearizations in lin(Q) that are isomorphic to QL . ⊓ ⊔ Proof. (of Theorem 5) Use the same argument as in Theorem 2, but instead of a GI oracle, we use an NP oracle as Lemma 2 suggests. This yields a coNPNP = Π2p upper bound. ⊓ ⊔ Proof. (of Theorem 6) Assuming that two rigid queries Q and Q′ are set equiv- alent, then there is only one homomorphism mapping ψ from Q′R to QR and only one homomorphism mapping ϕ from QR to Q′R , i.e. QR and Q′R are ho- momorphically equivalent. But Q and Q′ are rigid, hence cores, and if two cores are homomorphically equivalent then they are isomorphic. Since Q′R and QR are cores, ψ is a unique isomorphism from Q′R to QR . Now the only problem is to find out this unique isomorphism mapping, and check the conditions of these queries are compatible under this mapping which is in polynomial time. By [24], asymmetric graph isomorphism is polynomial-time Turing equivalent to the graph automorphism problem. Since rigid graphs are subclass of asymmetric graphs, the set equivalence problem is polynomial-time Turing reducible to the graph automorphism problem. ⊓ ⊔