Efficient Evaluation of Well-designed Pattern Trees
                    (Extended Abstract)?

                 Pablo Barceló1 , Reinhard Pichler2 , and Sebastian Skritek2
1
    Center for Semantic Web Research & Department of Computer Science, University of Chile
                  2
                    Faculty of Informatics, Vienna University of Technology


         Abstract. Conjunctive queries (CQs) constitute the core of the query languages
         for relational databases and also the most intensively studied querying mecha-
         nism in the database theory community. But CQs suffer from a serious drawback
         when dealing with incomplete information: If it is not possible to match the com-
         plete query with the data, they return no answer at all. The semantic web there-
         fore provides a formalism - known as well-designed pattern trees (WDPTs) - that
         tackles this problem. In particular, WDPTs allow us to match patterns over the
         data if available, but do not fail to give an answer otherwise. Here, we abstract
         away the specifics of semantic web applications and study WDPTs over arbitrary
         relational schemas. Since our language properly subsumes the class of CQs, the
         evaluation problem associated with it is intractable. In this paper we identify nat-
         ural structural properties of WDPTs that lead to tractability of various variants of
         the evaluation problem.


1     Introduction

Conjunctive queries (CQs) constitute the core of the query languages for relational
databases and also the most intensively studied querying mechanism in the database
theory community. But CQs suffer from a serious drawback when dealing with incom-
plete information: they fail to provide an answer when the pattern described by the
query cannot be matched completely into the data.
    The semantic web therefore provides formalisms to overcome this problem. One
simple such formalism corresponds to the {AND,OPT}-fragment of SPARQL – the
standard query language for RDF, the semantic web data model. The OPT-operator
extends the AND-operator by the possibility to return partial answers. I.e., instead of
returning no answer at all if not the complete query can be matched into the data, it al-
lows to match parts of the query. Pérez et al. noticed that a non-constrained interaction
of these two operators may lead to undesired behavior [11]. This motivated the defini-
tion of a better behaved syntactic restriction of the language, known as well-designed
{AND,OPT}-SPARQL. Queries in this fragment allow for a natural tree representa-
tion, called well-designed pattern trees (WDPTs) [10]. Here, we abstract away from the
specifics of RDF and define WDPTs over arbitrary relational schemas.
    Despite the importance of WDPTs, very little is known about some fundamental
problems related to them. In particular, no in-depth study has been carried out regarding
?
    This is an extended abstract of [3]
efficient evaluation of these queries, a problem that permeates the literature on CQs and
its extensions [13, 8, 9]. The main goal of this work is to initiate a systematic study of
tractable fragments of WDPTs for the different variants of query evaluation that have
been studied in the literature. We explain this in more detail below.


2     Well-designed Pattern Trees

We define the class of WDPTs below. Intuitively, the nodes of a WDPT represent CQs
(called “basic graph patterns” in the semantic web context) while the nesting of optional
matching is represented by the tree structure of a WDPT.
    A well-designed pattern tree (WDPT) over schema σ is a pair (T, λ, x̄), such that:

 1. T is a rooted tree and λ maps each node t in T to a set of relational atoms over σ.
 2. For every variable y mentioned in T , the set of nodes where y occurs is connected.
 3. The tuple x̄ of distinct variables from T denotes the free variables of the WDPT.

We say that (T, λ, x̄) is projection-free, if x̄ contains all variables mentioned in T .
    Assume p = (T, λ, x̄) is a WDPT over σ. We write r to denote the root of
T . Given a subtree T 0 of T rooted in r, we define qTS             0 to be the CQ Ans(ȳ) ←

R1 (v̄1 ), . . . , Rm (v̄m ), where {R1 (v̄1 ), . . . , Rm (v̄m )} = t∈T 0 λ(t), and ȳ are all the
variables that are mentioned in T 0 . That is, all variables in qT 0 appear free.
    We define the semantics of WDPTs by naturally extending their interpretation un-
der semantic web vocabularies [10]. The intuition behind the semantics of a WDPT
(T, λ, x̄) is as follows. A mapping h is an answer to (T, λ) over a database D, if it is
“maximal” among the mappings that satisfy the patterns qT 0 defined by the subtrees T 0
of T . This means, h is a solution to qT 0 and there is no way to “extend” h to a solution
of some qT 00 for some bigger subtree T 00 of T . The evaluation of a WDPT (T, λ, x̄)
over D corresponds then to the projection over the variables in x̄ of the mappings h that
satisfy (T, λ) over D. We formalize this next: Let D be a database and p = (T, λ, x̄) a
WDPT over schema σ. Assume that dom(D) is the set of elements in the active domain
of D and X are the variables mentioned in p. Then:

    – A homomorphism from p to D is a partial mapping h : X → dom(D), for which it
      is the case that there is a subtree T 0 of T rooted in r such that h is a homomorphism
      from the CQ qT 0 to D.
    – The homomorphism h is maximal if there is no homomorphism h0 from p to D
      such that h0 extends h.

    If h is a homomorphism from p = (T, λ, x̄) to D we denote by hx̄ the restriction
of h to the variables in x̄. The evaluation of p over D, denoted p(D), corresponds to all
mappings of the form hx̄ , such that h is a maximal homomorphism from p to D.
    Notice that WDPTs properly extend CQs. In fact, assume q(x̄) is a CQ of the form
Ans(x̄) ← R1 (v̄1 ), . . . , Rm (v̄m ). Then q(x̄) is equivalent to WDPT p = (T, λ, x̄),
where T consists of a single node r and λ(r) = {R1 (v̄1 ), . . . , Rm (v̄m )}. In other
words, q(D) = p(D), for each database D.
3     Efficient evaluation of WDPTs
3.1   Evaluation of WDPTs:
We study the complexity of the evaluation problem E VAL(C) for different classes C of
WDPTs. This problem is formally defined as follows: Given a database D and a WDPT
p over σ, as well as a partial mapping h : X → dom(D), where X is the set of variables
mentioned in p, is it the case that h belongs to p(D)?
    The complexity of E VAL(C) has been studied for the case when C is the class Call
of all WDPTs or the class Cpf of projection-free WDPTs. In particular, E VAL(Call ) is
Σ2P -complete [10] and E VAL(Cpf ) is CO NP-complete [11]. This raises the need for
understanding which classes of WDPTs can be evaluated in polynomial time.
    Evaluation of WDPTs is defined in terms of CQ evaluation, which is an intractable
problem in general. Therefore, our goal of identifying tractable classes of WDPTs nat-
urally calls for a restriction of the classes of CQ patterns allowed in them. In particular,
there has been a flurry of activity around the topic of determining which classes of CQs
admit efficient evaluation that could be reused in our scenario [13, 8, 9]. These include
classes of bounded treewidth [6], hypertreewidth [9], etc. We concentrate on the first
two. From now on, we denote by TW(k) (resp., HW(k)), for k ≥ 1, the class of CQs
of treewidth (resp., hypertreewidth) at most k.
    A condition that has been shown to help identifying relevant tractable fragments of
WDPTs is local tractability [10]. This refers to restricting the CQ defined by each node
in a WDPT to belong to a tractable class. Formally, let C be either TW(k) or HW(k),
for k ≥ 1. A WDPT (T, λ, x̄) is locally in C, if for each node t ∈ T such that λ(t) =
{R1 (v̄1 ), . . . , Rm (v̄m )} the CQ Ans() ← R1 (v̄1 ), . . . , Rm (v̄m ) is in C. We write `-C
for the set of all WDPTs that are locally in C.
    It is known that local tractability leads to tractability of evaluation for projection-
free WDPTs [10]. On the other hand, this result does not hold in the presence of pro-
jection, even when C is of bounded treewidth. Formally, E VAL(`-TW(k)) and E VAL(`-
HW(k)) are NP-complete for every k ≥ 1 [10].
    This raises the question of which further restrictions on WDPTs are needed to
achieve tractability. Here we identify a natural such restriction, called bounded inter-
face. Intuitively, this restricts the number of variables shared between a node in a WDPT
and its children. Formally, a WDPT (T, λ, x̄) has c-bounded interface, for c ≥ 1, if for
each node t ∈ T with children t1 , . . . , tk it is the case that the number of variables
that appear both in a relational atom in λ(t) and in a relational atom in λ(ti ), for some
1 ≤ i ≤ k, is at most c. We denote by BI(c) the set of WDPTs of c-bounded interface.
Interestingly, similar restrictions on the number of variables shared by different atoms
of CQs have been recently applied for obtaining reasonable bounds for the problem of
containment of Datalog into unions of CQs [5]. One of our main results states that local
tractability and bounded interface yield tractability of WDPT evaluation:

Theorem 1. Let C be TW(k) or HW(k) and c ≥ 1. Then E VAL(`-C ∩ BI(c)) is in
P TIME.

Notice that CQs are a special case of WDPTs consisting of the root node only. Hence,
TW(k) ⊆ `-TW(k) ∩ BI(c) and HW(k) ⊆ `-HW(k) ∩ BI(c) hold for each c ≥ 1.
Therefore, Theorem 1 tells us that `-TW(k)∩BI(c) and `-HW(k)∩BI(c) define relevant
extensions of TW(k) and HW(k), respectively, that preserve tractability of evaluation.

3.2   Partial evaluation of WDPTs:
Given the nature of WDPTs, it is also interesting to check whether a mapping h is
a partial answer to the WDPT p over D [11, 1], i.e., whether h can be extended to
some answer h0 to p over D. This gives rise to the partial evaluation problem PARTIAL -
E VAL(C) for a class C of WDPTs defined as follows: Given a database D and a WDPT
p ∈ C over σ, as well as a partial mapping h : X → U, where X is the set of variables
mentioned in p, does there exists some h0 ∈ p(D) such that h0 extends h?
     If projection is allowed, then partial evaluation is NP-complete even under local
tractability, i.e., even for the classes `-TW(k) and `-HW(k), for each k ≥ 1 [10].
     It is easy to modify the proof of Theorem 1 to show that adding bounded inter-
face to local tractability yields efficient partial evaluation; that is, PARTIAL -E VAL(`-
TW(k) ∩ BI(c)) and PARTIAL -E VAL(`-HW(k) ∩ BI(c)) are in P TIME. However, par-
tial evaluation is seemingly easier than exact evaluation. Hence, the question naturally
arises if tractability of partial evaluation of WDPTs can be ensured by a weaker condi-
tion. Indeed, we give a positive answer to this question below. This condition will be re-
ferred to as global tractability. Intuitively, it states that there is a bound on the treewidth
(resp., hypertreewidth) of the CQs defined by the different subtrees of a WDPT (T, λ, x̄)
rooted in r. Formally, let C be TW(k) or HW(k), for k ≥ 1. A WDPT (T, λ, x̄) is glob-
ally in C, if for each subtree T 0 of T rooted in r it is the case that the CQ qT 0 is in C.
We denote by g-C the set of all WDPTs that are globally in C.
     The following proposition formally states that global tractability is a strictly weaker
condition than the conjunction of local tractability and bounded interface.

Proposition 1. 1. Let k, c ≥ 1. Then `-TW(k) ∩ BI(c) ⊆ g-TW(k + 2c) and `-
    HW(k) ∩ BI(c) ⊆ g-HW(k + 2c).
 2. For every k ≥ 1 there is a family Ck of WDPTs in g-TW(k) (resp., in g-HW(k))
    such that Ck 6⊆ BI(c), for each c ≥ 1.

    We can now formally state that global tractability leads to tractability of the partial
evaluation problem for WDPTs:

Theorem 2. PARTIAL -E VAL(g-TW(k)) and PARTIAL -E VAL(g-HW(k)) are in P TIME
for every k ≥ 1.

    It remains to answer the question if global tractability also suffices to ensure
tractability of (exact) evaluation for WDPTs. It turns out that this is not the case.

Proposition 2. E VAL(g-TW(k)) and E VAL(g-HW(k)) are NP-complete for all k ≥ 1.

3.3   Semantics based on maximal mappings:
The semantics of projection-free WDPTs is only based on maximal mappings, i.e., map-
pings that are not subsumed by any other mapping in the answer. This is no longer the
case in the presence of projection [10]. Recent work on query answering for SPARQL
under entailment regimes has established the need for a semantics for WDPTs that is
uniquely based on maximal mappings [1]. This semantics is formalized as follows. As-
sume D is a database and p is a WDPT over σ. The evaluation of p over D under
maximal mappings, denoted pm (D), corresponds to the restriction of p(D) to those
mappings h ∈ p(D) that are not extended by any other mapping h0 ∈ p(D). This natu-
rally leads to the decision problem M AX -E VAL(C) defined as follows: Given a database
D and a WDPT p ∈ C over σ, as well as a partial mapping h : X → U, where X is the
set of variables mentioned in p, is h ∈ pm (D)?
    It follows from [1] that M AX -E VAL(C) is clearly intractable for the class C of all
WDPTs. Analogously to PARTIAL -E VAL, local tractability is not sufficient to ensure
tractability of M AX -E VAL:
Proposition 3. For every k ≥ 1 the problems M AX -E VAL(`-TW(k)) and M AX -
E VAL(`-HW(k)) are NP-hard.
To obtain tractability in this case it is however sufficient to impose global tractability,
which is exactly the same condition that yields tractability of partial evaluation for
WDPTs (as stated in Theorem 2):

Theorem 3. M AX -E VAL(g-TW(k)) and M AX -E VAL(g-HW(k)) are in P TIME for ev-
ery k ≥ 1.


4   Further Results

Taking these results as a starting point, we were also able to show that in several cases
the complexity of static analysis tasks, like deciding containment and equivalence [12],
decreases. Next, we also studied the problem of testing if some WDPT is equivalent to
one from a tractable class (cf. e.g. [2, 4, 7]), and showed that evaluating such queries
is fixed-parameter tractable w.r.t. the size of the query. Finally, we also studied the
problem of approximating WDPTs by one from a tractable class (a problem that is now
well-understood in the context of CQs [2]).
Acknowledgments. The work of Pablo Barceló is funded by the Millenium Nucleus
Center for Semantic Web Research under grant NC120004. Part of this work was done
while Reinhard Pichler and Sebastian Skritek were visiting Pablo Barceló on invitation
by the Millenium Nucleus Center for Semantic Web Research. Reinhard Pichler and Se-
bastian Skritek were supported by the Vienna Science and Technology Fund (WWTF)
through project ICT12-015 and by the Austrian Science Fund (FWF):P25207-N23.


References
 1. S. Ahmetaj, W. Fischl, R. Pichler, M. Simkus, and S. Skritek. Towards reconciling sparql
    and certain answers. In WWW’15, 2015. Accepted for publication.
 2. P. Barceló, L. Libkin, and M. Romero. Efficient approximations of conjunctive queries.
    SIAM J. Comput., 43(3):1085–1130, 2014.
 3. P. Barcelo, R. Pichler, and S. Skritek. Efficient evaluation and approximation of well-
    designed pattern trees. In PODS’15, 2015. Accepted for publication.
 4. P. Barceló, M. Romero, and M. Y. Vardi. Semantic acyclicity on graph databases. In
    PODS’13, pages 237–248, 2013.
 5. P. Barceló, M. Romero, and M. Y. Vardi. Does query evaluation tractability help query
    containment? In PODS’14, pages 188–199, 2014.
 6. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. Theor. Comput.
    Sci., 239(2):211–229, 2000.
 7. V. Dalmau, P. G. Kolaitis, and M. Y. Vardi. Constraint satisfaction, bounded treewidth, and
    finite-variable logics. In CP’02, pages 310–326, 2002.
 8. G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. J.
    ACM, 48(3):431–498, 2001.
 9. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries. J.
    Comput. Syst. Sci., 64(3):579–627, 2002.
10. A. Letelier, J. Pérez, R. Pichler, and S. Skritek. Static analysis and optimization of semantic
    web queries. ACM Trans. Database Syst., 38(4):25, 2013.
11. J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. ACM Trans.
    Database Syst., 34(3), 2009.
12. R. Pichler and S. Skritek. Containment and equivalence of well-designed SPARQL. In
    PODS’14, pages 39–50, 2014.
13. M. Yannakakis. Algorithms for acyclic database schemes. In VLDB’81, pages 82–94, 1981.