<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Evaluation of Well-designed Pattern Trees (Extended Abstract)?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pablo Barcelo´</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reinhard Pichler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Skritek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Semantic Web Research &amp; Department of Computer Science, University of Chile</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Informatics, Vienna University of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Conjunctive queries (CQs) constitute the core of the query languages for relational databases and also the most intensively studied querying mechanism in the database theory community. But CQs suffer from a serious drawback when dealing with incomplete information: If it is not possible to match the complete query with the data, they return no answer at all. The semantic web therefore provides a formalism - known as well-designed pattern trees (WDPTs) - that tackles this problem. In particular, WDPTs allow us to match patterns over the data if available, but do not fail to give an answer otherwise. Here, we abstract away the specifics of semantic web applications and study WDPTs over arbitrary relational schemas. Since our language properly subsumes the class of CQs, the evaluation problem associated with it is intractable. In this paper we identify natural structural properties of WDPTs that lead to tractability of various variants of the evaluation problem.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Conjunctive queries (CQs) constitute the core of the query languages for relational
databases and also the most intensively studied querying mechanism in the database
theory community. But CQs suffer from a serious drawback when dealing with
incomplete information: they fail to provide an answer when the pattern described by the
query cannot be matched completely into the data.</p>
      <p>The semantic web therefore provides formalisms to overcome this problem. One
simple such formalism corresponds to the fAND,OPTg-fragment of SPARQL – the
standard query language for RDF, the semantic web data model. The OPT-operator
extends the AND-operator by the possibility to return partial answers. I.e., instead of
returning no answer at all if not the complete query can be matched into the data, it
allows to match parts of the query. Pe´rez et al. noticed that a non-constrained interaction
of these two operators may lead to undesired behavior [11]. This motivated the
definition of a better behaved syntactic restriction of the language, known as well-designed
fAND,OPTg-SPARQL. Queries in this fragment allow for a natural tree
representation, called well-designed pattern trees (WDPTs) [10]. Here, we abstract away from the
specifics of RDF and define WDPTs over arbitrary relational schemas.</p>
      <p>Despite the importance of WDPTs, very little is known about some fundamental
problems related to them. In particular, no in-depth study has been carried out regarding
? This is an extended abstract of [3]
efficient evaluation of these queries, a problem that permeates the literature on CQs and
its extensions [13, 8, 9]. The main goal of this work is to initiate a systematic study of
tractable fragments of WDPTs for the different variants of query evaluation that have
been studied in the literature. We explain this in more detail below.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Well-designed Pattern Trees</title>
      <p>We define the class of WDPTs below. Intuitively, the nodes of a WDPT represent CQs
(called “basic graph patterns” in the semantic web context) while the nesting of optional
matching is represented by the tree structure of a WDPT.</p>
      <p>A well-designed pattern tree (WDPT) over schema is a pair (T; ; x), such that:
1. T is a rooted tree and maps each node t in T to a set of relational atoms over .
2. For every variable y mentioned in T , the set of nodes where y occurs is connected.
3. The tuple x of distinct variables from T denotes the free variables of the WDPT.
We say that (T; ; x) is projection-free, if x contains all variables mentioned in T .</p>
      <p>Assume p = (T; ; x) is a WDPT over . We write r to denote the root of
T . Given a subtree T 0 of T rooted in r, we define qT 0 to be the CQ Ans(y)
R1(v1); : : : ; Rm(vm), where fR1(v1); : : : ; Rm(vm)g = St2T 0 (t), and y are all the
variables that are mentioned in T 0. That is, all variables in qT 0 appear free.</p>
      <p>We define the semantics of WDPTs by naturally extending their interpretation
under semantic web vocabularies [10]. The intuition behind the semantics of a WDPT
(T; ; x) is as follows. A mapping h is an answer to (T; ) over a database D, if it is
“maximal” among the mappings that satisfy the patterns qT 0 defined by the subtrees T 0
of T . This means, h is a solution to qT 0 and there is no way to “extend” h to a solution
of some qT 00 for some bigger subtree T 00 of T . The evaluation of a WDPT (T; ; x)
over D corresponds then to the projection over the variables in x of the mappings h that
satisfy (T; ) over D. We formalize this next: Let D be a database and p = (T; ; x) a
WDPT over schema . Assume that dom(D) is the set of elements in the active domain
of D and X are the variables mentioned in p. Then:
– A homomorphism from p to D is a partial mapping h : X ! dom(D), for which it
is the case that there is a subtree T 0 of T rooted in r such that h is a homomorphism
from the CQ qT 0 to D.
– The homomorphism h is maximal if there is no homomorphism h0 from p to D
such that h0 extends h.</p>
      <p>If h is a homomorphism from p = (T; ; x) to D we denote by hx the restriction
of h to the variables in x. The evaluation of p over D, denoted p(D), corresponds to all
mappings of the form hx, such that h is a maximal homomorphism from p to D.</p>
      <p>Notice that WDPTs properly extend CQs. In fact, assume q(x) is a CQ of the form
Ans(x) R1(v1); : : : ; Rm(vm). Then q(x) is equivalent to WDPT p = (T; ; x),
where T consists of a single node r and (r) = fR1(v1); : : : ; Rm(vm)g. In other
words, q(D) = p(D), for each database D.</p>
    </sec>
    <sec id="sec-3">
      <title>Efficient evaluation of WDPTs</title>
      <sec id="sec-3-1">
        <title>Evaluation of WDPTs:</title>
        <p>We study the complexity of the evaluation problem EVAL(C) for different classes C of
WDPTs. This problem is formally defined as follows: Given a database D and a WDPT
p over , as well as a partial mapping h : X ! dom(D), where X is the set of variables
mentioned in p, is it the case that h belongs to p(D)?</p>
        <p>The complexity of EVAL(C) has been studied for the case when C is the class Call
of all WDPTs or the class Cpf of projection-free WDPTs. In particular, EVAL(Call) is
2P -complete [10] and EVAL(Cpf ) is CONP-complete [11]. This raises the need for
understanding which classes of WDPTs can be evaluated in polynomial time.</p>
        <p>Evaluation of WDPTs is defined in terms of CQ evaluation, which is an intractable
problem in general. Therefore, our goal of identifying tractable classes of WDPTs
naturally calls for a restriction of the classes of CQ patterns allowed in them. In particular,
there has been a flurry of activity around the topic of determining which classes of CQs
admit efficient evaluation that could be reused in our scenario [13, 8, 9]. These include
classes of bounded treewidth [6], hypertreewidth [9], etc. We concentrate on the first
two. From now on, we denote by TW(k) (resp., HW(k)), for k 1, the class of CQs
of treewidth (resp., hypertreewidth) at most k.</p>
        <p>A condition that has been shown to help identifying relevant tractable fragments of
WDPTs is local tractability [10]. This refers to restricting the CQ defined by each node
in a WDPT to belong to a tractable class. Formally, let C be either TW(k) or HW(k),
for k 1. A WDPT (T; ; x) is locally in C, if for each node t 2 T such that (t) =
fR1(v1); : : : ; Rm(vm)g the CQ Ans() R1(v1); : : : ; Rm(vm) is in C. We write `-C
for the set of all WDPTs that are locally in C.</p>
        <p>It is known that local tractability leads to tractability of evaluation for
projectionfree WDPTs [10]. On the other hand, this result does not hold in the presence of
projection, even when C is of bounded treewidth. Formally, EVAL(`-TW(k)) and
EVAL(`HW(k)) are NP-complete for every k 1 [10].</p>
        <p>This raises the question of which further restrictions on WDPTs are needed to
achieve tractability. Here we identify a natural such restriction, called bounded
interface. Intuitively, this restricts the number of variables shared between a node in a WDPT
and its children. Formally, a WDPT (T; ; x) has c-bounded interface, for c 1, if for
each node t 2 T with children t1; : : : ; tk it is the case that the number of variables
that appear both in a relational atom in (t) and in a relational atom in (ti), for some
1 i k, is at most c. We denote by BI(c) the set of WDPTs of c-bounded interface.
Interestingly, similar restrictions on the number of variables shared by different atoms
of CQs have been recently applied for obtaining reasonable bounds for the problem of
containment of Datalog into unions of CQs [5]. One of our main results states that local
tractability and bounded interface yield tractability of WDPT evaluation:</p>
        <sec id="sec-3-1-1">
          <title>Theorem 1. Let C be TW(k) or HW(k) and c</title>
          <p>PTIME.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>1. Then EVAL(`-C \ BI(c)) is in</title>
          <p>Notice that CQs are a special case of WDPTs consisting of the root node only. Hence,
TW(k) `-TW(k) \ BI(c) and HW(k) `-HW(k) \ BI(c) hold for each c 1.
Therefore, Theorem 1 tells us that `-TW(k)\BI(c) and `-HW(k)\BI(c) define relevant
extensions of TW(k) and HW(k), respectively, that preserve tractability of evaluation.
3.2</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Partial evaluation of WDPTs:</title>
        <p>
          Given the nature of WDPTs, it is also interesting to check whether a mapping h is
a partial answer to the WDPT p over D [
          <xref ref-type="bibr" rid="ref1">11, 1</xref>
          ], i.e., whether h can be extended to
some answer h0 to p over D. This gives rise to the partial evaluation problem
PARTIALEVAL(C) for a class C of WDPTs defined as follows: Given a database D and a WDPT
p 2 C over , as well as a partial mapping h : X ! U, where X is the set of variables
mentioned in p, does there exists some h0 2 p(D) such that h0 extends h?
        </p>
        <p>If projection is allowed, then partial evaluation is NP-complete even under local
tractability, i.e., even for the classes `-TW(k) and `-HW(k), for each k 1 [10].</p>
        <p>It is easy to modify the proof of Theorem 1 to show that adding bounded
interface to local tractability yields efficient partial evaluation; that is,
PARTIAL-EVAL(`TW(k) \ BI(c)) and PARTIAL-EVAL(`-HW(k) \ BI(c)) are in PTIME. However,
partial evaluation is seemingly easier than exact evaluation. Hence, the question naturally
arises if tractability of partial evaluation of WDPTs can be ensured by a weaker
condition. Indeed, we give a positive answer to this question below. This condition will be
referred to as global tractability. Intuitively, it states that there is a bound on the treewidth
(resp., hypertreewidth) of the CQs defined by the different subtrees of a WDPT (T; ; x)
rooted in r. Formally, let C be TW(k) or HW(k), for k 1. A WDPT (T; ; x) is
globally in C, if for each subtree T 0 of T rooted in r it is the case that the CQ qT 0 is in C.
We denote by g-C the set of all WDPTs that are globally in C.</p>
        <p>The following proposition formally states that global tractability is a strictly weaker
condition than the conjunction of local tractability and bounded interface.
Proposition 1. 1. Let k; c 1. Then `-TW(k) \ BI(c) g-TW(k + 2c) and
`</p>
        <p>HW(k) \ BI(c) g-HW(k + 2c).
2. For every k 1 there is a family Ck of WDPTs in g-TW(k) (resp., in g-HW(k))
such that Ck 6 BI(c), for each c 1.</p>
        <p>We can now formally state that global tractability leads to tractability of the partial
evaluation problem for WDPTs:
Theorem 2. PARTIAL-EVAL(g-TW(k)) and PARTIAL-EVAL(g-HW(k)) are in PTIME
for every k 1.</p>
        <p>It remains to answer the question if global tractability also suffices to ensure
tractability of (exact) evaluation for WDPTs. It turns out that this is not the case.
Proposition 2. EVAL(g-TW(k)) and EVAL(g-HW(k)) are NP-complete for all k
1.
3.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Semantics based on maximal mappings:</title>
        <p>
          The semantics of projection-free WDPTs is only based on maximal mappings, i.e.,
mappings that are not subsumed by any other mapping in the answer. This is no longer the
case in the presence of projection [10]. Recent work on query answering for SPARQL
under entailment regimes has established the need for a semantics for WDPTs that is
uniquely based on maximal mappings [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This semantics is formalized as follows.
Assume D is a database and p is a WDPT over . The evaluation of p over D under
maximal mappings, denoted pm(D), corresponds to the restriction of p(D) to those
mappings h 2 p(D) that are not extended by any other mapping h0 2 p(D). This
naturally leads to the decision problem MAX-EVAL(C) defined as follows: Given a database
D and a WDPT p 2 C over , as well as a partial mapping h : X ! U, where X is the
set of variables mentioned in p, is h 2 pm(D)?
        </p>
        <p>
          It follows from [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] that MAX-EVAL(C) is clearly intractable for the class C of all
WDPTs. Analogously to PARTIAL-EVAL, local tractability is not sufficient to ensure
tractability of MAX-EVAL:
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Proposition 3. For every k</title>
        <p>EVAL(`-HW(k)) are NP-hard.</p>
        <p>1 the problems MAX-EVAL(`-TW(k)) and
MAXTo obtain tractability in this case it is however sufficient to impose global tractability,
which is exactly the same condition that yields tractability of partial evaluation for
WDPTs (as stated in Theorem 2):
Theorem 3. MAX-EVAL(g-TW(k)) and MAX-EVAL(g-HW(k)) are in PTIME for
every k 1.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Further Results</title>
      <p>
        Taking these results as a starting point, we were also able to show that in several cases
the complexity of static analysis tasks, like deciding containment and equivalence [12],
decreases. Next, we also studied the problem of testing if some WDPT is equivalent to
one from a tractable class (cf. e.g. [
        <xref ref-type="bibr" rid="ref2">2, 4, 7</xref>
        ]), and showed that evaluating such queries
is fixed-parameter tractable w.r.t. the size of the query. Finally, we also studied the
problem of approximating WDPTs by one from a tractable class (a problem that is now
well-understood in the context of CQs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
      </p>
      <p>Acknowledgments. The work of Pablo Barcelo´ is funded by the Millenium Nucleus
Center for Semantic Web Research under grant NC120004. Part of this work was done
while Reinhard Pichler and Sebastian Skritek were visiting Pablo Barcelo´ on invitation
by the Millenium Nucleus Center for Semantic Web Research. Reinhard Pichler and
Sebastian Skritek were supported by the Vienna Science and Technology Fund (WWTF)
through project ICT12-015 and by the Austrian Science Fund (FWF):P25207-N23.
3. P. Barcelo, R. Pichler, and S. Skritek. Efficient evaluation and approximation of
welldesigned pattern trees. In PODS’15, 2015. Accepted for publication.
4. P. Barcelo´, M. Romero, and M. Y. Vardi. Semantic acyclicity on graph databases. In</p>
      <p>PODS’13, pages 237–248, 2013.
5. P. Barcelo´, M. Romero, and M. Y. Vardi. Does query evaluation tractability help query
containment? In PODS’14, pages 188–199, 2014.
6. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. Theor. Comput.</p>
      <p>Sci., 239(2):211–229, 2000.
7. V. Dalmau, P. G. Kolaitis, and M. Y. Vardi. Constraint satisfaction, bounded treewidth, and
finite-variable logics. In CP’02, pages 310–326, 2002.
8. G. Gottlob, N. Leone, and F. Scarcello. The complexity of acyclic conjunctive queries. J.</p>
      <p>ACM, 48(3):431–498, 2001.
9. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries. J.</p>
      <p>Comput. Syst. Sci., 64(3):579–627, 2002.
10. A. Letelier, J. Pe´rez, R. Pichler, and S. Skritek. Static analysis and optimization of semantic
web queries. ACM Trans. Database Syst., 38(4):25, 2013.
11. J. Pe´rez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. ACM Trans.</p>
      <p>Database Syst., 34(3), 2009.
12. R. Pichler and S. Skritek. Containment and equivalence of well-designed SPARQL. In</p>
      <p>PODS’14, pages 39–50, 2014.
13. M. Yannakakis. Algorithms for acyclic database schemes. In VLDB’81, pages 82–94, 1981.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmetaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fischl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>Towards reconciling sparql and certain answers</article-title>
          .
          <source>In WWW'15</source>
          ,
          <year>2015</year>
          .
          <article-title>Accepted for publication</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo´</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Romero</surname>
          </string-name>
          .
          <article-title>Efficient approximations of conjunctive queries</article-title>
          .
          <source>SIAM J. Comput.</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1085</fpage>
          -
          <lpage>1130</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>