<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Complexity of Enumerating the Answers to Well-Designed Pattern Trees (Ext. Abstract)?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Kroll</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reinhard Pichler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Skritek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Information Systems</institution>
          ,
          <addr-line>TU Wien</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the steadily increasing amount of incomplete data on the web, the need for partial matching as an extension of Conjunctive Queries (CQs) is gaining more and more importance. Therefore, the OPTIONAL operator is a crucial feature in the semantic web query language SPARQL, and has also been studied for arbitrary relational vocabulary recently [3]. Intuitively, this operator (which corresponds to the left outer join in the Relational Algebra) allows the user to extend CQs by optional parts for which answers are retrieved from the data if available, but which do not cause the query to return no answer at all otherwise. In [9], based on a query fragment with desirable properties presented in [10], so-called well-designed pattern trees (wdPTs) were introduced as a convenient graphical representation of CQs extended by this optional matching feature. Intuitively, the nodes in a wdPT correspond to CQs while the tree structure represents the optional extensions. Several computational problems of wdPTs have been studied in recent years, such as the evaluation problem, the counting problem, as well as static analysis tasks including the containment and equivalence problems. In contrast, despite recent interest in the enumeration problem (i.e., computing one solution after the other without outputting duplicates) for FO queries and CQs [5, 2, 11], this natural problem has been hardly considered for wdPTs so far. A notable exception is [9]. Similar ideas were pursued in [4] for the enumeration of the answers to full disjunctions. However, being associative and commutative, full disjunctions di er signi cantly from wdPTs. Hence, the results in [4] do not apply to wdPTs. In this work, we embark on a systematic study of the complexity of the enumeration problem of wdPTs. We identify several tractable and intractable cases of this problem both from a classical complexity point of view and from a parameterized complexity point of view.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A well-designed pattern tree (wdPT) p over a schema
is a triple (T; ; x) s.t.:
1. T is a rooted tree;</p>
      <p>
        maps each node in T to a set of relational atoms over .
? This is an extended abstract of a paper published at ICDT 2016 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
2. For every variable y from T , the set of nodes where y occurs is connected.
3. The tuple x of distinct variables from T denotes the free variables of p.
Assume p = (T; ; x) is a wdPT over . We write r to denote the root of
T . For a subtree T 0 of T rooted in r, we de ne qT 0 to be the CQ Ans(y)
R1(v1); : : : ; Rm(vm), where fR1(v1); : : : ; Rm(vm)g = St2T 0 (t), and y are all
the variables that are mentioned in T 0. That is, all variables in qT 0 appear free.
      </p>
      <p>The intuition behind the semantics of a wdPT p = (T; ; x) is as follows. A
mapping h is an answer to (T; ) over a database D, if h is a solution to qT 0
and there is no way to \extend" h to a solution of some qT 00 , where T 0, T 00 are
subtrees of T rooted in r. The evaluation of p over D corresponds then to the
projection of the answers to (T; ) over D to x. Formally: Let D be a database
and p = (T; ; x) a wdPT over schema . Let dom(D) be the set of elements in
the active domain of D and X the variables mentioned in p. Then:
{ A homomorphism from p to D is a partial mapping h : X ! dom(D), for
which it is the case that there is a subtree T 0 of T rooted in r such that h is
a homomorphism from the CQ qT 0 to D.
{ The homomorphism h is maximal if there is no homomorphism h0 from p to
D such that h0 extends h.</p>
      <p>If h is a homomorphism from p to D, let hx denote the restriction of h to
the variables in x. The evaluation of p over D, denoted p(D), corresponds to all
mappings of the form hx, such that h is a maximal homomorphism from p to D.
3</p>
    </sec>
    <sec id="sec-2">
      <title>Enumeration of wdPTs</title>
      <p>
        Let C be a class of wdPTs. Given a wdPT p 2 C and a database D, we denote
with Enum(C) the problem to enumerate p(D). Because of projection, it may
happen that both, a mapping h and a proper extension of h are in p(D). However,
in some settings only the maximal solutions are of interest, e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We thus also
study the problem Max-Enum(C) of enumerating pm(D), the set of maximal
solutions in p(D).
      </p>
      <p>
        We aim at identifying the boundary between tractability and intractability
for these problems. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], various notions of tractable enumeration are de ned.
We concentrate on two such notions, namely polynomial delay (the time until the
next output or termination is bounded by a polynomial in the size of the input)
and output polynomial time (the total runtime of the algorithm is bounded by
a polynomial in the size of the input plus the output). We denote with DelayP
and OutputP the corresponding enumeration complexity classes.
      </p>
      <p>
        The following decision problem ExtSol(C) (where C is a class of wdPTs)
is closely related to the enumeration problems de ned above: Given a wdPT
p = (T; ; x) 2 C, a database D, a partial mapping h and a set x0 x, does
there exist some mapping h0 2 p(D) s.t. h h0 and dom(h0) \ x0 = ;? The
relationship with enumeration is given by the following property.
`-TW(k)\ BI(c) g-TW(k)\ SBI(c) `-TW(k)\ SBI(c)
in P [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in FPT W[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]-complete
in P [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in P [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] W[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]-hard
      </p>
      <p>DelayP
not OutputP1</p>
      <p>DelayP
DelayFPT</p>
      <p>open
not OutputP1</p>
      <p>
        DelayFPT
DelayFPT
not OutputP2
not OutputP1
not OutputFPT2 not OutputFPT2
not OutputFPT2 DelayFPT
Lemma 1. Let C be a class of wdPTs, p = (T; ; x) 2 C, and D a database.
If there is a computable function f s.t. for every partial mapping and every
subset x0 of x, the problem ExtSol(C) can be decided in O(f (jpj; jDj)), then
there exists an algorithm enumerating p(D) with delay O(f (jpj; jDj) jDj jpj).
Also ExtSol(C) is a generalization of the evaluation problems (Max-)Eval(C):
Given p 2 C, a database D, and a mapping h, is h 2 p(D) (or pm(D))? Since
these problems are known to be intractable in general, so is ExtSol(C). As a
result, the evaluation problem for fragments of wdPTs has been studied in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
We use the same classes of wdPTs, and also introduce further restrictions to get
a better understanding of the sources of complexity of enumeration.
      </p>
      <p>Fragments of wdPTs have been de ned in two di erent ways: On the one hand
by restrictions on some associated CQs, and on the other hand by restricting the
number of variables shared between nodes. Let TW(k) be the class of CQs with
bounded treewidth. A wdPT p = (T; ; x) is in the class `-TW(k) if for every node
n in T the CQ Ans() (n) is in TW(k), and in the class g-TW(k) if for every
subtree T 0 of T rooted in r the CQ qT 0 is in TW(k). Furthermore, we say p is in
BI(c) if every node n in T contains at most c variables that also occur elsewhere
in T , while p is in SBI(c) if no two nodes in T share more than c variables.</p>
      <p>Table 1 summarizes our results. Observe that beside the classical
complexity, we also study the parameterized complexity of the enumeration problems.
I.e., p-Enum(C) and p-Max-Enum(C) denote the parameterized variants (with
jpj as the parameter) of Enum(C) and Max-Enum(C), respectively. Also,
DelayFPT and OutputFPT are de ned analogously to DelayP and OutputP. Since
the parameterized complexity of wdPTs has not yet been studied, by p-Eval(C)
and p-Max-Eval(C) we also consider the parameterized variant of the common
evaluation problem for wdPTs, i.e., given p; D; h, is h 2 p(D) (resp. pm(D))?</p>
      <p>
        We would like to point out that by introducing the class SBI(c) and
performing a parameterized complexity analysis, we are able to draw a much more
ne grained picture of the complexity of the decision problem. So far, only
Eval(`-TW(k) \ BI(c)) and Eval(g-TW(k)) have been known to be tractable
and NP-complete, respectively [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        For the enumeration problems, we observe a surprising discrepancy between
enumerating all answers and enumerating only the maximal ones. For the former
setting, in most of the cases the corresponding decision problem is at least as hard
as for the second setting (this is also the case for the non-parameterized problem
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). A similar behaviour is observed for p-Enum(C) and p-Max-Enum(C). This,
however, completely changes for Enum(C) and Max-Enum(C), where
MaxEnum(C) is not even tractable in the most restricted case.
      </p>
      <p>In the remainder, we would like to sketch some of the tools used to show the
results in Table 1. Lemma 1 established a relationship between ExtSol(C) and
enumeration. This relationship is put to use by the following result.
Proposition 1. The following complexity results hold for ExtSol(C):
1. Let k; c 1 and C be a class of CQs for which the evaluation problem is in</p>
      <p>P. Then ExtSol(`-C \ BI(c)) is in P.
2. ExtSol(g-TW(k)) is NP-complete for every k 1.
3. For k; c 1, ExtSol(g-TW(k) \ SBI(c)) parameterized by jpj is in FPT.</p>
      <p>A useful tool to show negative results is Theorem 1. We call a class C robust
if it is closed under some simple transformations (like replacing all occurrences
of a variable by the same constant). All classes considered here are robust.
Theorem 1. Let C be a robust class of wdPTs. If p-Enum(C) is in OutputFPT,
then p-Eval(C) is in FPT.</p>
      <p>
        Another way of providing intractability results for enumeration problems is
by showing that, given a set of solutions, deciding if there exists another one
is not possible in polynomial time (cf. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). We used this relationship to show
the intractability of Max-Enum(C) (assuming P 6= NP) by proving the decision
problem to be NP-complete for C = `-TW(k) \ BI(c).
4
      </p>
    </sec>
    <sec id="sec-3">
      <title>Further Results and Future Work</title>
      <p>
        In addition to the results presented above, following an active line of research
[
        <xref ref-type="bibr" rid="ref11 ref2">2, 11</xref>
        ], we also had a brief look at the data complexity of the enumeration
problems. In this case, the goal is to nd an enumeration algorithm that works with
constant delay after some linear time preprocessing. We show that even for very
simple settings such algorithms are very unlikely to exist, as for certain wdPTs
without projection consisting of only two nodes, containing one respectively two
binary atoms. In the presence of projection, this already holds for wdPTs
consisting of two nodes with one atom each.
      </p>
      <p>Besides closing the open case in Table 1, future work also includes the search
for further tractable fragments. E.g., we do not know any interesting tractable
classes for Max-Enum(C) yet.</p>
      <p>Acknowledgments. This work was supported by the Vienna Science and
Technology Fund (WWTF) through project ICT12-015 and by the Austrian Science
Fund (FWF): P25207-N23. Markus Kroll was also supported by FWF project
W1255-N23.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmetaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fischl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>Towards reconciling SPARQL and certain answers</article-title>
          .
          <source>In Proc. WWW</source>
          <year>2015</year>
          , pages
          <fpage>23</fpage>
          {
          <fpage>33</fpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.</given-names>
            <surname>Bagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durand</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Grandjean</surname>
          </string-name>
          .
          <article-title>On acyclic conjunctive queries and constant delay enumeration</article-title>
          .
          <source>In Computer Science Logic</source>
          , pages
          <volume>208</volume>
          {
          <fpage>222</fpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>E cient evaluation and approximation of well-designed pattern trees</article-title>
          .
          <source>In Proc. PODS</source>
          <year>2015</year>
          , pages
          <fpage>131</fpage>
          {
          <fpage>144</fpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fadida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kanza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sagiv</surname>
          </string-name>
          .
          <article-title>Full disjunctions: Polynomial-delay iterators in action</article-title>
          .
          <source>In Proc. VLDB</source>
          <year>2006</year>
          , pages
          <fpage>739</fpage>
          {
          <fpage>750</fpage>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Durand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schweikardt</surname>
          </string-name>
          , and
          <string-name>
            <surname>L. Segou n.</surname>
          </string-name>
          <article-title>Enumerating answers to FO queries over databases of low degree</article-title>
          .
          <source>In Proc. PODS</source>
          <year>2014</year>
          , pages
          <fpage>121</fpage>
          {
          <fpage>131</fpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          .
          <article-title>On generating all maximal independent sets</article-title>
          .
          <source>Inf</source>
          . Process. Lett.,
          <volume>27</volume>
          (
          <issue>3</issue>
          ):
          <volume>119</volume>
          {
          <fpage>123</fpage>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kimelfeld</surname>
          </string-name>
          and
          <string-name>
            <surname>P. G. Kolaitis.</surname>
          </string-name>
          <article-title>The complexity of mining maximal frequent subgraphs</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>39</volume>
          (
          <issue>4</issue>
          ):
          <volume>32</volume>
          :1{
          <fpage>32</fpage>
          :
          <fpage>33</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. M. Kroll, R. Pichler, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>On the complexity of enumerating the answers to well-designed pattern trees</article-title>
          .
          <source>In Proc. ICDT</source>
          <year>2016</year>
          , volume
          <volume>48</volume>
          <source>of LIPIcs</source>
          , pages
          <volume>22</volume>
          :1{
          <fpage>22</fpage>
          :
          <fpage>18</fpage>
          .
          <string-name>
            <surname>Schloss</surname>
          </string-name>
          Dagstuhl - Leibniz-Zentrum fuer Informatik,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Letelier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>Static analysis and optimization of semantic web queries</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>25</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>J. Perez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Arenas</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and complexity of SPARQL</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>34</volume>
          (
          <issue>3</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. L.
          <article-title>Segou n. Enumerating with constant delay the answers to a query</article-title>
          .
          <source>In Proc. ICDT</source>
          <year>2013</year>
          , pages
          <fpage>10</fpage>
          {
          <fpage>20</fpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>