=Paper=
{{Paper
|id=Vol-2954/abstract-23
|storemode=property
|title=Enumerating Answers to Ontology-Mediated Queries: Partial Answers and Efficiency (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-2954/abstract-23.pdf
|volume=Vol-2954
|authors=Carsten Lutz,Marcin Przybyłko
|dblpUrl=https://dblp.org/rec/conf/dlog/LutzP21
}}
==Enumerating Answers to Ontology-Mediated Queries: Partial Answers and Efficiency (Extended Abstract)==
Enumerating Answers to Ontology-Mediated Queries: Partial Answers and Efficiency (Extended Abstract) Carsten Lutz and Marcin Przybylko Department of Computer Science, University of Bremen, Germany Ontology-mediated query evaluation has mostly been studied in the form of single-testing: given an ontology-mediated query (OMQ) Q(x̄) = (O, S, q), a database D over schema S, and a candidate answer ā ∈ adom(D)|x̄| , decide whether ā ∈ Q(D) [2, 4, 7, 8]. From a practical perspective, however, it is often not realistic to assume that a candidate answer is available. This leads us to study answer enumeration where only Q and D are given as an input, and the task is to produce all answers without repetitions, in an unspecified order. More precisely, an enumeration algorithm works in two phases. In the preprocessing phase, the algorithm uses Q and D to construct a data structure to be used later on, but no output. In the enumeration phase, it uses the precomputed structure to output all tuples from Q(D). Related to enumeration is all-testing which initially gets the same two inputs and has the same preprocessing phase, followed by a testing phase where the algorithm repeatedly receives candidate answers ā ∈ adom(D)|x̄| as additional inputs and returns ‘yes’ or ’no’ depending on whether ā ∈ Q(D). These modes of query evaluation have been extensively studied in database theory, see for example [3, 6, 9–13, 15]. A case of particular importance is enu- meration in CD◦Lin, where the preprocessing takes time linear in the size of the database D and the delay between two answers is independent of D. Note that there is no restriction on how the running time of the preprocessing or how the delay depends on the OMQ Q. This corresponds to data complexity in single- testing where Q is fixed and thus of constant size. An excellent recent survey of the work on answer enumeration in database theory is [5]. We consider enumeration and all-testing for two kinds of answers: the tradi- tional certain answers, where ā ∈ Q(D) if and only if ā is a tuple of constants from D such that ā ∈ q(I) for every model I of D and O, and a novel notion of partial answers that is able to take into account fresh constants introduced by existential quantifiers in O (‘nulls’). We next define the latter. Fix a wildcard symbol ‘∗’ that cannot occur as a constant in a database. A wildcard tuple for a database D takes the form (c1 , . . . , cn ) ∈ (adom(D)∪{∗})n where n ≥ 0 and adom(D) denotes the set of constants used in D. For wildcard tuples c̄ = (c1 , . . . , cn ) and c̄0 = (c01 , . . . , c0n ), we write c̄ c̄0 if c0i ∈ {ci , ∗} for 1 ≤ i ≤ n. Moreover, c̄ ≺ c̄0 if c̄ c̄0 and c̄ 6= c̄0 . Intuitively, c̄ ≺ c̄0 expresses that tuple Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 C. Lutz and M. Przybylko c̄ carries more information than tuple c̄0 . For example, (a, b) ≺ (a, ∗) ≺ (∗, ∗). A partial answer to OMQ Q(x̄) = (O, S, q) on S-database D is a wildcard tuple c̄ for D of length |x̄| such that for every model I of D and O, there is a c̄0 ∈ q(I) such that c̄0 c̄. Note that some positions in c̄0 may contain constants that are not in adom(D), and that the corresponding positions in c̄ must have the wildcard. We say that a partial answer c̄ to Q on S-database D is a least partial answer if there is no partial answer c̄0 to Q on D with c̄0 ≺ c̄. We are then interested in enumerating the set Q(D)∗ of all least partial answers to Q on D. Example 1. Consider the ontology O that contains the TGDs Researcher(x) → ∃y HasOffice(x, y) HasOffice(x, y) → Office(y) Office(x) → ∃y InBuilding(x, y) the schema S that consists of all relation symbols in O, and the CQ q(x1 , x2 , x3 ) = Researcher(x1 ) ∧ HasOffice(x1 , x2 ) ∧ InBuilding(x2 , x3 ) giving rise to OMQ Q(x1 , x2 , x3 ) = (O, S, q). Further consider the database D = { Researcher(mary), Researcher(mike), HasOffice(mary, room1) } Then Q(D) = ∅, but Q(D)∗ = {(mary, room1, ∗), (mike, ∗, ∗)}. This abstract reports about the forthcoming article [14] where we consider guarded TGDs G and the description logic ELI as the ontology language and conjunctive queries (CQs) as the query language. Recall that, up to syntactic normalization, ELI is a fragment of G. Our main result is as follows where complete answers mean the traditional certain answers. Theorem 1. Let Q = (O, S, q) be an OMQ from the OMQ language (G, CQ). If Q is acyclic and free-connex, then the following problems are in CD◦Lin: 1. enumeration of complete answers and of least partial answers to Q; 2. all-testing of complete answers to Q. Let us clarify the notions used in Theorem 1. A CQ q(x̄) is acyclic if it has a join tree. An acyclic CQ q(x̄) is free-connex if it remains acyclic after adding an atom R(x̄) with R a fresh relation symbol of arity |x̄|. The results for complete answers in Theorem 1 are obtained by reduction to the case without ontologies whereas the result for least partial answers requires the design of a novel enumeration algorithm. Theorem 1 is accompanied by lower bounds that identify significant chal- lenges in extending enumeration in CD◦Lin beyond OMQs that satisfy the struc- tural properties mentioned in the theorem. As in the case without ontologies, these lower bounds (i) are conditional on certain assumptions whose failure would imply a remarkable advance in algorithm theory and (ii) do not result in fully fledged dichotomies as they rely on additional assumptions regarding the query. Enumerating Answers to OMQs 3 The triangle conjecture states that it is not possible, given an undirected graph G with m edges as an adjacency list, to decide in time O(m) whether G contains a triangle [1]. Sparse Boolean matrix multiplication means to compute, given two Boolean matrices A and B as a list of their non-zero entries, the non- zero entries of the matrix product AB over the Boolean semiring, see e.g. [16]. There is no known algorithm that solves sparse Boolean matrix multiplication in time O(m), m the sum of the numbers of non-zero entries of A, B, and AB. If such an algorithm exists, then finding it requires dramatic advances in algorithm theory. See e.g. [5] for more information. Theorem 2. Let Q = (O, S, q) be an OMQ from the OMQ language (ELI, CQ) that is non-empty and self-join free. 1. If q is not acyclic, then enumeration of Q is not in CD◦Lin unless the triangle conjecture fails, both for complete answers and for least partial answers. 2. If q is connected and acyclic, but not free-connex, then enumeration of Q is not in CD◦Lin unless sparse Boolean matrix multiplication is possible in time O(m), both for complete answers and for least partial answers. We also show that least partial answers cannot be added to Point 2 of Theorem 1 as there is an OMQ Q ∈ (ELI, CQ) that is free-connex acyclic such that all- testing least partial answers to Q is not in CD◦Lin unless the triangle conjecture fails. Finally, enumeration and all-testing in CD◦Lin is closely related to single- testing in linear time (in data complexity), and we also clarify the limits of that. Theorem 3. 1. Single-testing is in linear time for weakly acyclic OMQs from (G, CQ). 2. Let Q be an OMQ from (ELI, CQ) that is non-empty and self-join free. If Q is not weakly acyclic, single-testing for Q is not in linear time unless the triangle conjecture fails. Here, a CQ q(x̄) is weakly acyclic if it becomes acyclic after consistently replacing all answer variables with fresh constants (and thus the connectedness condition of join trees only applies to quantified variables). Acknowledgements. This research was funded by DFG project QTEC. We thank the anonymous reviewers for useful comments. References 1. Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: Proceedings of FOCS 2014. pp. 434–443. IEEE Computer Society (2014). https://doi.org/10.1109/FOCS.2014.53 2. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995), http://webdam.inria.fr/Alice/ 4 C. Lutz and M. Przybylko 3. Bagan, G., Durand, A., Grandjean, E.: On acyclic conjunctive queries and constant delay enumeration. In: Duparc, J., Henzinger, T.A. (eds.) Proceedings of CSL 2007. Lecture Notes in Computer Science, vol. 4646, pp. 208–222. Springer (2007). https://doi.org/10.1007/978-3-540-74915-8 18 4. Barceló, P., Dalmau, V., Feier, C., Lutz, C., Pieris, A.: The limits of efficiency for open- and closed-world query evaluation under guarded TGDs. In: Suciu, D., Tao, Y., Wei, Z. (eds.) Proceedings of PODS 2020. pp. 259–270. ACM (2020). https://doi.org/10.1145/3375395.3387653 5. Berkholz, C., Gerhardt, F., Schweikardt, N.: Constant delay enumeration for conjunctive queries: a tutorial. ACM SIGLOG News 7(1), 4–33 (2020). https://doi.org/10.1145/3385634.3385636 6. Berkholz, C., Schweikardt, N.: Constant delay enumeration with fpt-preprocessing for conjunctive queries of bounded submodular width. In: Rossmanith, P., Heg- gernes, P., Katoen, J. (eds.) Proceedings of MFCS 2019. LIPIcs, vol. 138, pp. 58:1–58:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019). https://doi.org/10.4230/LIPIcs.MFCS.2019.58 7. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Trans. Database Syst. 39(4), 33:1–33:44 (2014). https://doi.org/10.1145/2661643 8. Bienvenu, M., Ortiz, M.: Ontology-mediated query answering with data-tractable description logics. In: Faber, W., Paschke, A. (eds.) Proceedings of Reasoning Web. Lecture Notes in Computer Science, vol. 9203, pp. 218–307. Springer (2015). https://doi.org/10.1007/978-3-319-21768-0 9 9. Carmeli, N., Kröll, M.: Enumeration complexity of conjunctive queries with functional dependencies. Theory Comput. Syst. 64(5), 828–860 (2020). https://doi.org/10.1007/s00224-019-09937-9 10. Carmeli, N., Kröll, M.: On the enumeration complexity of unions of conjunctive queries. ACM Trans. Database Syst. 46(2), 5:1–5:41 (2021). https://doi.org/10.1145/3450263 11. Carmeli, N., Zeevi, S., Berkholz, C., Kimelfeld, B., Schweikardt, N.: Answering (unions of) conjunctive queries using random access and random-order enumera- tion. In: Suciu, D., Tao, Y., Wei, Z. (eds.) Proceedings of PODS 2020. pp. 393–409. ACM (2020). https://doi.org/10.1145/3375395.3387662 12. Deep, S., Hu, X., Koutris, P.: Enumeration algorithms for conjunctive queries with projection. In: Yi, K., Wei, Z. (eds.) Proceedings of ICDT 2021. LIPIcs, vol. 186, pp. 14:1–14:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021). https://doi.org/10.4230/LIPIcs.ICDT.2021.14 13. Deep, S., Koutris, P.: Ranked enumeration of conjunctive query results. In: Yi, K., Wei, Z. (eds.) Proceedings of ICDT 2021. LIPIcs, vol. 186, pp. 5:1–5:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021). https://doi.org/10.4230/LIPIcs.ICDT.2021.5 14. Lutz, C., Przybylko, M.: Enumerating answers to ontology-meditated queries. To appear on arXiv 15. Segoufin, L.: Constant delay enumeration for conjunctive queries. SIGMOD Rec. 44(1), 10–17 (2015). https://doi.org/10.1145/2783888.2783894 16. Yuster, R., Zwick, U.: Fast sparse matrix multiplication. ACM Trans. Algorithms 1(1), 2–13 (2005). https://doi.org/10.1145/1077464.1077466