-

Answering Expressive Conjunctive Queries over RDFS Knowledge Bases (Extended Abstract)

Gianluca Cima

cima@diag.uniroma1.it 0

Marco Console

console@diag.uniroma1.it 0

Roberto Maria Delfino

delfino@diag.uniroma1.it 0

Maurizio Lenzerini

lenzerini@diag.uniroma1.it 0

Antonella Poggi

poggi@diag.uniroma1.it 0 0 Sapienza University of Rome , via Ariosto 25, Rome , Italy

2025

Answering conjunctive queries (CQs) equipped with basic forms of negation is a challenging task, which happens to be undecidable even for lightweight Description Logic (DL) ontologies. Interestingly, the DL counterpart of RDFS seems to be partially unafected by those negative results, even when equipped with disjointness axioms. This paper summarises our recent work on this subject, where we present a refined complexity analysis of answering CQs with inequality atoms and safe negation posed over such ontologies. We introduce a unified Π2 algorithm for the general case; we prove that two inequality atoms already lead to Π2-hardness, and we show similar results for the case of safe negation: answering CQs with one negated atom is in NP, but two negated atoms are enough to reach Π2-hardness. These results close key gaps in the literature and refine the complexity analysis of the query containment problem.

eol>Query Answering DL Ontologies Negative conditions

in [ 10 ] for UCQs with inequality atoms only, providing a Π 2 upper bound in combined complexity and a coNP upper bound in data complexity, where data complexity [ 11 ] concerns the case where only the ABox is regarded as the input of the problem. We also provide a matching lower bound for the problem of answering conjunctive queries with at most two inequality atoms, for which a matching lower bound in data complexity was already known. We also show that our result afects the query containment problem, previously proven Π 2-complete [ 12, 13 ], and studied under various syntactic and semantic restrictions in [ 14 ]. To our knowledge, only [ 10 ] examined the impact of bounding the number of inequalities in the queries. That work showed NP-completeness when the containing query has at most one inequality atom, matching the complexity of CQs [15]. Here, we complete the analysis by proving that the problem remains Π 2-hard even when the containing and contained queries have one and two inequality atoms, respectively. We also provide lower bounds for CQs with safe negations, showing that they presents a behavior similar to that of CQs with inequality atoms, i.e., allowing for two safe negations is enough to obtain Π 2-hardness. Finally, we present results for the case where the UNA holds. We denote by CQ¬,̸= the language of conjunctive queries containing both safe negations and inequality atoms, where a safe negation is a negated atom whose variables occur in at least one positive atom, and an inequality atom is an atom of the form ̸= ′, with and ′ being either distinguished variables, existential variables, or constants. We denote the language of conjunctive queries with at most negated atoms and no inequality atoms by CQ¬, , while the language of conjunctive queries with at most inequality atoms and no negated atoms is denoted by CQ̸= . We denote by UCQ the language of unions of CQ, where is a combination of the symbols denoting the presence of either negations or inequality atoms.

Definition 1. We denote by ans(ℒ, ) the problem of deciding, given = ⟨ , ⟩ in the language ℒ, a query ∈ of arity , and an -tuple ¯ of constants occurring in , whether ¯ is an answer to in every model of .

A local interpretation ℐ = ⟨∆ ℐ , · ℐ ⟩ w.r.t. and has its domain ∆ ℐ restricted to constants in the ABox and in the query, and interpretations of concepts and roles not appearing in and in are empty. It can be shown that ¯ ∈/ (, ), where (, ) is the set of certain answers to w.r.t. , i.e., the set of answers satisfying in every model of , if and only if there exists a local interpretation ℐ w.r.t. and such that ℐ is a model of and (1ℐ , ..., ℐ) ∈/ (ℐ). This suggests a nondeterministic algorithm for deciding ¯ ∈/ (, ): guess a local interpretation ℐ and check if it is a model of and if ¯ ∈/ (, ). The size of ℐ is linear in the size of the input. Checking whether ℐ is a model of is feasible in AC0 in the size of ℐ, while checking whether ¯ ∈/ (ℐ) is feasible in NP. Thus, the overall verification can be performed in NP.

This approach leads to the following result.

Theorem 1. The problem ans(DL-LiteR¬DFS, UCQ¬,̸=) is in Π 2 in combined complexity and in coNP in data complexity.

This provides an upper bound. Matching lower bounds regarding data complexity were known for the case of two negated atoms [16] and two inequalities [ 10 ]. In the following, we show that matching lower bounds also hold for combined complexity for CQ¬,2 and CQ̸=2 .

Theorem 2. The problem ans(DL-LiteRDFS, UCQ¬,2 ) is Π 2-hard in combined complexity and coNP-hard in data complexity.

This hardness result is obtained via a LogSpace reduction from ∀∃-CNF [17]. The reduction maps a ∀∃3CNF formula to a DL-LiteR¬DFS ontology and a Boolean UCQ¬ such that |= if and only if is true. The initial construction uses predicates of arity greater than two. However, the encoding can be transformed into an equivalent one that uses only unary and binary predicates, thus complying with the syntactic restrictions of DL-LiteR¬DFS. A similar result holds for the query language CQ¬,2 over DL-LiteR¬DFS.

Theorem 3. The problem ans(DL-LiteR¬DFS, CQ¬,2 ) is Π 2-hard in combined complexity and coNP-hard in data complexity.

The coNP-hardness comes from [16], while the Π 2-hardness is shown by an adaptation of the reduction from ∀∃3CNF used for Theorem 2, involving a fixed TBox with disjointness assertions and a slightly modified query and ABox construction. In contrast to the case of two negated atoms, the complexity drops significantly when only one safe negation atom is allowed per disjunct. In particular, ans(DL-LiteR¬DFS, UCQ¬,1 ) can be polynomially reduced to checking entailment of a ground atom for a Datalog program [18] that can be obtained by means of a transformation from an ontology , a UCQ¬,1 and a tuple ¯, and is such that ¯ ∈ (, ) if and only if entails a specific propositional atom. Combining this property with the known Datalog complexity for predicates with bounded arity [19, 20] results yield the following result.

Theorem 4. The problem ans(DL-LiteR¬DFS, UCQ¬,1 ) is NP-complete in combined complexity and Pcomplete in data complexity. The hardnesses already hold for ans(DL-LiteR¬DFS, CQ¬,1 ).

Note that NP-hardness follows from the NP-hardness of evaluating CQs over relational databases [21], while P-hardness in data complexity comes from [16].

Regarding the case of queries containing inequalities, previous work thoroughly analysed the case of UCQ̸=s over DL-LiteR¬DFS ontologies, but only conjectured the Π 2-hardness of ans(DL-LiteR¬DFS, UCQ̸=2 ). The next result confirms the conjecture.

Theorem 5. The problem ans(DL-LiteR¬DFS, CQ̸=2 ) is Π 2-hard in combined complexity.

This is proved via a LogSpace reduction from ∀∃3CNF, similar to the safe negation cases. The reduction constructs a DL-LiteR¬DFS ontology and a Boolean CQ̸=2 from a ∀∃3CNF formula such that |= if and only if the formula is true. The ontology simulates propositional assignments, and the query structure checks for satisfiability.

Interestingly, our results on hardness for the case of CQ̸=2 s have implications for the query containment problem cont(, ′), which asks if () ⊆ ′() for all databases , with ∈ , and ′ ∈ ′ for some query languages and ′. Through Theorem 5 and a reduction from ontology-mediated query answering to query containment for ontologies without role disjointness assertions, we provide a tight complexity characterization based on the number of inequality atoms present in the queries. Proposition 1. Let = ⟨ , ⟩ be a DL-LiteR¬DFS ontology without role disjointness assertions. It is possible to construct in polynomial time a Boolean CQ̸= such that, for any Boolean CQ̸= : |= if and only if ⊑ .

By combining the reduction used for Theorem 5 and Proposition 1, we obtained the following: Theorem 6. The problem cont(CQ̸=1 , CQ̸=2 ) is Π 2-complete.

This is a significant finding, as cont(CQ̸= , CQ̸= ) is NP-complete if = 0 or ≤ 1 [ 15, 10 ].

Finally, we analysed the problem of query answering under the UNA. In this case, the decision problem of interest is denoted by ansU(ℒ, ). The same Π 2 combined complexity and coNP data complexity upper bounds as Theorem 1 hold for ansU(DL-LiteR¬DFS, UCQ¬,̸=) by using U-local interpretations, i.e., local interpretations under the UNA.

Theorem 7. ansU(DL-LiteR¬DFS, UCQ¬,̸=) is in Π 2 in combined complexity and in coNP in data complexity.

For UCQ¬ queries, DL-LiteR¬DFS is insensitive to the UNA, since it is a sub-logic of DL-Liteℛ, which is insensitive to the UNA for CQ answering [22]. This allows us to derive that ans(, ) = ansU(, ), where is a DL-LiteR¬DFS, and ∈ UCQ¬ . However, answering ̸=s over DL-LiteR¬DFS is significantly easier under the UNA, becoming tractable in data complexity.

Theorem 8. ansU(DL-LiteR¬DFS, UCQ̸=) is NP-complete in combined complexity and in AC0 in data complexity.

We provided a thorough analysis of the complexity of answering queries with safe negation and inequalities over DL-LiteR¬DFS ontologies, confirming its potential as a theoretical foundation for AI systems based on KGs. We completed the picture of the complexity of answering CQs with fixed numbers of inequality atoms by showing that, for CQ̸=2 , it is Π 2-hard, verifying a long-standing conjecture. We showed that CQs with safe negation exhibit a similar complexity jump from one to two negated atoms. These results provide tight complexity bounds and contribute to the understanding of query containment, showing that cont(CQ̸=1 , CQ̸=2 ) is Π 2-complete. The decidability of ans(DL-Lite, CQ̸=) remains an open problem. Our results suggest that answering UCQ¬,̸=s over DL-LiteR¬DFS can potentially be implemented using systems designed for Π 2-complete problems, such as ASP solvers [23].

Acknowledgments

This work has been supported by MUR under the PNRR project FAIR (PE0000013). This work has been carried out while Roberto Maria Delfino was enrolled in the Italian National Doctorate on Artificial Intelligence run by Sapienza University of Rome.

Declaration on Generative AI

During the preparation of this work, the author(s) used OpenAI ChatGPT-4 in order to: Grammar and spelling check; Paraphrase and reword. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content. [15] A. K. Chandra, P. M. Merlin, Optimal implementation of conjunctive queries in relational data bases, 1977, pp. 77–90. [16] V. Gutiérrez-Basulto, Y. A. Ibáñez-García, R. Kontchakov, E. V. Kostylev, Queries with negation and inequalities over lightweight ontologies 35 (2015) 184–202. [17] L. J. Stockmeyer, The polynomial-time hierarchy, Theor. Comput. Sci. 3 (1976) 1–22. [18] S. Ceri, G. Gottlob, L. Tanca, What you always wanted to know about datalog (and never dared to ask) 1 (1989) 146–166. [19] E. Dantsin, T. Eiter, G. Gottlob, A. Voronkov, Complexity and expressive power of logic programming 33 (2001) 374–425. [20] G. Gottlob, T. Schwentick, Rewriting ontological queries into small nonrecursive Datalog programs, 2011. [21] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, 1995. [22] A. Artale, D. Calvanese, R. Kontchakov, M. Zakharyaschev, The DL-Lite Family and Relations, Technical Report BBKCS-09-03, School of Computer Science and Information Systems, Birbeck College, London, 2009. Available at http://www.dcs.bbk.ac.uk/research/techreps/2009/bbkcs-09-03. pdf. [23] V. Lifschitz, Answer Set Programming, Springer, 2019.

[1]

Khan , Knowledge graphs querying 52 ( 2023 ) 18 - 29 . URL: https://doi.org/10.1145/3615952. 3615956. doi: 10 .1145/3615952.3615956.

[2]

Yang ,

Chen ,

Li ,

Ding ,

Wu , Give us the facts: Enhancing large language models with knowledge graphs for fact-aware language modeling , IEEE Transactions on Knowledge and Data Engineering 36 ( 2024 ) 3091 - 3110 . doi: 10 .1109/TKDE. 2024 . 3360454 .

[3]

Pan ,

Luo ,

Wang ,

Chen ,

Wang ,

Wu , Unifying large language models and knowledge graphs: A roadmap , IEEE Transactions on Knowledge and Data Engineering 36 ( 2024 ) 3580 - 3599 . doi: 10 .1109/TKDE. 2024 . 3352100 .

[4]

Baader , I. Horrocks,

Lutz ,

Sattler , An Introduction to Description Logic, 2017 .

[5]

Calvanese , G. De Giacomo,

Lembo ,

Lenzerini ,

Rosati , Tractable reasoning and eficient query answering in description logics: The DL-Lite family 39 ( 2007 ) 385 - 429 .

[6]

Bienvenu ,

Ortiz , Ontology-mediated query answering with data-tractable description logics , volume 9203 , 2015 , pp. 218 - 307 .

[7]

Cuenca Grau , A possible simplification of the semantic web architecture , 2004 , pp. 704 - 713 .

[8]

Rosati , The limits of querying ontologies , volume 4353 , 2007 , pp. 164 - 178 .

[9]

Cima ,

Poggi ,

Lenzerini , The notion of abstraction in ontology-based data management 323 ( 2023 ) 103976 .

[10]

Cima ,

Lenzerini ,

Poggi , Answering conjunctive queries with inequalities in DL-Lite , 2020 , pp. 2782 - 2789 .

[11]

M. Y.

Vardi , The complexity of relational query languages (extended abstract ), 1982 , pp. 137 - 146 .

[12]

A. C.

Klug , On conjunctive queries containing inequalities , J. ACM 35 ( 1988 ) 146 - 160 .

[13] R. van der Meyden , The complexity of querying indefinite data about linearly ordered domains , J. Comput. Syst. Sci . 54 ( 1997 ) 113 - 135 .

[14]

P. G.

Kolaitis ,

D. L.

Martin ,

M. N.

Thakur , On the complexity of the containment problem for conjunctive queries with built-in predicates , 1998 , pp. 197 - 204 .