-

Informativeness of Query Answers for Knowledge Bases (Extended Abstract)

Luca Andolfi

Gianluca Cima

Marco Console

Maurizio Lenzerini

0 0 Department of Computer, Control and Management Engineering , 25 Via Ariosto, Rome, 00185

2024

37 18 21

This extended abstract summarises our recent work on the informativeness of query answers over Description Logics Knowledge Bases (KBs). We introduce a framework to characterise the information that query answers for KBs can represent and under its lens we study the informative power of current query answering definitions across the literature. Moreover, we also present novel notions of certain and possible answers, showing that they are able to represent a meaningful form of information. Lastly, we study the data complexity properties for relevant problems associated to the answers we introduced.

eol>Knowledge Bases Query Answers Knowledge Representation Description Logics

certain and possible answers by providing novel notions of informative query answers. Once again, we can achieve this in virtue of our framework, as it sheds light on properties of answer objects needed to increase their informative content. Indeed, equivalent results would be hard to obtain without a tool of this kind.

Specifically, we describe which is the logical behaviour of the information to be represented ( what answers need to represent), and the core idea behind our framework is the one of characterising accordingly how efectively a given answer object mimics this behaviour (how informative answers are).

To formally model what answers need to represent, we leverage the following construction. First, we introduce a new predicate (not occurring in the KB alphabet) ans whose arity matches the one of the given query ; when evaluating over a first order interpretation ℐ, the answer to over ℐ are constituted by the assertion for ans over the constant answer tuples from ℐ satisfying (e.g., ans2(Bea, Carl ) is one of these assertions when evaluating the query in Example 1 over any model of the corresponding KB). Second, this approach is generalised to a KB: the set () = {(ℐ) | ℐ ∈ Mod()} expresses all the information that can be derived from a query and a KB , where Mod() are models of and (ℐ) is the answer to over model ℐ. The construction just introduced allows us to define the behaviour of the set () (i.e., what answers need to represent) using properties definable with logical formulae over the predicate ans, as demonstrated in the example below.

Example 2. The set () from Example 1 satisfies the formula 1 = ∃, .ans2(Ava, ) ∧ ans2(x,y) for every (ℐ) ∈ (), because ans2(Avaℐ , 1) ∈ (ℐ) and ans2(1, 2) ∈ (ℐ) for each ℐ ∈ Mod() and for some constants 1, 2. Likewise, () does not satisfy 2 = ∃, .ans2(x,y) ∧ ans2(y,x) for any (ℐ) ∈ (), because there is no model ℐ of and no constants (, ) from ℐ s.t. ans2(, ) ∈ (ℐ) and ans2(, ) ∈ (ℐ).

Defining how informative answers are w.r.t. the set () introduced above constitutes the heart of our approach. The core idea is the one of associating to the objects returned as query answers a formal semantics and leverage it to determine the information (either present in every or at least one object) in () that the answers are able to convey. More precisely, in our framework every answer object that is returned in response to a query over a KB belongs to a family of objects (call it ); each of these families is equipped with a relation specifying the (first order definable) properties their objects satisfy (call it |=). Below we show the case in which certain answers are used in the framework as definition of answers.

Example 3. In relation to Example 1, assume the object returned as the answer to over is the set of the certain answers Θ = {ans2(Bea, Carl )}, where Θ belongs to a family of objects constituted by all the ground atoms over the alphabet {ans}. In this case Θ |= , if is true over Θ seen as an interpretation. For instance, let = ans2(Bea, Carl ). Then it is readily seen that Θ |= .

Now, we can compare the set () with answer objects and define how similar they are according to the languages of properties (expressible as formulae over {ans}) for which they satisfy the same formulae. In order to be as general as needed to accommodate for many query answers definitions, we use the notion of Query Answering System (QAS). A QAS is an abstraction of the query answering task that takes queries and KBs, from two languages of queries and KBs Q and K resp., and associates to them an answer from a family of objects having semantics |=. This mapping happens in virtue of a function denoted as : Q × K → . For each query answer definition its informative power comes down to specify the (largest) language of formulae for which it exhibits an identical behaviour w.r.t. the set ().

Definition 1. Let ℱ be a language of FO formulae over {ans} and S = ⟨Q, K, ⟨, |=⟩, eval⟩ be a QAS. We say that S preserves the certain knowledge of ℱ if, for each ∈ Q, ∈ K, and ∈ ℱ it holds that eval(, ) |= if (ℐ) |= for every ℐ ∈ Mod(); likewise, S preserves the possible knowledge of ℱ if, for each ∈ Q, ∈ K and ∈ ℱ , it holds that eval(, ) |= if there exists ℐ ∈ Mod() so that (ℐ) |= .

QASs characterise the information content of query answers: the larger the language ℱ that a QAS preserves, the more informative such a QAS is. We use the technical tool provided by Definition 1 to quantify and compare the informativeness of query answering definitions in the literature. In this study we consider queries in the language of conjunctive queries (CQ) and union thereof (UCQ), as well as KBs belonging to languages with widely accepted and well-known properties, in which many common description logic languages fall, such as the profiles of OWL 2 and tuple generating dependencies. For simplicity of exposition, we refer our results to the answer objects returned by the various query answering definitions, rather than their corresponding QAS; however, the findings we present are easily extended to QASs.

For certain answers, recall that the objects returned as answers have the shape of the one introduced in Example 3. Answer objects of this flavour preserve the certain knowledge of the language of existential positive formulae over {ans} with ground atoms only and they do not preserve the certain knowledge of those languages where existential variables occur.

Next, we consider an extended form of certain answers, called minimal partial answers, recently presented in [20, 21]. We prove that the answers constructed according to this definition preserve the certain knowledge expressible using the existential positive formulae where variables do not repeat across diferent atoms; moreover, they do not preserve those fragments of existential positive formulae in which these repetitions occur.

Lastly, we consider possible answers: in this case the answer object is constituted by all the assertions over {ans} for the constant answer tuples satisfying the query in at least one model of the KB. This definition preserves the possible knowledge of the language of ground atoms, but it does not preserve their conjunctions.

Finally, we focus on the definition of novel notions of (certain and possible) answers as sets of tuples, which are able to represent meaningful form of information. In particular, our idea consists in defining answers containing tuples sharing the same (unknown) existential individuals, represented through repeated variables. To this end, we say that a set of atoms is Variable-Connected if for every atom ∈ there is one atom ∈ s.t. and share at least one variable. Additionally, whenever contains at most distinct variables it is called -Connected Answer (-CA). Now, given a KB and a query , among all the -CAs we are interested in those that are certain (resp. possible) for over ; we say that an -CA is certain (resp. possible) for over if for every (resp. some) ℐ ∈ Mod() there is a constant-preserving homomorphism from to (ℐ).

Example 4. Consider again Example 1. For = 0 the certain 0-CA is {ans2(Bea,Carl)}; if = 1, some certain 1-CAs are {ans2(Carl, 1)} and {ans2(Ava, 1)}. For = 2, some certain 2-CAs are {ans2(Carl, 1), ans2(1, 2)} and {ans2(Ava, 1), ans2(1, 2)}.

Our novel answer, called certain (resp. possible) -answer, collects together all the -CA that are certain (resp. possible and constant-free) for the query over KB .

Definition 2. Let = ⋃︀

=1 , where 1, . . . , are -CAs that share no variables (resp. share no variables and mention no constants). The set is a certain (resp. possible) -answer for over if: is certain (resp. possible) for over , for each ∈ {1, ..., } and for every -CA that is certain (resp. possible) for over there is ∈ {1, ..., } and a bijection ℎ s.t. ℎ() = .

Using the tools from our framework we prove that certain -answers can preserve the certain knowledge that can be expressed using existential positive formulae in which at most distinct variables occur; on the other hand, possible -answers preserve the possible knowledge expressed by constant-free positive formulae with at most diferent variables.

We then studied the data complexity versions [22, 23, 24] of relevant decision problems for -CAs, as the ones of checking whether an -CA is certain and possible for ∈ UCQ and a KB . Notably, these problems can be reduced to Boolean UCQ entailment and consistency check over , resp. Our results imply that, given any UCQ and KB for which Boolean UCQ entailment (resp., consistency check) is decidable, we can construct a set of certain (resp. possible) -answers to over by means of calls to an answer recognition oracle. We investigated the non-redundancy check as well. Consider the case of certain -answers: intuitively, given variable , constants , , and the certain 1-answers = {⟨, ⟩, ⟨, ⟩} and = {⟨, ⟩, ⟨, ⟩}, then is redundant, as conveys more information and || ≤ | |. Resorting to the query language EQL-Lite(UCQ), the non-redundancy check has the same data complexity as EQL-entailment [25, 26]. Similar results are obtained for the non-redundancy check of possible -answers.

Regarding the open problems for our work, a natural extension of the results we presented is the one of providing a definition of answers that is powerful enough to preserve the certain or possible knowledge of arbitrary existential positive formulae. This problem is far from trivial, because it forces us to rethink the language of the query answers: indeed, one notable negative result we showed in [19], regarding the expressive power of answers as sets of tuples, suggests that the solution requires a new representation of answers based on logical theories. Second, another research direction involves the design of eficient enumeration algorithms working with our novel notion of answers and their implementation in practice. In addition, the framework could also be used to improve the informativeness of answers in many diferent scenarios, such as those involving queries with aggregation or Consistent Query Answering. Acknowledgments This work has been supported by MUR under the PNRR project FAIR (PE0000013) and by the EU under the H2020-EU.2.1.1 project TAILOR (grant id. 952215). [11] M. Calautti, M. Console, A. Pieris, Benchmarking approximate consistent query answering, in: L. Libkin, R. Pichler, P. Guagliardo (Eds.), PODS’21: Proceedings of the 40th ACM SIGMODSIGACT-SIGAI Symposium on Principles of Database Systems, 2021. [12] M. Console, P. Guagliardo, L. Libkin, E. Toussaint, Coping with incomplete data: Recent advances, in: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, ACM, 2020, pp. 33–47. [13] L. Libkin, Certain answers as objects and knowledge, Artificial Intelligence 232 (2016) 1–19. [14] C. Civili, L. Libkin, Approximating certainty in querying data and metadata, in: Proceedings of the Sixteenth International Conference on the Principles of Knowledge Representation and Reasoning (KR 2018), 2018, pp. 582–591. [15] A. Borgida, D. Toman, G. E. Weddell, On referring expressions in query answering over first order knowledge bases, in: Proceedings of the Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning (KR 2016), 2016, pp. 319–328. [16] A. Borgida, D. Toman, G. E. Weddell, Concerning referring expressions in query answers, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017), 2017, pp. 4791–4795. [17] M. Arenas, J. Pérez, J. L. Reutter, Data exchange beyond complete data, Journal of the ACM 60 (2013) 28:1–28:59. URL: https://doi.org/10.1145/2508028.2505985. doi:10.1145/2508028.2505985. [18] G. Grahne, A. Onet, Representation systems for data exchange, in: A. Deutsch (Ed.), 15th International Conference on Database Theory, ICDT ’12, Berlin, Germany, March 26-29, 2012, ACM, 2012, pp. 208–221. URL: https://doi.org/10.1145/2274576.2274599. doi:10.1145/2274576. 2274599. [19] Andolfi, Cima, Console, Lenzerini, What does a query answer tell you? informativeness of query answers for knowledge bases, Proceedings of the AAAI Conference on Artificial Intelligence (2024). [20] C. Lutz, M. Przybylko, Eficiently enumerating answers to ontology-mediated queries, in: Proceedings of the Forty-First ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2022), 2022, pp. 277–289. [21] C. Lutz, M. Przybylko, Eficient answer enumeration in description logics with functional roles, in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023), 2023, pp. 6483–6490. [22] M. Y. Vardi, The complexity of relational query languages (extended abstract), in: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing (STOC 1982), 1982, pp. 137–146. [23] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Data complexity of query answering in description logics, Artificial Intelligence 195 (2013) 335–360. doi: 10.1016/j.artint. 2012.10.003. [24] J.-F. Baget, M.-L. Mugnier, S. Rudolph, M. Thomazo, Walking the complexity lines for generalized guarded existential rules, in: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), 2011, pp. 712–717. [25] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, EQL-Lite: Efective first-order query processing in description logics, in: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), 2007, pp. 274–279. [26] G. Cima, M. Console, M. Lenzerini, A. Poggi, Epistemic disjunctive datalog for querying knowledge bases, in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23), 2023, pp. 6280–6288.

[1]

Calvanese , G. De Giacomo,

Lembo ,

Lenzerini ,

Rosati , Tractable reasoning and eficient query answering in description logics: The DL-Lite family , Journal of Automated Reasoning 39 ( 2007 ) 385 - 429 .

[2]

Bienvenu ,

Ortiz , Ontology-mediated query answering with data-tractable description logics , in: Reasoning Web. Semantic Technologies for Intelligent Data Access - Eleventh International Summer School Tutorial Lectures (RW 2015 ), volume 9203 of Lecture Notes in Computer Science, 2015 , pp. 218 - 307 .

[3]

Calì , G. Gottlob, M. Kifer, Taming the infinite chase: Query answering under expressive relational constraints , Journal of Artificial Intelligence Research 48 ( 2013 ) 115 - 174 .

[4]

Catarci ,

Scannapieco ,

Console ,

Demetrescu , My (fair) big data , in: J. Nie , Z.

Obradovic , T.

Suzumura , R.

Ghosh , R.

Nambiar , C.

Wang , H.

Zang , R.

Baeza-Yates , X.

Hu , J.

Kepner , A.

Cuzzocrea , J.

Tang , M. Toyoda (Eds.), 2017 IEEE International Conference on Big Data (IEEE BigData 2017 ), Boston, MA, USA, December 11 - 14 , 2017 , IEEE Computer Society, 2017 , pp. 2974 - 2979 . URL: https://doi.org/10.1109/BigData. 2017 . 8258267 . doi: 10 .1109/BIGDATA. 2017 . 8258267 .

[5]

Lukasiewicz , E. Malizia,

M. V.

Martinez ,

Molinaro ,

Pieris , G. I. Simari , Inconsistency-tolerant query answering for existential rules , Artificial Intelligence 307 ( 2022 ) 103685 .

[6]

Abiteboul ,

Hull ,

Vianu , Foundations of Databases, Addison-Wesley, 1995 .

[7]

Imielinski ,

Lipski , Jr, Incomplete information in relational databases, Journal of the ACM 31 ( 1984 ) 761 - 791 .

[8]

Lipski , On semantic issues connected with incomplete information databases , ACM Trans. Database Syst . 4 ( 1979 ) 262 - 296 . URL: https://doi.org/10.1145/320083.320088. doi: 10 .1145/ 320083.320088.

[9]

Console ,

Guagliardo , L. Libkin, Approximations and refinements of certain answers via many-valued logics , in: Principles of Knowledge Representation and Reasoning: Proceedings of the Fifteenth International Conference, KR , 2016 .

[10]

Console ,

Guagliardo , L. Libkin, Propositional and predicate logics of incomplete information , in: Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR , 2018 .