Informativeness of Query Answers for Knowledge Bases
                         (Extended Abstract)
                         Luca Andolfi, Gianluca Cima, Marco Console and Maurizio Lenzerini
                         Department of Computer, Control and Management Engineering, 25 Via Ariosto, Rome, 00185


                                     Abstract
                                     This extended abstract summarises our recent work on the informativeness of query answers over Description
                                     Logics Knowledge Bases (KBs). We introduce a framework to characterise the information that query answers for
                                     KBs can represent and under its lens we study the informative power of current query answering definitions
                                     across the literature. Moreover, we also present novel notions of certain and possible answers, showing that
                                     they are able to represent a meaningful form of information. Lastly, we study the data complexity properties for
                                     relevant problems associated to the answers we introduced.

                                     Keywords
                                     Knowledge Bases, Query Answers, Knowledge Representation, Description Logics.


                            Query answering over a Knowledge Base (KB) [1, 2, 3, 4, 5, 6] is the problem of extracting the
                         information specified by a query from the models of some KB and returning an object (answer) that
                         represents it to the user. We are interested in characterising the properties that are needed to make
                         this object meaningful: indeed, in order to be relevant, query answers should represent as much as
                         possible the information required by the queries issued by users and, at the same time, they should be
                         simple enough. The first aspect is related to their informativeness, whereas the second is crucial for
                         their comprehension. We focus on answers expressed as sets of tuples, as this language is universally
                         accepted and understood, and we characterise their informative content introducing novel formal tools
                         tailored to this purpose.
                            So far, in order to construct answers for queries over KBs, most of the approaches in the literature
                         chose either the certain or the possible semantics [7, 8] of the answer; according to the former, the
                         answer is constituted by the answer tuples of constants satisfying the query in every model of the KB,
                         while to the latter in at least one of them. Nevertheless, it is well known that in many practical scenarios
                         [9, 10, 11, 12, 13, 14, 15, 16, 17, 18] both certain and possible answers are a lossy approximation of all the
                         information entailed by a query and a KB. Indeed, as the following example shows, they suffer severe
                         shortcomings w.r.t. their informative content.

                         Example 1. Assuming the predicates Employee/1 (𝐸) and hasSupervisor/2 (hS), let 𝒯 be the TBox
                         {𝐸 ⊑ ∃ℎ𝑆, ∃ℎ𝑆 − ⊑ 𝐸, ℎ𝑆 ⊑ ¬ℎ𝑆 − } (i.e. each employee has as a supervisor another employee
                         and cannot supervise a supervisor) and the ABox 𝒟 = {𝐸(Ava), 𝐸(Bea), ℎ𝑆(Bea, Carl )}. For
                         𝑞 = {(𝑥, 𝑦) | hS(𝑥, 𝑦)} over 𝒦 = ⟨𝒯 , 𝒟⟩, the certain answers are {(Bea, Carl )}. For every model ℐ of
                         𝒦 the information that there exist constants 𝑎 and 𝑏 s.t. (Ava ℐ , 𝑎) and (𝑎, 𝑏) satisfy 𝑞 in ℐ is lost in the
                         certain answers. Similar considerations hold for possible answers as well.

                            This extended abstract summarises our recent results from [19]: we build a framework to formally
                         determine the informativeness of query answers and we show how to use it to quantify objectively
                         the informative content of different query answers definitions across the literature. This is a core
                         contribution of our work, since there existed no rigorous approach conveying a similar measure of
                         informativeness. Moreover, we address some of the limitations highlighted in Example 1 w.r.t. the

                             DL 2024: 37th International Workshop on Description Logics, June 18–21, 2024, Bergen, Norway
                          $ andolfi@diag.uniroma1.it (L. Andolfi); cima@diag.uniroma1.it (G. Cima); console@diag.uniroma1.it (M. Console);
                          lenzerini@diag.uniroma1.it (M. Lenzerini)
                           https://www.diag.uniroma1.it/users/luca_andolfi (L. Andolfi)
                           0009-0004-5528-9224 (L. Andolfi); 0000-0003-1783-5605 (G. Cima); 0009-0004-5526-019X (M. Console);
                          0000-0003-2875-6187 (M. Lenzerini)
                                     © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
certain and possible answers by providing novel notions of informative query answers. Once again, we
can achieve this in virtue of our framework, as it sheds light on properties of answer objects needed to
increase their informative content. Indeed, equivalent results would be hard to obtain without a tool of
this kind.
   Specifically, we describe which is the logical behaviour of the information to be represented (what an-
swers need to represent), and the core idea behind our framework is the one of characterising accordingly
how effectively a given answer object mimics this behaviour (how informative answers are).
   To formally model what answers need to represent, we leverage the following construction. First,
we introduce a new predicate (not occurring in the KB alphabet) ans𝑘 whose arity 𝑘 matches the
one of the given query 𝑞; when evaluating 𝑞 over a first order interpretation ℐ, the answer to 𝑞 over
ℐ are constituted by the assertion for ans𝑘 over the constant answer tuples from ℐ satisfying 𝑞 (e.g.,
ans2 (Bea, Carl ) is one of these assertions when evaluating the query in Example 1 over any model of the
corresponding KB). Second, this approach is generalised to a KB: the set 𝑞(𝒦) = {𝑞(ℐ) | ℐ ∈ Mod(𝒦)}
expresses all the information that can be derived from a query 𝑞 and a KB 𝒦, where Mod(𝒦) are models
of 𝒦 and 𝑞(ℐ) is the answer to 𝑞 over model ℐ. The construction just introduced allows us to define the
behaviour of the set 𝑞(𝒦) (i.e., what answers need to represent) using properties definable with logical
formulae over the predicate ans𝑘 , as demonstrated in the example below.

Example 2. The set 𝑞(𝒦) from Example 1 satisfies the formula 𝜙1 = ∃𝑥, 𝑦.ans2 (Ava, 𝑥) ∧ ans2 (x,y)
for every 𝑞(ℐ) ∈ 𝑞(𝒦), because ans2 (Avaℐ , 𝑐1 ) ∈ 𝑞(ℐ) and ans2 (𝑐1 , 𝑐2 ) ∈ 𝑞(ℐ) for each ℐ ∈ Mod(𝒦)
and for some constants 𝑐1 , 𝑐2 . Likewise, 𝑞(𝒦) does not satisfy 𝜙2 = ∃𝑥, 𝑦.ans2 (x,y) ∧ ans2 (y,x) for any
𝑞(ℐ) ∈ 𝑞(𝒦), because there is no model ℐ of 𝒦 and no constants (𝑎, 𝑏) from ℐ s.t. ans2 (𝑎, 𝑏) ∈ 𝑞(ℐ) and
ans2 (𝑏, 𝑎) ∈ 𝑞(ℐ).

   Defining how informative answers are w.r.t. the set 𝑞(𝒦) introduced above constitutes the heart of
our approach. The core idea is the one of associating to the objects returned as query answers a formal
semantics and leverage it to determine the information (either present in every or at least one object)
in 𝑞(𝒦) that the answers are able to convey. More precisely, in our framework every answer object
that is returned in response to a query over a KB belongs to a family of objects (call it 𝒜); each of these
families is equipped with a relation specifying the (first order definable) properties their objects satisfy
(call it |=𝐴 ). Below we show the case in which certain answers are used in the framework as definition
of answers.

Example 3. In relation to Example 1, assume the object returned as the answer to 𝑞 over 𝒦 is the set of the
certain answers Θ = {ans2 (Bea, Carl )}, where Θ belongs to a family 𝒜 of objects constituted by all the
ground atoms over the alphabet {ans}. In this case Θ |=𝐴 𝜙, iff 𝜙 is true over Θ seen as an interpretation.
For instance, let 𝜙 = ans2 (Bea, Carl ). Then it is readily seen that Θ |=𝐴 𝜙.

   Now, we can compare the set 𝑞(𝒦) with answer objects and define how similar they are according
to the languages of properties (expressible as formulae over {ans}) for which they satisfy the same
formulae. In order to be as general as needed to accommodate for many query answers definitions, we
use the notion of Query Answering System (QAS). A QAS is an abstraction of the query answering task
that takes queries and KBs, from two languages of queries and KBs Q and K resp., and associates to
them an answer from a family of objects 𝒜 having semantics |=𝐴 . This mapping happens in virtue of a
function denoted as 𝑒𝑣𝑎𝑙 : Q × K → 𝒜. For each query answer definition its informative power comes
down to specify the (largest) language of formulae for which it exhibits an identical behaviour w.r.t. the
set 𝑞(𝒦).

Definition 1. Let ℱ be a language of FO formulae over {ans} and S = ⟨Q, K, ⟨𝒜, |=𝐴 ⟩, eval⟩ be a QAS.
We say that S preserves the certain knowledge of ℱ if, for each 𝑞 ∈ Q, 𝒦 ∈ K, and 𝜙 ∈ ℱ it holds that
eval(𝑞, 𝒦) |=𝐴 𝜙 iff 𝑞(ℐ) |= 𝜙 for every ℐ ∈ Mod(𝒦); likewise, S preserves the possible knowledge of ℱ
if, for each 𝑞 ∈ Q, 𝒦 ∈ K and 𝜙 ∈ ℱ, it holds that eval(𝑞, 𝒦) |=𝐴 𝜙 iff there exists ℐ ∈ Mod(𝒦) so that
𝑞(ℐ) |= 𝜙.
   QASs characterise the information content of query answers: the larger the language ℱ that a QAS
preserves, the more informative such a QAS is. We use the technical tool provided by Definition 1 to
quantify and compare the informativeness of query answering definitions in the literature. In this study
we consider queries in the language of conjunctive queries (CQ) and union thereof (UCQ), as well as
KBs belonging to languages with widely accepted and well-known properties, in which many common
description logic languages fall, such as the profiles of OWL 2 and tuple generating dependencies.
For simplicity of exposition, we refer our results to the answer objects returned by the various query
answering definitions, rather than their corresponding QAS; however, the findings we present are easily
extended to QASs.
   For certain answers, recall that the objects returned as answers have the shape of the one introduced in
Example 3. Answer objects of this flavour preserve the certain knowledge of the language of existential
positive formulae over {ans} with ground atoms only and they do not preserve the certain knowledge
of those languages where existential variables occur.
   Next, we consider an extended form of certain answers, called minimal partial answers, recently
presented in [20, 21]. We prove that the answers constructed according to this definition preserve the
certain knowledge expressible using the existential positive formulae where variables do not repeat
across different atoms; moreover, they do not preserve those fragments of existential positive formulae
in which these repetitions occur.
   Lastly, we consider possible answers: in this case the answer object is constituted by all the assertions
over {ans} for the constant answer tuples satisfying the query in at least one model of the KB. This
definition preserves the possible knowledge of the language of ground atoms, but it does not preserve
their conjunctions.
   Finally, we focus on the definition of novel notions of (certain and possible) answers as sets of tuples,
which are able to represent meaningful form of information. In particular, our idea consists in defining
answers containing tuples sharing the same (unknown) existential individuals, represented through
repeated variables. To this end, we say that a set of atoms 𝐴 is Variable-Connected if for every atom
𝛼 ∈ 𝐴 there is one atom 𝛽 ∈ 𝐴 s.t. 𝛼 and 𝛽 share at least one variable. Additionally, whenever 𝐴
contains at most 𝑛 distinct variables it is called 𝑛-Connected Answer (𝑛-CA). Now, given a KB 𝒦 and a
query 𝑞, among all the 𝑛-CAs we are interested in those that are certain (resp. possible) for 𝑞 over 𝒦;
we say that an 𝑛-CA 𝐴 is certain (resp. possible) for 𝑞 over 𝒦 if for every (resp. some) ℐ ∈ Mod(𝒦)
there is a constant-preserving homomorphism from 𝐴 to 𝑞(ℐ).

Example 4. Consider again Example 1. For 𝑛 = 0 the certain 0-CA is {ans2 (Bea,Carl)}; if 𝑛 = 1,
some certain 1-CAs are {ans2 (Carl, 𝑥1 )} and {ans2 (Ava, 𝑦1 )}. For 𝑛 = 2, some certain 2-CAs are
{ans2 (Carl, 𝑥1 ), ans2 (𝑥1 , 𝑥2 )} and {ans2 (Ava, 𝑦1 ), ans2 (𝑦1 , 𝑦2 )}.

   Our novel answer, called certain (resp. possible) 𝑛-answer, collects together all the 𝑛-CA that are
certain (resp. possible and constant-free) for the query 𝑞 over KB 𝒦.

Definition 2. Let 𝐴 = 𝑚
                           ⋃︀
                              𝑖=1 𝐴𝑖 , where 𝐴1 , . . . , 𝐴𝑚 are 𝑛-CAs that share no variables (resp. share no
variables and mention no constants). The set 𝐴 is a certain (resp. possible) 𝑛-answer for 𝑞 over 𝒦 if: 𝐴𝑖 is
certain (resp. possible) for 𝑞 over 𝒦, for each 𝑖 ∈ {1, ..., 𝑚} and for every 𝑛-CA 𝐵 that is certain (resp.
possible) for 𝑞 over 𝒦 there is 𝑗 ∈ {1, ..., 𝑚} and a bijection ℎ s.t. ℎ(𝐵) = 𝐴𝑗 .

   Using the tools from our framework we prove that certain 𝑛-answers can preserve the certain
knowledge that can be expressed using existential positive formulae in which at most 𝑛 distinct
variables occur; on the other hand, possible 𝑛-answers preserve the possible knowledge expressed by
constant-free positive formulae with at most 𝑛 different variables.
   We then studied the data complexity versions [22, 23, 24] of relevant decision problems for 𝑛-CAs,
as the ones of checking whether an 𝑛-CA is certain and possible for 𝑞 ∈ UCQ and a KB 𝒦. Notably,
these problems can be reduced to Boolean UCQ entailment and consistency check over 𝒦, resp. Our
results imply that, given any UCQ 𝑞 and KB 𝒦 for which Boolean UCQ entailment (resp., consistency
check) is decidable, we can construct a set of certain (resp. possible) 𝑛-answers to 𝑞 over 𝒦 by means
of calls to an answer recognition oracle. We investigated the non-redundancy check as well. Consider
the case of certain 𝑛-answers: intuitively, given variable 𝑥, constants 𝑎, 𝑏, 𝑐 and the certain 1-answers
𝐴 = {⟨𝑎, 𝑥⟩, ⟨𝑥, 𝑐⟩} and 𝐵 = {⟨𝑎, 𝑏⟩, ⟨𝑏, 𝑐⟩}, then 𝐴 is redundant, as 𝐵 conveys more information and
|𝐵| ≤ |𝐴|. Resorting to the query language EQL-Lite(UCQ), the non-redundancy check has the same
data complexity as EQL-entailment [25, 26]. Similar results are obtained for the non-redundancy check
of possible 𝑛-answers.
   Regarding the open problems for our work, a natural extension of the results we presented is the one of
providing a definition of answers that is powerful enough to preserve the certain or possible knowledge
of arbitrary existential positive formulae. This problem is far from trivial, because it forces us to rethink
the language of the query answers: indeed, one notable negative result we showed in [19], regarding the
expressive power of answers as sets of tuples, suggests that the solution requires a new representation
of answers based on logical theories. Second, another research direction involves the design of efficient
enumeration algorithms working with our novel notion of answers and their implementation in practice.
In addition, the framework could also be used to improve the informativeness of answers in many
different scenarios, such as those involving queries with aggregation or Consistent Query Answering.


Acknowledgments
This work has been supported by MUR under the PNRR project FAIR (PE0000013) and by the EU under
the H2020-EU.2.1.1 project TAILOR (grant id. 952215).


References
 [1] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and efficient
     query answering in description logics: The DL-Lite family, Journal of Automated Reasoning 39
     (2007) 385–429.
 [2] M. Bienvenu, M. Ortiz, Ontology-mediated query answering with data-tractable description logics,
     in: Reasoning Web. Semantic Technologies for Intelligent Data Access – Eleventh International
     Summer School Tutorial Lectures (RW 2015), volume 9203 of Lecture Notes in Computer Science,
     2015, pp. 218–307.
 [3] A. Calì, G. Gottlob, M. Kifer, Taming the infinite chase: Query answering under expressive
     relational constraints, Journal of Artificial Intelligence Research 48 (2013) 115–174.
 [4] T. Catarci, M. Scannapieco, M. Console, C. Demetrescu, My (fair) big data, in: J. Nie, Z. Obradovic,
     T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, H. Zang, R. Baeza-Yates, X. Hu, J. Kepner, A. Cuz-
     zocrea, J. Tang, M. Toyoda (Eds.), 2017 IEEE International Conference on Big Data (IEEE BigData
     2017), Boston, MA, USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2974–2979. URL:
     https://doi.org/10.1109/BigData.2017.8258267. doi:10.1109/BIGDATA.2017.8258267.
 [5] T. Lukasiewicz, E. Malizia, M. V. Martinez, C. Molinaro, A. Pieris, G. I. Simari, Inconsistency-tolerant
     query answering for existential rules, Artificial Intelligence 307 (2022) 103685.
 [6] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995.
 [7] T. Imielinski, W. Lipski, Jr, Incomplete information in relational databases, Journal of the ACM 31
     (1984) 761–791.
 [8] W. Lipski, On semantic issues connected with incomplete information databases, ACM
     Trans. Database Syst. 4 (1979) 262–296. URL: https://doi.org/10.1145/320083.320088. doi:10.1145/
     320083.320088.
 [9] M. Console, P. Guagliardo, L. Libkin, Approximations and refinements of certain answers via
     many-valued logics, in: Principles of Knowledge Representation and Reasoning: Proceedings of
     the Fifteenth International Conference, KR, 2016.
[10] M. Console, P. Guagliardo, L. Libkin, Propositional and predicate logics of incomplete informa-
     tion, in: Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth
     International Conference, KR, 2018.
[11] M. Calautti, M. Console, A. Pieris, Benchmarking approximate consistent query answering, in:
     L. Libkin, R. Pichler, P. Guagliardo (Eds.), PODS’21: Proceedings of the 40th ACM SIGMOD-
     SIGACT-SIGAI Symposium on Principles of Database Systems, 2021.
[12] M. Console, P. Guagliardo, L. Libkin, E. Toussaint, Coping with incomplete data: Recent advances,
     in: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
     Systems, PODS, ACM, 2020, pp. 33–47.
[13] L. Libkin, Certain answers as objects and knowledge, Artificial Intelligence 232 (2016) 1–19.
[14] C. Civili, L. Libkin, Approximating certainty in querying data and metadata, in: Proceedings
     of the Sixteenth International Conference on the Principles of Knowledge Representation and
     Reasoning (KR 2018), 2018, pp. 582–591.
[15] A. Borgida, D. Toman, G. E. Weddell, On referring expressions in query answering over first order
     knowledge bases, in: Proceedings of the Fifteenth International Conference on the Principles of
     Knowledge Representation and Reasoning (KR 2016), 2016, pp. 319–328.
[16] A. Borgida, D. Toman, G. E. Weddell, Concerning referring expressions in query answers, in: Pro-
     ceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017),
     2017, pp. 4791–4795.
[17] M. Arenas, J. Pérez, J. L. Reutter, Data exchange beyond complete data, Journal of the ACM 60 (2013)
     28:1–28:59. URL: https://doi.org/10.1145/2508028.2505985. doi:10.1145/2508028.2505985.
[18] G. Grahne, A. Onet, Representation systems for data exchange, in: A. Deutsch (Ed.), 15th
     International Conference on Database Theory, ICDT ’12, Berlin, Germany, March 26-29, 2012,
     ACM, 2012, pp. 208–221. URL: https://doi.org/10.1145/2274576.2274599. doi:10.1145/2274576.
     2274599.
[19] Andolfi, Cima, Console, Lenzerini, What does a query answer tell you? informativeness of query
     answers for knowledge bases, Proceedings of the AAAI Conference on Artificial Intelligence
     (2024).
[20] C. Lutz, M. Przybylko, Efficiently enumerating answers to ontology-mediated queries, in: Pro-
     ceedings of the Forty-First ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
     Systems (PODS 2022), 2022, pp. 277–289.
[21] C. Lutz, M. Przybylko, Efficient answer enumeration in description logics with functional roles,
     in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023),
     2023, pp. 6483–6490.
[22] M. Y. Vardi, The complexity of relational query languages (extended abstract), in: Proceedings of
     the Fourteenth Annual ACM Symposium on Theory of Computing (STOC 1982), 1982, pp. 137–146.
[23] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Data complexity of query an-
     swering in description logics, Artificial Intelligence 195 (2013) 335–360. doi:10.1016/j.artint.
     2012.10.003.
[24] J.-F. Baget, M.-L. Mugnier, S. Rudolph, M. Thomazo, Walking the complexity lines for generalized
     guarded existential rules, in: Proceedings of the Twenty-Second International Joint Conference
     on Artificial Intelligence (IJCAI 2011), 2011, pp. 712–717.
[25] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, EQL-Lite: Effective first-order
     query processing in description logics, in: Proceedings of the Twentieth International Joint
     Conference on Artificial Intelligence (IJCAI 2007), 2007, pp. 274–279.
[26] G. Cima, M. Console, M. Lenzerini, A. Poggi, Epistemic disjunctive datalog for querying knowledge
     bases, in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23),
     2023, pp. 6280–6288.