<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>DL</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Informativeness of Query Answers for Knowledge Bases (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Andolfi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Cima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Console</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Lenzerini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer, Control and Management Engineering</institution>
          ,
          <addr-line>25 Via Ariosto, Rome, 00185</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>37</volume>
      <fpage>18</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This extended abstract summarises our recent work on the informativeness of query answers over Description Logics Knowledge Bases (KBs). We introduce a framework to characterise the information that query answers for KBs can represent and under its lens we study the informative power of current query answering definitions across the literature. Moreover, we also present novel notions of certain and possible answers, showing that they are able to represent a meaningful form of information. Lastly, we study the data complexity properties for relevant problems associated to the answers we introduced.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Knowledge Bases</kwd>
        <kwd>Query Answers</kwd>
        <kwd>Knowledge Representation</kwd>
        <kwd>Description Logics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>certain and possible answers by providing novel notions of informative query answers. Once again, we
can achieve this in virtue of our framework, as it sheds light on properties of answer objects needed to
increase their informative content. Indeed, equivalent results would be hard to obtain without a tool of
this kind.</p>
      <p>Specifically, we describe which is the logical behaviour of the information to be represented ( what
answers need to represent), and the core idea behind our framework is the one of characterising accordingly
how efectively a given answer object mimics this behaviour (how informative answers are).</p>
      <p>To formally model what answers need to represent, we leverage the following construction. First,
we introduce a new predicate (not occurring in the KB alphabet) ans whose arity  matches the
one of the given query ; when evaluating  over a first order interpretation ℐ, the answer to  over
ℐ are constituted by the assertion for ans over the constant answer tuples from ℐ satisfying  (e.g.,
ans2(Bea, Carl ) is one of these assertions when evaluating the query in Example 1 over any model of the
corresponding KB). Second, this approach is generalised to a KB: the set () = {(ℐ) | ℐ ∈ Mod()}
expresses all the information that can be derived from a query  and a KB , where Mod() are models
of  and (ℐ) is the answer to  over model ℐ. The construction just introduced allows us to define the
behaviour of the set () (i.e., what answers need to represent) using properties definable with logical
formulae over the predicate ans, as demonstrated in the example below.</p>
      <p>Example 2. The set () from Example 1 satisfies the formula  1 = ∃, .ans2(Ava, ) ∧ ans2(x,y)
for every (ℐ) ∈ (), because ans2(Avaℐ , 1) ∈ (ℐ) and ans2(1, 2) ∈ (ℐ) for each ℐ ∈ Mod()
and for some constants 1, 2. Likewise, () does not satisfy  2 = ∃, .ans2(x,y) ∧ ans2(y,x) for any
(ℐ) ∈ (), because there is no model ℐ of  and no constants (, ) from ℐ s.t. ans2(, ) ∈ (ℐ) and
ans2(, ) ∈ (ℐ).</p>
      <p>Defining how informative answers are w.r.t. the set () introduced above constitutes the heart of
our approach. The core idea is the one of associating to the objects returned as query answers a formal
semantics and leverage it to determine the information (either present in every or at least one object)
in () that the answers are able to convey. More precisely, in our framework every answer object
that is returned in response to a query over a KB belongs to a family of objects (call it ); each of these
families is equipped with a relation specifying the (first order definable) properties their objects satisfy
(call it |=). Below we show the case in which certain answers are used in the framework as definition
of answers.</p>
      <p>Example 3. In relation to Example 1, assume the object returned as the answer to  over  is the set of the
certain answers Θ = {ans2(Bea, Carl )}, where Θ belongs to a family  of objects constituted by all the
ground atoms over the alphabet {ans}. In this case Θ |=  , if  is true over Θ seen as an interpretation.
For instance, let  = ans2(Bea, Carl ). Then it is readily seen that Θ |=  .</p>
      <p>Now, we can compare the set () with answer objects and define how similar they are according
to the languages of properties (expressible as formulae over {ans}) for which they satisfy the same
formulae. In order to be as general as needed to accommodate for many query answers definitions, we
use the notion of Query Answering System (QAS). A QAS is an abstraction of the query answering task
that takes queries and KBs, from two languages of queries and KBs Q and K resp., and associates to
them an answer from a family of objects  having semantics |=. This mapping happens in virtue of a
function denoted as  : Q × K → . For each query answer definition its informative power comes
down to specify the (largest) language of formulae for which it exhibits an identical behaviour w.r.t. the
set ().</p>
      <p>Definition 1. Let ℱ be a language of FO formulae over {ans} and S = ⟨Q, K, ⟨, |=⟩, eval⟩ be a QAS.
We say that S preserves the certain knowledge of ℱ if, for each  ∈ Q,  ∈ K, and  ∈ ℱ it holds that
eval(, ) |=  if (ℐ) |=  for every ℐ ∈ Mod(); likewise, S preserves the possible knowledge of ℱ
if, for each  ∈ Q,  ∈ K and  ∈ ℱ , it holds that eval(, ) |=  if there exists ℐ ∈ Mod() so that
(ℐ) |=  .</p>
      <p>QASs characterise the information content of query answers: the larger the language ℱ that a QAS
preserves, the more informative such a QAS is. We use the technical tool provided by Definition 1 to
quantify and compare the informativeness of query answering definitions in the literature. In this study
we consider queries in the language of conjunctive queries (CQ) and union thereof (UCQ), as well as
KBs belonging to languages with widely accepted and well-known properties, in which many common
description logic languages fall, such as the profiles of OWL 2 and tuple generating dependencies.
For simplicity of exposition, we refer our results to the answer objects returned by the various query
answering definitions, rather than their corresponding QAS; however, the findings we present are easily
extended to QASs.</p>
      <p>For certain answers, recall that the objects returned as answers have the shape of the one introduced in
Example 3. Answer objects of this flavour preserve the certain knowledge of the language of existential
positive formulae over {ans} with ground atoms only and they do not preserve the certain knowledge
of those languages where existential variables occur.</p>
      <p>Next, we consider an extended form of certain answers, called minimal partial answers, recently
presented in [20, 21]. We prove that the answers constructed according to this definition preserve the
certain knowledge expressible using the existential positive formulae where variables do not repeat
across diferent atoms; moreover, they do not preserve those fragments of existential positive formulae
in which these repetitions occur.</p>
      <p>Lastly, we consider possible answers: in this case the answer object is constituted by all the assertions
over {ans} for the constant answer tuples satisfying the query in at least one model of the KB. This
definition preserves the possible knowledge of the language of ground atoms, but it does not preserve
their conjunctions.</p>
      <p>Finally, we focus on the definition of novel notions of (certain and possible) answers as sets of tuples,
which are able to represent meaningful form of information. In particular, our idea consists in defining
answers containing tuples sharing the same (unknown) existential individuals, represented through
repeated variables. To this end, we say that a set of atoms  is Variable-Connected if for every atom
 ∈  there is one atom  ∈  s.t.  and  share at least one variable. Additionally, whenever 
contains at most  distinct variables it is called -Connected Answer (-CA). Now, given a KB  and a
query , among all the -CAs we are interested in those that are certain (resp. possible) for  over ;
we say that an -CA  is certain (resp. possible) for  over  if for every (resp. some) ℐ ∈ Mod()
there is a constant-preserving homomorphism from  to (ℐ).</p>
      <p>Example 4. Consider again Example 1. For  = 0 the certain 0-CA is {ans2(Bea,Carl)}; if  = 1,
some certain 1-CAs are {ans2(Carl, 1)} and {ans2(Ava, 1)}. For  = 2, some certain 2-CAs are
{ans2(Carl, 1), ans2(1, 2)} and {ans2(Ava, 1), ans2(1, 2)}.</p>
      <p>Our novel answer, called certain (resp. possible) -answer, collects together all the -CA that are
certain (resp. possible and constant-free) for the query  over KB .</p>
      <p>Definition 2. Let  = ⋃︀</p>
      <p>=1 , where 1, . . . ,  are -CAs that share no variables (resp. share no
variables and mention no constants). The set  is a certain (resp. possible) -answer for  over  if:  is
certain (resp. possible) for  over , for each  ∈ {1, ..., } and for every -CA  that is certain (resp.
possible) for  over  there is  ∈ {1, ..., } and a bijection ℎ s.t. ℎ() =  .</p>
      <p>Using the tools from our framework we prove that certain -answers can preserve the certain
knowledge that can be expressed using existential positive formulae in which at most  distinct
variables occur; on the other hand, possible -answers preserve the possible knowledge expressed by
constant-free positive formulae with at most  diferent variables.</p>
      <p>We then studied the data complexity versions [22, 23, 24] of relevant decision problems for -CAs,
as the ones of checking whether an -CA is certain and possible for  ∈ UCQ and a KB . Notably,
these problems can be reduced to Boolean UCQ entailment and consistency check over , resp. Our
results imply that, given any UCQ  and KB  for which Boolean UCQ entailment (resp., consistency
check) is decidable, we can construct a set of certain (resp. possible) -answers to  over  by means
of calls to an answer recognition oracle. We investigated the non-redundancy check as well. Consider
the case of certain -answers: intuitively, given variable , constants , ,  and the certain 1-answers
 = {⟨, ⟩, ⟨, ⟩} and  = {⟨, ⟩, ⟨, ⟩}, then  is redundant, as  conveys more information and
|| ≤ | |. Resorting to the query language EQL-Lite(UCQ), the non-redundancy check has the same
data complexity as EQL-entailment [25, 26]. Similar results are obtained for the non-redundancy check
of possible -answers.</p>
      <p>Regarding the open problems for our work, a natural extension of the results we presented is the one of
providing a definition of answers that is powerful enough to preserve the certain or possible knowledge
of arbitrary existential positive formulae. This problem is far from trivial, because it forces us to rethink
the language of the query answers: indeed, one notable negative result we showed in [19], regarding the
expressive power of answers as sets of tuples, suggests that the solution requires a new representation
of answers based on logical theories. Second, another research direction involves the design of eficient
enumeration algorithms working with our novel notion of answers and their implementation in practice.
In addition, the framework could also be used to improve the informativeness of answers in many
diferent scenarios, such as those involving queries with aggregation or Consistent Query Answering.
Acknowledgments
This work has been supported by MUR under the PNRR project FAIR (PE0000013) and by the EU under
the H2020-EU.2.1.1 project TAILOR (grant id. 952215).
[11] M. Calautti, M. Console, A. Pieris, Benchmarking approximate consistent query answering, in:
L. Libkin, R. Pichler, P. Guagliardo (Eds.), PODS’21: Proceedings of the 40th ACM
SIGMODSIGACT-SIGAI Symposium on Principles of Database Systems, 2021.
[12] M. Console, P. Guagliardo, L. Libkin, E. Toussaint, Coping with incomplete data: Recent advances,
in: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
Systems, PODS, ACM, 2020, pp. 33–47.
[13] L. Libkin, Certain answers as objects and knowledge, Artificial Intelligence 232 (2016) 1–19.
[14] C. Civili, L. Libkin, Approximating certainty in querying data and metadata, in: Proceedings
of the Sixteenth International Conference on the Principles of Knowledge Representation and
Reasoning (KR 2018), 2018, pp. 582–591.
[15] A. Borgida, D. Toman, G. E. Weddell, On referring expressions in query answering over first order
knowledge bases, in: Proceedings of the Fifteenth International Conference on the Principles of
Knowledge Representation and Reasoning (KR 2016), 2016, pp. 319–328.
[16] A. Borgida, D. Toman, G. E. Weddell, Concerning referring expressions in query answers, in:
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI 2017),
2017, pp. 4791–4795.
[17] M. Arenas, J. Pérez, J. L. Reutter, Data exchange beyond complete data, Journal of the ACM 60 (2013)
28:1–28:59. URL: https://doi.org/10.1145/2508028.2505985. doi:10.1145/2508028.2505985.
[18] G. Grahne, A. Onet, Representation systems for data exchange, in: A. Deutsch (Ed.), 15th
International Conference on Database Theory, ICDT ’12, Berlin, Germany, March 26-29, 2012,
ACM, 2012, pp. 208–221. URL: https://doi.org/10.1145/2274576.2274599. doi:10.1145/2274576.
2274599.
[19] Andolfi, Cima, Console, Lenzerini, What does a query answer tell you? informativeness of query
answers for knowledge bases, Proceedings of the AAAI Conference on Artificial Intelligence
(2024).
[20] C. Lutz, M. Przybylko, Eficiently enumerating answers to ontology-mediated queries, in:
Proceedings of the Forty-First ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database
Systems (PODS 2022), 2022, pp. 277–289.
[21] C. Lutz, M. Przybylko, Eficient answer enumeration in description logics with functional roles,
in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023),
2023, pp. 6483–6490.
[22] M. Y. Vardi, The complexity of relational query languages (extended abstract), in: Proceedings of
the Fourteenth Annual ACM Symposium on Theory of Computing (STOC 1982), 1982, pp. 137–146.
[23] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Data complexity of query
answering in description logics, Artificial Intelligence 195 (2013) 335–360. doi: 10.1016/j.artint.
2012.10.003.
[24] J.-F. Baget, M.-L. Mugnier, S. Rudolph, M. Thomazo, Walking the complexity lines for generalized
guarded existential rules, in: Proceedings of the Twenty-Second International Joint Conference
on Artificial Intelligence (IJCAI 2011), 2011, pp. 712–717.
[25] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, EQL-Lite: Efective first-order
query processing in description logics, in: Proceedings of the Twentieth International Joint
Conference on Artificial Intelligence (IJCAI 2007), 2007, pp. 274–279.
[26] G. Cima, M. Console, M. Lenzerini, A. Poggi, Epistemic disjunctive datalog for querying knowledge
bases, in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23),
2023, pp. 6280–6288.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <article-title>Tractable reasoning and eficient query answering in description logics: The DL-Lite family</article-title>
          ,
          <source>Journal of Automated Reasoning</source>
          <volume>39</volume>
          (
          <year>2007</year>
          )
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bienvenu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ortiz</surname>
          </string-name>
          ,
          <article-title>Ontology-mediated query answering with data-tractable description logics</article-title>
          ,
          <source>in: Reasoning Web. Semantic Technologies for Intelligent Data Access - Eleventh International Summer School Tutorial Lectures (RW</source>
          <year>2015</year>
          ), volume
          <volume>9203</volume>
          of Lecture Notes in Computer Science,
          <year>2015</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Calì</surname>
          </string-name>
          , G. Gottlob,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Kifer, Taming the infinite chase: Query answering under expressive relational constraints</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>48</volume>
          (
          <year>2013</year>
          )
          <fpage>115</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Catarci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scannapieco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Console</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Demetrescu</surname>
          </string-name>
          ,
          <article-title>My (fair) big data</article-title>
          , in: J.
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Obradovic</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Suzumura</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Nambiar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Zang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kepner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Cuzzocrea</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
          </string-name>
          , M. Toyoda (Eds.),
          <source>2017 IEEE International Conference on Big Data (IEEE BigData</source>
          <year>2017</year>
          ), Boston, MA, USA, December
          <volume>11</volume>
          -
          <issue>14</issue>
          ,
          <year>2017</year>
          , IEEE Computer Society,
          <year>2017</year>
          , pp.
          <fpage>2974</fpage>
          -
          <lpage>2979</lpage>
          . URL: https://doi.org/10.1109/BigData.
          <year>2017</year>
          .
          <volume>8258267</volume>
          . doi:
          <volume>10</volume>
          .1109/BIGDATA.
          <year>2017</year>
          .
          <volume>8258267</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          , E. Malizia,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Molinaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pieris</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. I. Simari</surname>
          </string-name>
          ,
          <article-title>Inconsistency-tolerant query answering for existential rules</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>307</volume>
          (
          <year>2022</year>
          )
          <fpage>103685</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vianu</surname>
          </string-name>
          , Foundations of Databases, Addison-Wesley,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Imielinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lipski</surname>
          </string-name>
          , Jr, Incomplete information in relational databases,
          <source>Journal of the ACM</source>
          <volume>31</volume>
          (
          <year>1984</year>
          )
          <fpage>761</fpage>
          -
          <lpage>791</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lipski</surname>
          </string-name>
          ,
          <article-title>On semantic issues connected with incomplete information databases</article-title>
          ,
          <source>ACM Trans. Database Syst</source>
          .
          <volume>4</volume>
          (
          <year>1979</year>
          )
          <fpage>262</fpage>
          -
          <lpage>296</lpage>
          . URL: https://doi.org/10.1145/320083.320088. doi:
          <volume>10</volume>
          .1145/ 320083.320088.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Console</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guagliardo</surname>
          </string-name>
          , L. Libkin,
          <article-title>Approximations and refinements of certain answers via many-valued logics</article-title>
          ,
          <source>in: Principles of Knowledge Representation and Reasoning: Proceedings of the Fifteenth International Conference, KR</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Console</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guagliardo</surname>
          </string-name>
          , L. Libkin,
          <article-title>Propositional and predicate logics of incomplete information</article-title>
          ,
          <source>in: Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>