<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Approximate Query Answering over Inconsistent Knowledge Bases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergio Greco</string-name>
          <email>greco@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristian Molinaro</string-name>
          <email>cmolinaro@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Trubitsyna</string-name>
          <email>trubitsyna@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIMES, University of Calabria</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>24</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>Consistent query answering is a principled approach for querying inconsistent knowledge bases. It relies on the central notion of repair, that is, a maximal consistent subset of the facts in the knowledge base. One drawback of this approach is that entire facts are deleted to resolve inconsistency, even if they may still contain useful “reliable” information. To overcome this limitation, we propose an inconsistency-tolerant semantics for query answering based on a new notion of repair, allowing values within facts to be updated for restoring consistency. This more fine-grained repair primitive allows us to preserve more information in the knowledge base. We also introduce the notion of a universal repair, which is a compact representation of all repairs and can be computed in polynomial time. Then, we show that consistent query answering in our framework is intractable (coNP-complete). In light of this result, we develop a polynomial time approximation algorithm for computing a sound (but possibly incomplete) set of consistent query answers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Reasoning in the presence of inconsistent information is a problem that has attracted
much interest in the last decades. Many inconsistency-tolerant semantics for query
answering have been proposed, and most of them rely on the notions of consistent query
answer and repair. A consistent answer to a query is a query answer that is entailed by
every repair, where a repair is a “maximal” consistent subset of the facts of the knowledge
base. Different maximality criteria have been investigated, but all the resulting notions of
repair share the same drawback: a fact is either kept or deleted altogether, and deleting
entire facts can cause loss of “reliable” information, as illustrated below.
Example 1. Consider the knowledge base (D;
), where D contains the following facts:
works
john cs nyc
john math rome
mary math sidney
and
is an ontology consisting of the following equality-generating dependency :
works(E1; D; C1) ^ works(E2; D; C2) ! C1 = C2:</p>
      <p>As an example, the fact works(john; cs; nyc) states that john is an employee working
in the cs department located in nyc. The dependency says that every department must
be located in a single city. Clearly, the last two facts violate , so every repair would
discard either of them.</p>
      <p>If we pose a query asking for the employees’ name, the only consistent answer is
john. However, we might consider reliable the information on mary being an employee,
as the only uncertainty concerns the math department and its city—roughly speaking,
the information in the first column of the works table can be considered “clean”.</p>
      <p>To overcome the drawback illustrated above, we propose a notion of repair based on
updating values within facts. Update-based repairing allows for rectifying errors in facts
without deleting them altogether, thereby preserving consistent values.
Example 2. Consider again the knowledge base of Example 1. Using value updates as
the primitive to restore consistency, and assuming that the only uncertain values are
math’s cities, we get the following two repairs:
john cs nyc
john math rome
mary math rome
john cs nyc
john math sidney
mary math sidney
If we ask again for the employees’ name, both mary and john are consistent answers.</p>
      <p>We show that consistent query answering in this setting is coNP-complete. Then, we
show how to compute a “universal” repair, which compactly represents all repairs and
can be used as a valuable tool to compute approximate query answers.
Example 3. A universal repair for the two repairs of Example 2 is reported below:
john cs nyc
john math ?1
mary math ?1
?1 = rome _ ?1 = sidney
where ?1 is a labeled null, and the “global” condition at the bottom restricts the
admissible values for ?1, which are rome and sidney.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Consistent query answering was first proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Query answering under various
inconsistency-tolerant semantics for ontologies expressed in DL languages has been
studied in several works (see, e.g., [13,14,3,4]), and in [17,18] for ontologies expressed
by fragments of Datalog+/–. [4] have proposed an approach for the approximation of
consistent query answers from above and from below. [7] proposed an approach based
on three-valued logic to compute a sound but possibly incomplete set of consistent query
answers. Different from our proposal, all the approaches above adopt the most common
notion of repair, where whole facts are removed. This can cause loss of information,
as illustrated in the toy scenario of Example 1—in real-life scenarios, it might well be
the case that facts have much more attributes and only a few of them are involved in
inconsistencies.
      </p>
      <p>There have also been different proposals adopting a notion of repair that allows
values to be updated [5,2,6]. Our repair strategy behaves similar to the one of [5] in that
values on the right-hand side of functional dependencies (FDs) are updated. However,
[5] focus on FDs only, while we consider more general EGDs, and they consider repairs
that are minimal according to a specific cost model. [2] allow only numerical attributes to
be updated, one primary key per relation is allowed, but keys are assumed to be satisfied
by the original DB: thus, no repairing is possible w.r.t. keys, while we allow it, and we
allow much more general constraints. Our repair strategy can be seen as an instantiation
of the value-based family of policies proposed in [19], even though the two approaches
differ in how multiple dependencies are handled, and while they focus on FDs only, we
consider EGDs. [6] consider numerical databases and a different class of (aggregate)
constraints. However, the main difference between this paper and the aforementioned
ones is that none of them has investigated the approximate computation of consistent
query answers.</p>
      <p>Approximation algorithms providing sound but possibly incomplete query answers
in the presence of nulls have been proposed in [11,15,16,9], but no dependencies are
considered therein, and thus data is assumed to be consistent.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Preliminaries</title>
      <p>We assume the existence of the following pairwise disjoint (countably infinite) sets: a
set Const of constants, a set Var of variables, and a set Null of labeled nulls. Nulls are
denoted by the symbol ? subscripted. A term is a constant, variable, or null. We also
assume a set of predicates, disjoint from the aforementioned sets, with each predicate
being associated with an arity, which is a non-negative integer.</p>
      <p>An atom A is of the form p(t1; : : : ; tn), where p is an n-ary predicate and the ti’s
are terms. We write an atom also as p(t), where t is a sequence of terms. An atom
without variables is also called a fact. An instance is a finite set of facts. A database is
an instance containing constants only.</p>
      <p>A homomorphism is a mapping h : Const [ Var [ Null ! Const [ Var [ Null that is
the identity on Const. Homomorphisms are also applied to atoms and set of atoms in the
natural fashion, that is, h(p(t1; : : : ; tn)) = p(h(t1); : : : ; h(tn)), and h(S) = fh(A) j
A 2 Sg for every set S of atoms. A valuation is a homomorphism whose image is
Const, that is, (t) 2 Const for every t 2 Const [ Var [ Null.</p>
      <p>Conditional instances. Conditional instances (also known as “conditional tables” [12,8])
are instances augmented with conditions restricting the set of admissible values for nulls.
Let E be the set of all expressions, called conditions, that can be built using the standard
logical connectives ^, _, :, ) and expressions of the form ti = tj , true, and false,
where ti; tj 2 Const [ Null. We will also use ti 6= tj as a shorthand for :(ti = tj ).We
say that a valuation satisfies a condition , denoted j= , if its assignment of
constants to nulls makes true. Formally, a conditional instance (CI) is a pair (I; ), where
I is an instance and 2 E . The semantics of C = (I; ) is given by the set of its possible
worlds, that is, the set of databases pw (C) = f (I) j is a valuation and j= g.
Equality generating dependencies. An equality generating dependency (EGD) is a
first-order formula of the form 8x '(x) ! xi = xj ; where '(x) is a conjunction of
atoms (without labeled nulls) whose variables are exactly x, and xi and xj are variables
from x. We call '(x) the body of , and call xi = xj the head of . We will omit
the universal quantification in front of dependencies and assume that all variables are
universally quantified. With a slight abuse of notation, we sometimes treat a conjunction
as the set of its atoms.</p>
      <p>An instance I satisfies , denoted I j= , if whenever there exists a homomorphism
h s.t. h('(x)) I, then h(xi) = h(xj ). A instance I satisfies a set of EGDs, denoted
I j= , if I j= for every 2 .</p>
      <p>Knowledge bases. A knowledge base (KB) is a pair (D; ), where D is a database and
is a finite set of EGDs. It is consistent if D j= , otherwise it is inconsistent.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Repairing and Querying Inconsistent KBs</title>
      <p>In this section, we present our notion of repair and analyze the computational complexity
of two central problems: repair checking and consistent query answering.</p>
      <p>We start by defining the EGDs we consider. Let be a set of EGDs. An argument
of is an expression of the form p[i], where p is an n-ary predicate appearing in and
1 i n.</p>
      <p>Definition 1 (Argument and dependency graphs). The argument graph of a set of
EGDs is a directed graph G = (V; E), where V is the set of all arguments of , and
E contains a directed edge from p[i] to q[j] labeled iff there is an EGD 2 such
that:
– the body of contains an atom p(t1; : : : ; tn) such that either ti is a constant or ti is
a variable occurring more than once in the body of , and
– the body of contains an atom q(u1; : : : ; um) such that uj is a variable also
appearing in the head of .</p>
      <p>The dependency graph of is a directed graph = ( ; ), where is the set
f( 1; 2) j G = (V; E) ^ (p([i]; q[j]; 1); (q[j]; r[k]; 2) 2 Eg: We say that is
acyclic if its dependency graph is acyclic.</p>
      <p>In the following, we write i &lt; j if ( i; j ) 2</p>
      <p>k &lt; j .</p>
      <sec id="sec-4-1">
        <title>Example 4. Consider the following set of EGDs :</title>
        <p>or there is k such that i &lt;
k and
1 : works(X; Y1; Z1) ^ works(X; Y2; Z2) ! Y1 = Y2
2 : works(X1; Y; Z1) ^ works(X2; Y; Z2) ! Z1 = Z2</p>
        <p>
          Then, G has two edges: one from works[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to works[2] labeled 1, and another one
from works[2] to works[3] labeled 2. Moreover, has only the edge ( 1; 2). Thus,
is acyclic and a topological sorting of EGDs is ( 1; 2).
        </p>
        <p>In the rest of the paper we consider acyclic sets of EGDs only. Thus, from now on,
is understood to be acyclic. Before introducing our notion of repair, we provide some
intuitions in the following example.</p>
        <p>Example 5. Consider the database below and the set of EGDs of Example 4.
john cs rome
john math rome
mary math sidney</p>
        <p>By enforcing 1 into the first two facts, either cs or math can be chosen as john’s
department. If the latter is chosen, then the database D0 below is obtained.</p>
        <p>Suppose now that 2 is enforced into the last two facts of D0. Then, either rome
or sidney can be chosen as math’s city. If the former is chosen, then the database D00
below is obtained.</p>
        <p>john math rome
D0 = john math rome
mary math sidney</p>
        <p>john math rome
D00 = john math rome
mary math rome
No further dependency enforcement is applicable at this point and thus D00 is a repair.</p>
        <p>When repairing inconsistent databases, conditional instances can be used to keep
track of which values must be equal and which constants can be assigned to nulls. For
instance, in Example 5 above, after the last dependency enforcement, we have to make
sure that the last column contains the same city (which can be either rome or sidney).</p>
        <p>Below we introduce the technical definitions, starting with a repair step.</p>
        <p>Definition 2 (Repair step). Let (I; ) be a CI, a set of EGDs, and an EGD
'(x) ! xi = xj 2 . Let h be a homomorphism s.t. h('(x)) h(I), h j= , and
h(xi) 6= h(xj ).</p>
        <p>Moreover, let ?m be a fresh null and (I0; 0) the CI obtained from (I; ) as follows:
1. for each fact p(u1; :::; un) in I, if '(x) contains an atom p(t1; :::; tn) s.t. h(p(t1; :::; tn)) =
h(p(u1; :::; un)), then for every 1 k n, if tk 2 fxi; xj g,
– replace uk in p(u1; :::; un) with ?m;
– if uk 2 Null, replace every occurrence of uk elsewhere with ?m;
2. Either 0 = ^ (?m = h(xi)) or 0 = ^ (?m = h(xj )).</p>
      </sec>
      <sec id="sec-4-2">
        <title>We say that (I; )</title>
        <p>!;h (I0; 0) is a repair step.</p>
        <p>Intuitively, a repair step enforces an EGD that is not satisfied by the knowledge base.</p>
        <p>A repair sequence of a knowledge base (D; ) is a (possibly empty) finite sequence
of repair steps Ci i!;hi Ci+1 (0 i &lt; m) such that C0 = (D; true) and i 2 for
every i. We also say that the repair sequence is from C0 to Cm. We call Cm the result of
the repair sequence. Also, we say that the repair sequence is maximal if there does not
exist a repair step of the form Cm m!;hm Cm+1.</p>
        <p>It is easy to see that for each repair step (I; ) !;h (I0; 0), if h0 is a homomorphism
s.t. h0 j= 0, then it assigns constants to nulls as dictated by 0 (see item 2 of Definition 2).
Thus, all h0 satisfying 0 yield the same database h0(I0). For any conditional instance
(I; ), we use the notation (I) to denote the database derived from I by iteratively
replacing nulls with constants or other nulls as dictated by .</p>
        <p>For ease of presentation, we assume that one can keep of a given fact f in the
database during the repair process despite the value changes. Thus, we use D[f; i]
to denote the i-th term of a fact f in a database D. Given a repair sequence S from
C0 = (D; true) to Cm = (Im; m), we define the set of changes made by S as
update(S) = f(f; i) j D[f; i] 6= m(Im)[f; i]g, where f is a fact of the database.
Example 6. Consider the database of Example 5 and the set of EGDs of Example 4. By
enforcing first 1 using the first two facts, then 2 using the last two facts, and finally 2
using the first two facts, we get, respectively, the CIs C1, C2 and C3 reported below:</p>
        <p>C1
john ?1 rome
john ?1 rome
mary math sidney
?1 = math</p>
        <p>C2 C3
john ?1 rome john ?1 ?4
john ?1 ?3 john ?1 ?4
mary math ?3 mary math ?4
?1 = math ^ ?3 = sidney ?1 = math ^ ?3 = sidney ^ ?4 = rome
No further repair steps are applicable at this point. Notice that ?3 = sidney can be
deleted from the condition of C3 because it does not play a role anymore. A repair is
obtained from C3 by replacing every occurrence of ?1 with math, and every occurrence
of ?4 with rome.</p>
        <p>Definition 3 (Repair). A repair for a knowledge base K = (D; ) is a database (J )
such that there exists a maximal repair sequence S of K from (D; true) to (J; ) and
update(S) is minimal w.r.t. .</p>
        <p>We use repair (D; ) to denote the set of all repairs of a knowledge base (D; ).
Every repair is indeed consistent [10].</p>
        <p>The consistent answers to a query Q over a knowledge base (D; ) are defined in
the standard way as follows: Q(D; ) = TfQ(D0) j D0 2 repair (D; )g.</p>
        <p>In the context of managing inconsistent knowledge bases, two fundamental problems
are repair checking and consistent query answering, which are defined as follows. Let
(D; ) be a knowledge base, D0 a database, Q a query, and f a fact of constants.
Then, the following two problems are defined: repair checking: decide whether D0 2
repair (D; ); consistent query answering: decide whether f 2 Q(D; ).</p>
        <p>Repair checking is in PTIME while conjunctive query answering is coNP-complete
in the data complexity [10].</p>
        <p>[10] introduced the notion of a universal repair, which is a compact representation
of all repairs for a given knowledge base, and can be computed in polynomial time. An
example is provided below.</p>
        <p>Example 7. Consider the database of Example 5 and the EGDs of Example 4. The
following CI is a universal repair:
john ?1 ?5
john ?1 ?5
mary math ?4
(?1 = cs ^ ?4 = sidney ^ ?5 = rome) _
(?1 = math ^ ?5 = ?4 ^ (?4 = rome _ ?4 = sidney))</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Approximation Algorithm</title>
      <p>Since consistent query answering in intractable in our framework, [10] developed a
polynomial time approximation algorithm to compute a sound (but possibly incomplete)
set of consistent query answers. The basic idea is illustrated in the following example.
Example 8. Consider the knowledge base of Example 1 and the query asking for the
pairs of departments located in different cities.</p>
      <p>A universal repair, say U , is shown in Example 3. The first step of the approximation
algorithm consists in assigning the “local condition” true to every fact of U as follows:
john cs nyc true
john math ?1 true
mary math ?1 true
?1 = rome _ ?1 = sidney
cs cs true ^ true ^ nyc 6= nyc
cs math true ^ true ^ nyc 6= ?1
math cs true ^ true ^ ?1 6= nyc
math math true ^ true ^ ?1 6= ?1
Then, the “conditional evaluation” (cf. [8,10] for more details) of the query is performed—
roughly speaking, it means that the query is evaluated using conditions that express how
facts are derived. The result is the following set of facts associated with conditions:</p>
      <p>Finally, conditions are evaluated in such a way that each of them yields one of the
truth values true, false, unknown. If the evaluation of a fact’s condition yields true, then
the fact is a consistent query answer. In the scenario above, the approximate consistent
answers are (cs; math) and (math; cs), which are indeed consistent query answers.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We proposed a framework for query answering over inconsistent KBs that is based on (i)
a notion of repair that preserves more information than classical repair strategies, (ii)
a compact representation of all repairs called universal repair, (iii) an approximation
algorithm to compute under-approximations of consistent query answers.</p>
      <p>As a direction for future work, we plan to investigate further approximation
algorithms based on different conditions’ evaluations. Another issue to be investigated is the
generalization of our framework to more general classes of dependencies, such as TGDs
or arbitrary EGDs.
2. Leopoldo E. Bertossi, Loreto Bravo, Enrico Franconi, and Andrei Lopatenko. The complexity
and approximation of fixing numerical attributes in databases under integrity constraints.</p>
      <p>Information Systems, 33(4-5):407–434, 2008.
3. Meghyn Bienvenu. On the complexity of consistent query answering in the presence of simple
ontologies. In AAAI Conference on Artificial Intelligence (AAAI), 2012.
4. Meghyn Bienvenu and Riccardo Rosati. Tractable approximations of consistent query
answering for robust ontology-based data access. In International Joint Conference on Artificial
Intelligence (IJCAI), pages 775–781, 2013.
5. Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. A cost-based model and
effective heuristic for repairing constraints by value modification. In International Conference
on Management of Data (SIGMOD), pages 143–154, 2005.
6. Sergio Flesca, Filippo Furfaro, and Francesco Parisi. Querying and repairing inconsistent
numerical databases. ACM Transactions on Database Systems, 35(2):14:1–14:50, 2010.
7. Filippo Furfaro, Sergio Greco, and Cristian Molinaro. A three-valued semantics for querying
and repairing inconsistent databases. Annals of Mathematics and Artificial Intelligence,
51(2-4):167–193, 2007.
8. Gösta Grahne. The Problem of Incomplete Information in Relational Databases, volume 554
of Lecture Notes in Computer Science. Springer, 1991.
9. Sergio Greco, Cristian Molinaro, and Irina Trubitsyna. Computing approximate certain
answers over incomplete databases. In Alberto Mendelzon International Workshop (AMW),
2017.
10. Sergio Greco, Cristian Molinaro, and Irina Trubitsyna. Computing approximate query answers
over inconsistent knowledge bases. In International Joint Conference on Artificial Intelligence
(IJCAI), pages 1838–1846, 2018.
11. Paolo Guagliardo and Leonid Libkin. Making SQL queries correct on incomplete databases:
A feasibility study. In Symposium on Principles of Database Systems (PODS), pages 211–223,
2016.
12. Tomasz Imielinski and Witold Lipski. Incomplete information in relational databases. Journal
of the ACM, 31(4):761–791, 1984.
13. Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio
Savo. Inconsistency-tolerant semantics for description logics. In International Conference on
Web Reasoning and Rule Systems (RR), pages 103–117, 2010.
14. Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio
Savo. Inconsistency-tolerant query answering in ontology-based data access. Journal of Web
Semantics, 33:3–29, 2015.
15. Leonid Libkin. How to define certain answers. In International Joint Conference on Artificial</p>
      <p>Intelligence (IJCAI), pages 4282–4288, 2015.
16. Leonid Libkin. Certain answers as objects and knowledge. Artificial Intelligence, 232:1–19,
2016.
17. Thomas Lukasiewicz, Maria Vanina Martinez, Andreas Pieris, and Gerardo I. Simari. From
classical to consistent query answering under existential rules. In AAAI Conference on
Artificial Intelligence (AAAI), pages 1546–1552, 2015.
18. Thomas Lukasiewicz, Maria Vanina Martinez, and Gerardo I. Simari. Inconsistency handling
in Datalog+/– ontologies. In European Conference on Artificial Intelligence (ECAI), pages
558–563, 2012.
19. Maria Vanina Martinez, Francesco Parisi, Andrea Pugliese, Gerardo I. Simari, and V. S.</p>
      <p>Subrahmanian. Policy-based inconsistency management in relational databases. International
Journal of Approximate Reasoning, 55(2):501–528, 2014.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Leopoldo E.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Chomicki</surname>
          </string-name>
          .
          <article-title>Consistent query answers in inconsistent databases</article-title>
          .
          <source>In Symposium on Principles of Database Systems (PODS)</source>
          , pages
          <fpage>68</fpage>
          -
          <lpage>79</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>