<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Computing Approximate Certain Answers over Incomplete Databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergio Greco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristian Molinaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Trubitsyna</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>greco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>cmolinaro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>trubitsynag@dimes.unical.it</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIMES, Universita` della Calabria</institution>
          ,
          <addr-line>87036 Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Certain answers are a widely accepted semantics of query answering over incomplete databases. Since their computation is a coNP-hard problem, recent research has focused on developing evaluation algorithms with correctness guarantees, that is, techniques computing a sound but possibly incomplete set of certain answers. In this paper, we show how novel evaluation algorithms with correctness guarantees can be developed leveraging conditional tables and the conditional evaluation of queries, while retaining polynomial time data complexity.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Incomplete information arises naturally in many database applications, such as data
integration [9, 21], data exchange [
        <xref ref-type="bibr" rid="ref3">3, 10, 17, 7</xref>
        ], inconsistency management [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5, 16, 20</xref>
        ],
data cleaning [11], ontological reasoning [6, 8], and many others.
      </p>
      <p>A principled way of answering queries over incomplete databases is to compute
certain answers, which are query answers that can be obtained from all the complete
databases an incomplete database represents. This central notion is illustrated below.
Example 1. Consider the database D consisting of the two unary relations R = fhaig
and S = fh?ig, where ? is a null value. Under the missing value interpretation of nulls
(i.e., a value for ? exists but is unknown), D represents all the databases obtained by
replacing ? with an actual value.</p>
      <p>A certain answer to a query is a tuple that belongs to the query answers for every
database represented by D.</p>
      <p>For instance, consider the query $1=a(R), selecting the tuples in R whose first
value is equal to a. The certain answers to the query are fhaig, because no matter how
? is replaced, the query answers always contain hai. 2</p>
      <p>For databases containing (labeled) nulls, certain answers to positive queries can be
easily computed in polynomial time by applying a “naive” evaluation, that is,
treating nulls as standard constants. However, for more general queries with negation the
problem becomes coNP-hard.</p>
      <p>To make query answering feasible in practice, one might resort to SQL’s evaluation,
but unfortunately, the way SQL behaves in the presence of nulls may result in wrong
answers.</p>
      <p>Specifically, as evidenced in [25], there are two ways in which certain answers and
SQL’s evaluation may differ: (i) SQL can miss some of the tuples that belong to certain
answers, thus producing false negatives, or (ii) SQL can return some tuples that do not
belong to certain answers, that is, false positives. While the first case can be seen as an
under-approximation of certain answers (a sound but possibly incomplete set of certain
answers is returned), the second scenario must be avoided, as the result might contain
plain incorrect answers, that is, tuples that are not certain. The experimental analysis
in [18] showed that false positive are a real problem for queries involving negation—
they were always present and sometimes they constitute almost 100% of the answers.
Example 2. Consider again the database D of Example 1. There are no certain answers
to the query R S, as when ? is replaced with a, the query answers are the empty set.</p>
      <p>Assuming that R and S’s attribute is called A, the same query can be expressed in
SQL as follows:</p>
      <p>SELECT R.A FROM R WHERE NOT EXISTS (</p>
      <p>SELECT * FROM S WHERE R.A = S.A )</p>
      <p>However, the answer to the query above is hai, which is not a certain answer. The
problem with the SQL semantics is that every comparison involving at least one null
evaluates to the truth value unknown, then 3-valued logic is used to evaluate the classical
logical connectives and, or, and not, and eventually only those tuples whose condition
evaluates to true are kept.</p>
      <p>Going back to the query above, the nested subquery compares ? with a, and since
such a comparison evaluates to unknown (because a null is involved) then the result of
the nested subquery is empty. As a consequence, the final result of the query is hai. 2</p>
      <p>Thus, on the one hand, SQL’s evaluation is efficient but flawed, on the other hand,
certain answers are a principled semantics but with high complexity. One way of dealing
with this issue is to develop polynomial time evaluation algorithms computing
approximate certain answers.</p>
      <p>In this regard, there has been recent work on evaluation algorithms with correctness
guarantees, that is, techniques providing a sound but possibly incomplete set of certain
answers [18, 24, 25]. Such approaches consists in rewriting a query Q into two queries
Qt and Qf (resp., Q+ and Q?) computing approximations of certainly true and certainly
false answers (resp., certainly true and possible answers). However, there are still very
simple queries for which the approximation can be unsatisfactory.</p>
      <p>Example 3. Consider the database D of Example 1 and the query R $1=b(S). In
this case, the certain answers are fhaig. However, the approximation provided by the
approach in [18] is the empty set. 2</p>
      <p>In this paper, we show how conditional tables and the conditional evaluation of
queries [19] can be leveraged to develop new approximation algorithms with
correctness guarantees. The approach allows us to keep track of useful information that can
be profitably used to determine if a tuple is a certain answer. The very basic idea is
illustrated in the following example.</p>
      <p>Example 4. Consider again the database D of Example 1 and the query R $1=b(S).
The conditional evaluation of the query is carried out by applying the “conditional”
counterpart of each relational algebra operator. Rather than returning a set of tuples, the
conditional evaluation of a relational algebra operator returns pairs of the form ht; 'i,
where t is a tuple and ' is an expression stating under which conditions t can be derived.</p>
      <p>Considering the query above, first the conditional evaluation of $1=b(S) is
performed, which gives h?; '0i, where '0 is the condition (? = b). Then, the conditional
evaluation of the difference operator is carried out, yielding ha; '00i, where '00 is the
condition (? 6= a) _ (? 6= b). Indeed, this is the result of the conditional evaluation of
the whole query. 2</p>
      <p>As we will show, conditions are valuable information that can exploited to
determine which tuples are certain answers. For instance, from an analysis of '00 in the
example above, one can realize that the condition is always true—thus, hai is a certain
answer. There might be different ways of evaluating tuples’ conditions. In this paper,
we propose some strategies.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>In this section, we recall basic notions on incomplete databases, conditional tables, and
approximation algorithms for query answering over incomplete databases.</p>
      <sec id="sec-2-1">
        <title>2.1 Incomplete databases</title>
        <p>Basics. We assume the existence of the following disjoint countably infinite sets: a set
Const of constants and a set Null of (labeled) nulls. Nulls are denoted by the symbol ?
subscripted.</p>
        <p>A tuple t of arity k is an element of (Const [ Null)k, where k is a non-negative
integer. The i-th element of t is denoted as t[i], where 1 i k. Given a possibly
empty ordered sequence Z of integers i1; : : : ; ih in the range [1::k], we use t[Z] to
denote the tuple ht[i1]; : : : ; t[ih]i.</p>
        <p>A relation of arity k is a finite set of tuples of arity k. A relational schema is a set
of relation names, each associated with a non-negative arity. A database D associates
a relation RD of arity k with each relation name R of arity k. With a slight abuse of
notation, when the database is clear from the context, we simply write R instead of RD
for the relation itself. The arity of R is denoted as ar (R).</p>
        <p>
          The sets of all constants and nulls occurring in a database D are denoted by Const(D)
and Null(D), respectively. The active domain of D is adom(D) = Const(D)[Null(D).
If Null(D) = ;, we say that D is complete. Likewise, a relation is complete if it does not
contain nulls. Databases as defined above have been called also naive tables, V-tables,
and e-tables [
          <xref ref-type="bibr" rid="ref1">19, 1, 15</xref>
          ].
        </p>
        <p>A valuation is a mapping from Const [ Null to Const s.t. (c) = c for every c 2
Const. Valuations can be applied to tuples, relations, and databases in the obvious way.
For instance, the result of applying to a database D, denoted (D), is the complete
database obtained from D by replacing every null ?i with (?i).</p>
        <p>The semantics of a database D is given by the set of complete databases f (D) j
is a valuationg, which are also called possible worlds. Indeed, this is the semantics
under the missing value interpretation of nulls (i.e., every null stands for a value that
exists but is unknown), and is referred to as the closed-world semantics of
incompleteness.1</p>
        <p>Query answering. We consider queries expressed with the relational algebra, that
is, by means of the following operators: selection , projection , cartesian product ,
union [, intersection \, and difference . In the rest of the paper, a query is understood
to be a relational algebra expression built up from the above operators, unless otherwise
indicated. The result of evaluating a query Q on a database D, treating nulls as standard
constants (i.e., every (labeled) null or constant is equal to itself and different from every
other element of Const [ Null), is denoted as Q(D). A query Q returning k-tuples is
said to be of arity k, and ar (Q) denotes its arity.</p>
        <p>
          A widely accepted semantics of query answering relies on the notion of a
certain answer. The certain answers to a query Q on a database D are the tuples in
TfQ( (D)) j is a valuationg. Computing certain answers is coNP-hard (data
complexity), even in the case of the more restricted Codd tables, that is, databases as defined
above where the same null cannot occur multiple times [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>For query answering, we will use a more general notion first proposed in [26] and
called certain answers with nulls in [25].</p>
        <p>Definition 1. Given a query Q and a database D, the certain answers with nulls,
denoted cert(Q; D), is the set of all tuples t such that (t) 2 Q( (D)) for every valuation
.</p>
        <p>As opposed to the more commonly used notion of certain answers, certain answers
with nulls allow tuples with nulls in query answers. Consider for instance the relation
R = fha; ?i; hb; cig and the identity query. The certain answers consist only of hb; ci,
but it is more intuitive to return the entire relation, as we are certain that its tuples are
query answers. In this case, the certain answer with nulls are both ha; ?i and hb; ci. We
refer the interested reader to [22, 23] for a discussion of other drawbacks of the standard
notion of certain answers.</p>
        <p>Below we report the definition of evaluation algorithm with correctness
guarantees [18], which we use as the soundness property an approximation algorithm must
satisfy.</p>
        <p>Definition 2. A query evaluation algorithm has correctness guarantees for a query Q
if for every database D it returns a subset of cert(Q; D).</p>
        <p>When a query evaluation algorithm has correctness guarantees for every query, we
say that it has correctness guarantees.
1 Under the open-world semantics of incompleteness, which we do not consider in this paper,
the semantics of D is f (D) [ D0 j is a valuation and D0 is a complete databaseg.
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Conditional tables</title>
        <p>Syntax and semantics. Conditional tables have been proposed in [19].2 Essentially,
they are relations (as defined in the previous subsection, thus possibly containing nulls)
extended by one additional special column that specifies a “condition” for each tuple.</p>
        <p>Formally, let E be the set of all expressions, called conditions, that can be built using
the standard logical connectives ^, _, and : with expressions of the form ( = ),
( 6= ), true, and false, where ; 2 Const [ Null. We say that a valuation satisfies
a condition ', denoted j= ', if its assignment of constants to nulls makes ' true.</p>
        <p>A conditional tuple (c-tuple for short) of arity k (k 0) is a pair ht; 'i, where t is a
tuple of arity k and ' 2 E . Notice that ' may involve nulls and constants not necessarily
appearing in t—e.g., t is the tuple ha; ?1i and ' is the condition (?2 = c)^(?1 6= ?3).</p>
        <p>A conditional table (c-table for short) of arity k is a finite set of c-tuples of arity k.
A conditional database C associates a c-table RC of arity k with each relation name R
of arity k. With a slight abuse of notation, when the conditional database is clear from
the context, we simply write R instead of RC for the c-table itself.</p>
        <p>Let T be a conditional table. The result of applying a valuation to T is
(T ) = f (t) j ht; 'i 2 T and
j= 'g:
Thus, (T ) is the (complete) relation obtained from T by keeping only the c-tuples in
T whose condition is satisfied by , and applying to such c-tuples.</p>
        <p>The set of complete relations represented by T is rep(T ) = f (T ) j is a valuationg:
Likewise, a conditional database C = fT1; : : : ; Tmg represents the following set of
complete databases: rep(C) = ff (T1); : : : ; (Tm)g j is a valuationg:</p>
        <p>Conditional evaluation. Below we recall the conditional evaluation of a query over
a conditional database (see [19, 15]). Basically, it consists in evaluating the relational
algebra operators so that they can take c-tables as input and return a c-table as output. The
conditional evaluation of a query over a conditional database is then simply obtained
by applying the conditional evaluation of each operator.</p>
        <p>Let T1 and T2 be c-tables of arity n and m, respectively. In the definitions below,
for the union and difference operators it is assumed that n = m. For projection, Z is
a possibly empty ordered sequence of integers in the range [1::n]. For selection, is
a Boolean combination of expressions of the form ($i = $j), ($i = c), ($i 6= $j),
($i 6= c), where 1 i; j n, and c 2 Const. The conditional evaluation of a relational
algebra operator op is denoted as o_p. In the following, given two tuples t1 and t2 of
arity n, we use (t1 = t2) as a shorthand for the condition V (t1[i] = t2[i]).
i2[1::n]
– Projection:
– Selection:</p>
        <p>_ Z (T1) = fht[Z]; 'i j ht; 'i 2 T1g:
_ (T1) = fht; '0i j ht; 'i 2 T1 and '0 = ' ^ (t)g;
where (t) is the condition obtained from
by replacing every $i with t[i].
2 A generalization of conditional tables augmenting them with “global conditions” has been
proposed in [14], see also [15].
– Union:
– Difference:</p>
        <p>T1 [_ T2 = fht; 'i j ht; 'i 2 T1 or ht; 'i 2 T2g:</p>
        <p>T1 _ T2 = fht1; '0i j ht1; '1i 2 T1 and '0 = '1 ^ 't1;T2 g;
where 't1;T2 =</p>
        <p>ht2;'2i2T2
– Cartesian product:</p>
        <p>V</p>
        <p>:('2 ^ (t1 = t2)).</p>
        <p>T1 _ T2 = fht1 t2; '1 ^ '2i j ht1; '1i 2 T1; ht2; '2i 2 T2g;
where t1 t2 is the tuple obtained as the concatenation of t1 and t2.</p>
        <p>The result of the conditional evaluation of a query Q over a conditional database C
is denoted as Q_ (C). Notice that Q_ (C) is a c-table. For a fixed query Q and a conditional
database C, Q_ (C) can be evaluated in polynomial time in the size of C (see [15]). The
size of a conditional table T is jjT jj = jT j + P jj'jj, where jT j is the number of
ht;'i2T
c-tuples in T and jj'jj is the length in symbols of condition '. The size of a conditional
database C is jjCjj = P jjTijj.</p>
        <p>Ti2C</p>
        <p>Recall that c-tables are a strong representation system for the relational algebra,
that is, for every relational algebra query Q and conditional database C, it is possible to
compute a c-table T s.t. rep(T ) = fQ(D) j D 2 rep(C)g. Indeed, such a c-table can
be computed through the conditional evaluation, that is, T = Q_ (C).</p>
        <p>In the rest of the paper we assume that every selection condition is a conjunction
of expressions of the form ($i = $j), ($i = c), ($i 6= $j), and ($i 6= c). There is
no loss of generality, as an arbitrary selection (R) can be rewritten as follows to
comply with our assumption. First, is rewritten in disjunctive normal form (DNF),
that is, into a formula 1 _ _ m, where each i is a conjunction of expressions of
the form ($i = $j), ($i = c), ($i 6= $j), and ($i 6= c). Then, (R) is replaced by
1 (R) [ [ m (R). Even though the conversion to DNF can lead to an exponential
blow-up in size, this does not affect the data complexity (as the query is fixed).
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approximation Algorithms</title>
      <p>In this section, we show how to exploit conditional tables to compute a sound (but
possibly incomplete) set of certain answers with nulls. The basic idea is to rely on the
conditional evaluation of the relational algebra operators, in order to keep track of how
each tuple is derived during query evaluation, and then apply a strategy to evaluate
conditions, so that each tuple is eventually associated with a truth value. Tuples that are
associated with the condition true are certain answers with nulls.</p>
      <p>We start by introducing an explicit conditional evaluation for intersection. In the
previous section, we recalled the conditional evaluation of different relational algebra
operators. Clearly, even if intersection was not reported, it can be expressed in terms of
the other operators. Let T1 and T2 be c-tables of arity n. Then,</p>
      <p>T1 \_ T2 = fht1; '0i j ht1; '1i 2 T1; ht2; '2i 2 T2; '0 = '1 ^ '2 ^ (t1 = t2)g:
We slightly generalize c-tables to allow also unknown as a condition. Thus, from
now on, E is the set of all expressions that can be built using the standard logical
connectives with expressions, called atomic conditions, of the form ( = ), ( 6= ), true,
false, and unknown, where ; 2 Const [ Null. We will discuss different strategies
to “evaluate” conditions, that is, to reduce them to either true or false or unknown—as
shown in the following, tuples having condition true are certain answers with nulls.</p>
      <p>We assume the following strict ordering false &lt; unknown &lt; true.</p>
      <p>The three-valued evaluation of a condition ' 2 E , denoted eval ('), is defined
inductively as follows:
– eval ((
8&gt;true
&lt;
= )) = false
&gt;:unknown
if = ;
if 6= and ;
otherwise:
8&gt;true if 6= and ;
&lt;
– eval (( 6= )) = false if = ;</p>
      <p>&gt;:unknown otherwise:
– eval (('1 ^ '2)) = minfeval ('1) ; eval ('2)g.
– eval (('1 _ '2)) = maxfeval ('1) ; eval ('2)g.</p>
      <p>&gt;8true if eval (') = false;
&lt;
– eval ((:')) = false if eval (') = true;</p>
      <p>&gt;:unknown otherwise.
– eval (v) = v for v 2 ftrue; unknown; falseg.</p>
      <p>The basic idea of our first evaluation algorithm, which we call naive evaluation, is to
perform the three-valued evaluation of conditions after each relational algebra operator
is applied (that is, after its conditional evaluation). One interesting fact about the naive
evaluation is that it is equivalent to the evaluation algorithm of [18].</p>
      <p>Given a c-tuple t = ht; 'i, with a slight abuse of notation, we use eval (t) to denote
ht; eval (')i. Likewise, given a conditional table T , eval (T ) denotes feval (t) j t 2 T g.</p>
      <p>To define the naive evaluation, we first provide the following definitions:
Evaln(R) = eval (R)
Evaln(Q1 [ Q2) = eval (Evaln(Q1) [_ Evaln(Q2))
Evaln(Q1 \ Q2) = eval (Evaln(Q1) \_ Evaln(Q2))
Evaln(Q1 Q2) = eval Evaln(Q1) _ Evaln(Q2)
Evaln(Q1 Q2) = eval Evaln(Q1) _ Evaln(Q2)
Evaln( (Q)) = eval ( _ (Evaln(Q)))</p>
      <p>Evaln( Z (Q)) = eval ( _ Z (Evaln(Q)))</p>
      <p>Given a relation R, we define the c-table R = fht; truei j t 2 Rg. Analogously,
given a database D, we define D as the conditional database obtained from D by
replacing every relation R of it with R. Here the basic idea is to convert a database D into
a simple conditional database D where all conditions are true, so that D can be used as
the starting point for the conditional evaluation of queries.</p>
      <p>Given a query Q and a database D, we use Evaln(Q; D) to denote the result of
evaluating Evaln(Q) over D. Finally, we define:</p>
      <p>Evaltn(Q; D) = ft j ht; truei 2 Evaln(Q; D)g;</p>
      <p>Evalnp(Q; D) = ft j ht; 'i 2 Evaln(Q; D) and ' 6= falseg:
Example 5. Consider the database D consisting of the two unary relations R = fhaig
and S = fh?1ig. Consider also the query Q5 = R S. The conditional database D
consist of the two c-tables R = fha; trueig and S = fh?1; trueig. Then,
Evaln(Q5; D) = eval eval R _ eval S
= eval R _ S
= eval (fha; true ^ :(true ^ (a = ?1))ig)
= fha; unknownig:
Thus, Evaltn(Q5; D) = ; and Evalnp(Q5; D) = fhaig.</p>
      <p>The following theorem states that the naive evaluation is equivalent to the
evaluation algorithm of [18]. We point out that, in the following claim, D is a database (thus,
possibly containing nulls), but without conditions. However, the naive evaluation first
converts D into a conditional database D with all conditions being true, and then
performs the evaluation Evaln() over D, so that eventually tuples associated with true are
certain answers with nulls.</p>
      <p>Theorem 1. The naive evaluation is equivalent to the evaluation algorithm of [18].
Corollary 1. Evaltn(Q) has correctness guarantees.</p>
      <p>Theorem 2. Evaln(Q; D) can be computed in polynomial time in the size of D, for
every query Q and database D.</p>
      <p>Obviously, Evaltn(Q; D) and Evalnp(Q; D) can be computed in polynomial time too
(data complexity).</p>
      <p>The naive evaluation presented above is somehow limited, since it does not use
much the power of conditional tables, as shown in the following example.
Example 6. Consider the database D of Example 5 and the query Q6 = R $1=b(S).</p>
      <p>Clearly, cert(Q6; D) = fhaig, as the selection returns either fhbig or ; in every
possible world, and thus the result of the difference is fhaig in every possible world.
However, the naive evaluation is not able to realize this aspect and behaves as follows:
2
Evaln(Q6; D) =
= eval eval R _ eval _ $1=b eval S
= eval fha; trueig _ eval ( _ $1=b (fh?1; trueig))
= eval fha; trueig _ eval (fh?1; true ^ (?1 = b)ig)
= eval fha; trueig _ fh?1; unknownig
= eval (fha; true ^ :(unknown ^ (a = ?1))ig) = fha; unknownig:
Thus, Evaltn(Q6; D) = ;. In this case, the crucial point is that eval _ $1=b eval S
fh?1; unknownig, and the fact that ?1 can only be equal to b gets lost.</p>
      <p>The previous example suggests that equalities might be exploited to refine the
evaluation of conditions, in order to provide more accurate information, as illustrated in the
following example.</p>
      <p>Example 7. Consider again the database D of Example 5 and the query Q6 of
Example 6. By “propagating” equalities into conditions and tuples after each relational
algebra operator is conditionally evaluated, we get:</p>
      <p>fha; trueig _ _ $1=b(fh?1; trueig)
= fha; trueig _ fh?1; true ^ (?1 = b)ig
= fha; trueig _ fhb; unknownig
= fha; true ^ :(unknown ^ (a = b))ig
= fha; trueig:</p>
      <p>Thus, we conclude that hai is a certain answer with nulls. Recall that cert(Q6; D) =
fhaig and Evaltn(Q6; D) = ;. 2</p>
      <p>In the previous example, it is interesting to notice that the selection returns the
ctuple hb; unknowni, meaning that the only tuple that can be returned (in some cases) is
hbi, and thus providing more accurate information than the naive evaluation.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>Certain answers are a principled manner to answer queries on incomplete databases.
Since their computation is a coNP-hard problem, recent research has focused on
polynomial time algorithms providing under-approximations. Leveraging conditional tables,
we have shown how new approximation algorithms can be developed.</p>
      <p>We are currently working on more refined evaluation strategies which better
exploit the power of conditional tables and the conditional evaluation, with the aim of
computing strictly more certain answers with nulls, while retaining polynomial time
data complexity. For instance, conditions might be rewritten into a more suitable form
(e.g., conjunctive normal form or disjunctive normal form [12, 13]) so as to allow better
analyses.
6. Meghyn Bienvenu and Magdalena Ortiz. Ontology-mediated query answering with
datatractable description logics. In Reasoning Web, pages 218–307, 2015.
7. Marco Calautti, Sergio Greco, Cristian Molinaro, and Irina Trubitsyna. Exploiting equality
generating dependencies in checking chase termination. PVLDB, 9(5):396–407, 2016.
8. Andrea Cal`ı, Georg Gottlob, and Thomas Lukasiewicz. A general datalog-based framework
for tractable query answering over ontologies. Journal of Web Semantics, 14:57–83, 2012.
9. Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. On
reconciling data exchange, data integration, and peer data management. In Proc. Symposium
on Principles of Database Systems (PODS), pages 133–142, 2007.
10. Ronald Fagin, Phokion G. Kolaitis, Rene´e J. Miller, and Lucian Popa. Data exchange:
semantics and query answering. Theoretical Computer Science, 336(1):89–124, 2005.
11. Floris Geerts, Giansalvatore Mecca, Paolo Papotti, and Donatello Santoro. The LLUNATIC
data-cleaning framework. Proceedings of the VLDB Endowment, 6(9):625–636, 2013.
12. Georg Gottlob and Enrico Malizia. Achieving new upper bounds for the hypergraph duality
problem through logic. In Proc. Symposium on Logic in Computer Science (LICS), pages
43:1–43:10, 2014.
13. Georg Gottlob and Enrico Malizia. Achieving new upper bounds for the hypergraph duality
problem through logic. SIAM Journal on Computing, 2017.
14. Go¨sta Grahne. Dependency satisfaction in databases with incomplete information. In Proc.</p>
      <p>Very Large Data Bases (VLDB) Conference, pages 37–45, 1984.
15. Go¨sta Grahne. The Problem of Incomplete Information in Relational Databases, volume 554
of Lecture Notes in Computer Science. Springer, 1991.
16. Sergio Greco, Cristian Molinaro, and Francesca Spezzano. Incomplete Data and Data
Dependencies in Relational Databases. Synthesis Lectures on Data Management. Morgan &amp;
Claypool Publishers, 2012.
17. Sergio Greco, Francesca Spezzano, and Irina Trubitsyna. Checking chase termination:
Cyclicity analysis and rewriting techniques. IEEE Transactions on Knowledge and Data
Engineering, 27(3):621–635, 2015.
18. Paolo Guagliardo and Leonid Libkin. Making SQL queries correct on incomplete databases:
A feasibility study. In Proc. Symposium on Principles of Database Systems (PODS), pages
211–223, 2016.
19. Tomasz Imielinski and Witold Lipski Jr. Incomplete information in relational databases.</p>
      <p>Journal of the ACM, 31(4):761–791, 1984.
20. Paraschos Koutris and Jef Wijsen. The data complexity of consistent query answering for
self-join-free conjunctive queries under primary key constraints. In Proc. Symposium on
Principles of Database Systems (PODS), pages 17–29, 2015.
21. Maurizio Lenzerini. Data integration: A theoretical perspective. In Proc. Symposium on</p>
      <p>Principles of Database Systems (PODS), pages 233–246, 2002.
22. Leonid Libkin. Certain answers as objects and knowledge. In Proc. International Conference
on Principles of Knowledge Representation and Reasoning (KR), pages 328–337, 2014.
23. Leonid Libkin. Incomplete data: what went wrong, and how to fix it. In Proc. Symposium
on Principles of Database Systems (PODS), pages 1–13, 2014.
24. Leonid Libkin. Sql’s three-valued logic and certain answers. In Proc. International
Conference on Database Theory (ICDT), pages 94–109, 2015.
25. Leonid Libkin. SQL’s three-valued logic and certain answers. ACM Transactions Database</p>
      <p>Systems, 41(1):1, 2016.
26. Witold Lipski. On relational algebra with marked nulls. In Proc. Symposium on Principles
of Database Systems (PODS), pages 201–203, 1984.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Serge</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          and
          <article-title>Go¨sta Grahne. Update semantics for incomplete databases</article-title>
          .
          <source>In Proc. Very Large Data Bases (VLDB) Conference</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Serge</surname>
            <given-names>Abiteboul</given-names>
          </string-name>
          , Paris C. Kanellakis, and Go¨sta Grahne.
          <article-title>On the representation and querying of sets of possible worlds</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>78</volume>
          (
          <issue>1</issue>
          ):
          <fpage>158</fpage>
          -
          <lpage>187</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          , Pablo Barcelo´,
          <string-name>
            <given-names>Leonid</given-names>
            <surname>Libkin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Filip</given-names>
            <surname>Murlak</surname>
          </string-name>
          .
          <source>Foundations of Data Exchange</source>
          . Cambridge University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Leopoldo E.</given-names>
            <surname>Bertossi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Chomicki</surname>
          </string-name>
          .
          <article-title>Consistent query answers in inconsistent databases</article-title>
          .
          <source>In Proc. Symposium on Principles of Database Systems (PODS)</source>
          , pages
          <fpage>68</fpage>
          -
          <lpage>79</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Leopoldo</surname>
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Bertossi</surname>
          </string-name>
          .
          <source>Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management</source>
          . Morgan &amp; Claypool Publishers,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>