<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Causality in Databases: Answer-Set Programs and Integrity Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leopoldo Bertossi?</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carleton University, School of Computer Science</institution>
          ,
          <addr-line>Ottawa</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Causality in databases (DBs) was introduced in [15]; and, building on work on causality as found in artificial intelligence, appeals to the notions of counterfactuals, interventions and structural models [14]. More specifically, [15] introduces the notions of: (a) a DB tuple as an actual cause for a query result, (b) a contingency set for a cause, as a set of tuples that must accompany the cause for it to be such, and (c) the responsibility of a cause as a numerical measure of its strength [11]. In our research on causality in DBs we have attempted to understand causality from different angles of data and knowledge management. In [5], precise reductions between causality in DBs, DB repairs, and consistency-based diagnosis were established; and the relationships were investigated and exploited. In [6], causality in DBs was related to view-based DB updates and abductive diagnosis, establishing fruitful connections. This work summarizes some directions and results of our recent and ongoing research in DB causality. Causality in databases. A notion of cause as an explanation for a query result was introduced in [15], as follows. For a relational instance D, a tuple 2 Dn is called a counterfactual cause for a Boolean conjunctive query (BCQ) Q, if D j= Q and D r f g 6j= Q. Now, 2 D is an actual cause for Q if there exists D, called a contingency set for , such that is a counterfactual cause for Q in D r . The notion of responsibility reflects the relative degree of causality of a tuple for a query result [15] (based on [11]). The responsibility of an actual cause for Q, is ( ) := j j1+1 , where j j is the size of a smallest contingency set for . If is not an actual cause, ( ) := 0. Tuples with higher responsibility are stronger explanations. The notion of cause can be applied to any monotonic query with free variables, e.g. conjunctive queries (CQs) with built-ins, unions of CQs (UCQs) [5], Datalog queries [6], etc. Here we consider only conjunctive queries. Example 1. Consider the relational DB D = fR(a4; a3); R(a2; a1); R(a3; a3); S(a4); S(a2); S(a3)g, and the query Q : 9x9y(S(x) ^ R(x; y) ^ S(y)). It holds, D j= Q. S(a3) is a counterfactual cause for Q: if S(a3) is removed from D, Q is no longer true. Its responsibility is 1. So, it is an actual cause with empty contingency set. R(a4; a3) is an actual cause for Q with contingency set fR(a3; a3)g: if R(a3; a3) is removed from D, Q is still true, but further removing R(a4; a3) makes Q false. The responsibility of R(a4; a3) is 21 . R(a3; a3) and S(a4) are actual causes, with responsibility 12 . Database repairs and causes. We introduce the main ideas around DB repairs [2] by means of an example. The ICs that we consider here can be enforced only by tupledeletions from the DB (as opposed to tuple-insertions). Example 2. The DB D = fP (a); P (e); Q(a; b); R(a; c)g is inconsistent with respect to the (set of) denial constraints (DCs) 1 : :9x9y(P (x) ^ Q(x; y)), and 2 : :9x9y(P (x) ^ R(x; y)). It holds D 6j= f 1; 2g. ? Member of the “Millenium Institute for Foundational Research on Data”, bertossi@scs.carleton.ca. Supported by NSERC Discovery Grant #06148.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Chile.</p>
      <p>A subset-repair, in short an S-repair, of D wrt. the set of DCs is a -maximal subset
of D that is consistent, i.e. no proper superset is consistent. The following are S-repairs:
D1 = fP (e); Q(a; b); R(a; b)g and D2 = fP (e); P (a)g. A cardinality-repair, in short
a C-repair, of D wrt. the set of DCs is a maximum-cardinality, consistent subset of D,
i.e. no subset of D with larger cardinality is consistent. D1 is the only C-repair.</p>
      <p>For an instance D and a set of DCs, the sets of S-repairs and C-repairs are denoted
with Srep(D; ) and Crep(D; ), resp.</p>
      <p>
        Now, consider a BCQ Q : 9x(P1(x1) ^ ^ Pm(xm)) with D j= Q. Notice that
:Q is logically equivalent to the DC: (Q) : :9x(P1(x1) ^ ^ Pm(xm)). So, if Q
is true in D, D is inconsistent wrt. (Q), giving rise to repairs of D wrt. (Q). In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
it was shown that S-repairs can be used to obtain actual causes with their contingency
sets; and C-repairs for causes’ responsibilities, which we show with an example. First
we need to build differences, containing a tuple , between D and S- or C-repairs:
(a) Di s(D; (Q); ) = fD r D0 j D0 2 Srep(D; (Q));
(b) Di c(D; (Q); ) = fD r D0 j D0 2 Crep(D; (Q));
2 (D r D0)g; (1)
2 (D r D0)g: (2)
Example 3. (ex. 1 cont.) With the same instance D and query Q, we consider the
DC (Q): :9x9y(S(x) ^ R(x; y) ^ S(y)), which is not satisfied by D. Here,
Srep(D; (Q)) = fD1; D2; D3g and Crep(D; (Q)) = fD1g, with D1 = fR(a4; a3);
R(a2; a1); R(a3; a3); S(a4); S(a2)g, D2 = fR(a2; a1); S(a4); S(a2); S(a3)g, D3 =
fR(a4; a3); R(a2; a1); S(a2); S(a3)g.
      </p>
      <p>
        For tuple R(a4; a3), Di s(D; (Q); R(a4; a3)) = fD r D2g = ffR(a4; a3);
R(a3; a3)gg. So, R(a4; a3) is an actual cause, with responsibility 21 . Similarly, R(a3; a3)
is an actual cause, with responsibility 12 . For tuple S(a3), Di c(D; (Q); S(a3)) =
fD r D1g = fS(a3)g. So, S(a3) is an actual cause, with responsibility 1.
Specification of causes with ASP. A DB may have several repairs wrt. a class of ICs.
Since we are mainly interested in reasoning with whole class of them, e.g. for consistent
query answering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is natural to try to specify this class in logical terms. This can
be achieved by means of answer-set programs (disjunctive logic programs with stable
model semantics) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the so-called repair-programs [
        <xref ref-type="bibr" rid="ref2 ref9">9, 2</xref>
        ]. The consistent answers to
a query are certainly and logically entailed by the program. As shown in [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], the
reduction of DB causality to DB repairs illustrated above can be used to take
advantage of repair programs for providing specifications in answer-set programming (ASP)
to reason about causes and compute causes, their contingency sets, and responsibility
degrees. The resulting causality-programs have the necessary and sufficient expressive
power to capture and compute not only causes, which can be done with less expressive
programs [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], but also minimal contingency sets and responsibilities (which can not).
Actually, weak program constraints [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are used to specify C-repairs.
      </p>
      <p>
        Repairing the DB by changing attribute values is also possible [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref5 ref8">8, 5</xref>
        ], a
particular notion of attribute-based repair was used to define attribute values (in tuples)
as causes for query answers. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] it is shown how to specify attribute-based causes
via ASP via the attribute-based repairs connection.
      </p>
      <p>
        Causality under ICs. The interventions at the basis of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], i.e. actions on the
structural model that determine counterfactual scenarios, take in DBs the form of tuple
deletions. If a DB D is expected to satisfy a given set of ICs, they should also be considered
as a part of the model. Accordingly, the instances obtained from D by tuple deletions,
as used to determine causes, should also satisfy the ICs.
      </p>
      <p>Example 4. For the DB instance D and the open, i.e. non-Boolean, CQ, Q in (3).</p>
      <p>Course CName TStaff DName
Dep DName TStaff t4 COM08 John Computing
t1 Computing John t5 Math01 Kevin Math
t2 Philosophy Patrick t6 HIST02 Patrick Philosophy
t3 Math Kevin t7 Math08 Eli Math</p>
      <p>t8 COM01 John Computing</p>
      <p>AnsQ(TSta ) Dep(DName; TSta ); Course(CName; TSta ; DName): (3)
Q(D) = fJohn; Patrick; Keving, and hJohni has the actual causes: t1, t4 and t8. t1 is a
counterfactual cause, t4 has a single minimal contingency set 1 = ft8g; and t8 has a
single minimal contingency set 2 = ft4g. Now, for the query</p>
    </sec>
    <sec id="sec-2">
      <title>AnsQ0 (TSta ) Dep(DName; TSta )); (4)</title>
      <p>hJohni is still an answer from D, but it has a single cause, t1, which is also a
counterfactual cause. Now, if the following inclusion dependency is satisfied by D,
: 8x8y (Dep(x; y) ! 9u Course(u; y; x)); (5)
Q is equivalent to Q0. The question is whether t4 and t8 should still be considered as
causes for answer hJohni in the presence of . Finally, consider the query Q1:</p>
    </sec>
    <sec id="sec-3">
      <title>AnsQ1 (TSta ) Course(CName; TSta ; DName); (6)</title>
      <p>which, among others, has hJohni as an answer, with t4 and t8 as the only actual causes,
with contingency sets 1 = ft8g and 2 = ft4g, resp. In the presence of , one should
wonder if also t1 would be a cause (it contains the referring value John in table Dept ),
or, if not, whether its presence would make the previous causes less responsible.</p>
      <p>A definition of query-answer cause was introduced and investigated in [6, sec. 7],
as follows. For an instance D that satisfies a set of ICs, i.e. D j= , and a monotone
query Q with D j= Q(a), a tuple 2 D is an actual cause for a under if there is</p>
      <p>D, such that: (a) D r j= Q(a); (b) D r j= ; (c) D r ( [ ftg) 6j= Q(a);
and (d) D r ( [ ftg) j= . The responsibility of as a cause for an answer a to
query Q under a set of ICs, denoted by QD(;a) ( ), is defined exactly as above.
Example 5. (ex. 4 cont.) For query Q in (3) and its answer hJohni, without in (5),
t4 was a cause with minimal contingency set 1 = ft8g. Now, it holds D r 1 j= ,
but D r ( 1 [ ft4g) 6j= . So, in presence of , t4 is not an actual cause for hJohni.
Notice that Q and Q0 in (4) have the same actual cause t1 for hJohni under .</p>
      <p>For Q1 in (6), and its answer hJohni, t4 and t8 are still (non-counterfactual) actual
causes under . However, their previous contingency sets are not such anymore: D r
( 1 [ ft4g) 6j= , D r ( 2 [ ft8g) 6j= . Actually, the smallest contingency set for t4
is 3 = ft8; t1g; and for t8, 4 = ft4; t1g. Accordingly, the causal responsibilities of
t4; t8 decrease under : QD1(John) (t4) = 21 , but QD1;(John) (t4) = 13 . Under , tuple t1 is
still not an actual cause for answer hJohni to Q1.</p>
      <p>
        Since denial constraints are never violated by tuple deletions, they do not affect
the causes. However, inclusion dependencies (INDs) may make a set of causes grow,
together with sizes of minimal contingency sets, and then, responsibilities decrease.
Intuitively, the responsibility is spread through INDs. Also, causes are preserved under
logically equivalent query rewriting under ICs. (Cf. [6, prop. 20] for general properties.)
For some classes of CQs, the complexity of deciding responsibility of causes (under
causality without ICs) may decrease from NP-complete to tractability when the DB
satisfies certain key constraints [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. On the other side, without ICs, deciding causality
for CQs is tractable [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], but their presence may make complexity grow: There are a
CQ Q and an IND , for which deciding causality is NP-complete [6, prop. 22].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] also an abductive/deductive methodology for computing causes for answers
to Datalog queries under ICs was proposed. The involved logical rewritings are
reminiscent of those used for semantic query optimization.
      </p>
      <p>
        As part of our ongoing research, as an extension of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we are appealing to ASPs to
specify causes under ICs. This can be achieved by taking advantage of repair programs
for DBs that may be inconsistent wrt. soft ICs, but consistent wrt. hard ICs. (CQA for
this kind of DBs is investigated in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].)
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Baral</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Knowledge Representation, Reasoning and Declarative Problem Solving</article-title>
          . Cambridge Univ. Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Database Repairing and Consistent Query Answering</article-title>
          . Morgan &amp; Claypool,
          <source>Synthesis Lectures on Data Management</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates</article-title>
          .
          <source>IEEE Trans. Knowledge and Data Engineering</source>
          ,
          <year>2013</year>
          ,
          <volume>25</volume>
          (
          <issue>5</issue>
          ):
          <fpage>987</fpage>
          -
          <lpage>1000</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bravo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Consistency and Trust in Peer Data Exchange Systems</article-title>
          .
          <source>Theory and Practice of Logic Programming</source>
          ,
          <year>2017</year>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>148</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Salimi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back</article-title>
          .
          <source>Theory of Computing Systems</source>
          ,
          <year>2017</year>
          ,
          <volume>61</volume>
          (
          <issue>1</issue>
          ):
          <fpage>191</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Salimi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Causes for Query Answers from Databases: Datalog Abduction, View-Updates, and Integrity Constraints</article-title>
          .
          <source>Int. J. Approximate Reasoning</source>
          ,
          <year>2017</year>
          ,
          <volume>90</volume>
          :
          <fpage>226</fpage>
          -
          <lpage>252</lpage>
          . Corr Arxiv Paper cs.
          <source>DB/1611</source>
          .01711.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>The Causality/Repair Connection in Databases: Causality-Programs</article-title>
          .
          <source>Proc. Scalable Uncertainty Management (SUM'17)</source>
          . Springer LNCS 10564,
          <year>2017</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Characterizing</surname>
          </string-name>
          and
          <article-title>Computing Causes for Query Answers in Databases from Database Repairs and Repair Programs</article-title>
          .
          <source>Corr Arxiv Paper cs.DB/1712.01001</source>
          ,
          <year>2017</year>
          . To appear
          <source>in Proc. FoIKS'18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Caniupan-Marileo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>The Consistency Extractor System: Answer Set Programs for Consistent Query Answering in Databases”</article-title>
          .
          <source>Data &amp; Know. Eng.</source>
          ,
          <year>2010</year>
          ,
          <volume>69</volume>
          (
          <issue>6</issue>
          ):
          <fpage>545</fpage>
          -
          <lpage>572</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Chakravarthy</surname>
            ,
            <given-names>U. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grant</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Minker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Logic-Based Approach to Semantic Query Optimization</article-title>
          .
          <source>ACM TODS</source>
          ,
          <year>1990</year>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>162</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Chockler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>and</article-title>
          <string-name>
            <surname>Halpern</surname>
            ,
            <given-names>J. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Responsibility</surname>
          </string-name>
          and
          <string-name>
            <surname>Blame: A Structural-Model Approach</surname>
          </string-name>
          .
          <source>J. Artif. Intell. Res.</source>
          ,
          <year>2004</year>
          ,
          <volume>22</volume>
          :
          <fpage>93</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Cibele</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatterbauer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Immerman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>and Meliou A. A Characterization of the Complexity of Resilience and Responsibility for Conjunctive Queries</article-title>
          . PVLDB,
          <year>2015</year>
          ,
          <volume>9</volume>
          (
          <issue>3</issue>
          ):
          <fpage>180</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Greco</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pijcke</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wijsen</surname>
          </string-name>
          ,
          <source>J. Certain Query Answering in Partially Consistent Databases. PVLDB</source>
          ,
          <year>2014</year>
          ,
          <volume>7</volume>
          (
          <issue>5</issue>
          ):
          <fpage>353</fpage>
          -
          <lpage>364</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Halpern</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Pearl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Causes</surname>
          </string-name>
          and
          <article-title>Explanations: A Structural-Model Approach: Part 1</article-title>
          .
          <string-name>
            <surname>British</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <source>Philosophy of Science</source>
          ,
          <year>2005</year>
          ,
          <volume>56</volume>
          :
          <fpage>843</fpage>
          -
          <lpage>887</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Meliou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatterbauer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>K. F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>The Complexity of Causality and Responsibility for Query Answers and Non-Answers</article-title>
          .
          <source>Proc. VLDB</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>