<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Causality in Databases, Database Repairs, and Consistency-Based Diagnosis (extended abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leopoldo Bertossi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Babak Salimi</string-name>
          <email>bsalimig@scs.carleton.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carleton University, School of Computer Science Ottawa</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        When querying a database, a user may not always obtain the expected results,
and the system could provide some explanations. Explanations that could be
useful to further understand the data or check if the query is the intended one.
Actually, the notion of explanation for a query result was introduced in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], on
the basis of the deeper concept of actual causation.
      </p>
      <p>
        Intuitively, a tuple t is an actual cause for an answer a¯ to a conjunctive
query Q from a relational database instance D if there is a “contingent” set of
tuples , such that, after removing from D, removing/inserting t from/into D
causes a¯ to switch from being an answer to being a non-answer. Actual causes
and contingent tuples are restricted to be among a pre-specified set of
endogenous tuples, which are admissible, possible candidates for causes, as opposed to
exogenous tuples. (For a formalization of non-causality-based explanations for
query answers in DL ontologies, see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].)
      </p>
      <p>
        Some causes may be stronger than others. In order to capture this
observation, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] also introduces and investigates a quantitative metric, called
responsibility, which reflects the relative degree of causality of a tuple for a query result.
In applications involving large data sets, it is crucial to rank potential causes by
their responsibility [
        <xref ref-type="bibr" rid="ref19 ref20">20, 19</xref>
        ].
      </p>
      <p>
        Actual causation, as used in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], can be traced back to [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], which
provides a model-based account of causation on the basis of the
counterfactual dependence. Responsibility was also introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to capture the
degree of causation. Apart from the explicit use of causality, research on
explanations for query results has focused mainly, and rather implicitly, on provenance
[
        <xref ref-type="bibr" rid="ref14 ref15 ref23 ref4 ref5 ref7 ref9">4, 5, 7, 9, 15, 14, 23</xref>
        ], and more recently, on provenance for non-answers [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ].
A close connection between causality and provenance has been established [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
However, causality is a more refined notion that identifies causes for query
results on the basis of user-defined criteria, and ranks causes according to their
responsibility [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        Consistency-based diagnosis [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is an area of knowledge representation. The
main task here is, given the speci cation of a system in some logical formalism
and a usually unexpected observation about the system, to obtain explanations
for the observation, in the form of a diagnosis for the unintended behavior.
      </p>
      <p>
        In a different direction, a database instance, D, that is expected to satisfy
certain integrity constraints (ICs) may fail to do so. In this case, a repair of D is
a database D′ that does satisfy the ICs and minimally departs from D. Different
forms of minimality can be applied and investigated. A consistent answer to a
query from D and wrt. the ICs is a query answer that is obtained from all possible
repairs, i.e. is invariant or certain under the class of repairs. These notions were
introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for a recent survey). We should mention that, although
not in the framework of database repairs, model-based diagnosis techniques have
been applied to restoring consistency of a database wrt. a set of ICs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
      </p>
      <p>These three forms of reasoning, namely inferring causality in databases,
consistency-based diagnosis, and consistent query answers (and repairs) are all
non-monotonic. For example, a (most responsible) cause for a query result may
not be such anymore after the database is updated. In this work we establish
natural, precise, useful, and deeper connections between causality for query answers
in databases, database repairs wrt. denial constraints, and consistency-based
diagnosis. The first two are relatively new problems in databases, and the third one
is an established subject of model-based diagnosis in knowledge representation.</p>
      <p>We show how to obtain database repairs from causes, and the other way
around. The vast body of research on database repairs can be applied to the
newer problem of determining actual causes for query answers. By formulating
a causality problem as a diagnosis problem, we manage to characterize causes
in terms of the system’s diagnoses. More specifically, we show that inferring
and computing actual causes and responsibility in a database setting become, in
different forms, consistency-based diagnosis reasoning problems and tasks.</p>
      <p>
        Informally, a causal explanation for a conjunctive query answer can be viewed
as a diagnosis, where in essence the first-order logical reconstruction of the
relational database provides the system description [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and the observation is the
query answer. Furthermore, we unveil a strong connection between computing
causes and their responsibilities for conjunctive queries, on the one hand, and
computing repairs in databases [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] wrt. denial constraints, on the other hand.
These computational problems can be reduced to each other. More precisely, we
report on the following results:
1. For a boolean conjunctive query and its associated denial constraint (which
is violated iff the query is true), we establish a precise connection between
actual causes for the query (being true) and the subset-repairs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] of the
instance wrt. the constraint. Namely, we obtain causes from repairs.
2. In particular, we establish the connection between an actual cause’s
responsibility and cardinality repairs [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] wrt. the associated constraint.
3. We characterize and obtain subset- and cardinality- repairs for a database
under a denial constraint in terms of the causes for the associated query
being true.
4. We consider a set of denials constraints and a database that may be
inconsistent wrt. them. We obtain the database repairs by means of an algorithm
that takes as input the actual causes for constraint violations and their
contingency sets.
5. We establish a precise connection between consistency-based diagnosis for
a boolean conjunctive query being unexpectedly true according to a system
description, and causes for the query being true. In particular, we compute
actual causes, contingency sets, and responsibilities from minimal diagnosis.
6. As report on ongoing work, we discuss several extensions and open issues
that are under investigation.
      </p>
      <p>
        Acknowledgements: Leo Bertossi is grateful to Benny Kimelfeld for
stimulating conversations at LogicBlox, and pointing out to [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ], where an interesting
connection between updates through views and causality is established. He also
appreciates the hospitality of LogicBlox during part of his sabbatical.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chomicki</surname>
          </string-name>
          ,
          <source>J. Consistent Query Answers in Inconsistent Databases. Proc. ACM PODS</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Database Repairing and Consistent Query Answering</article-title>
          . Morgan &amp; Claypool,
          <source>Synthesis Lectures on Data Management</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Borgida</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Rodriguez-Muro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Explanation in DL-Lite</article-title>
          .
          <source>Proc. DL Workshop, CEUR-WS 353</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Buneman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khanna</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tan</surname>
          </string-name>
          , W. C.
          <article-title>Why and Where: A Characterization of Data Provenance</article-title>
          .
          <source>Proc. ICDT</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Buneman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W. C.</given-names>
          </string-name>
          <article-title>Provenance in Databases</article-title>
          .
          <source>Proc. ACM SIGMOD</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Jagadish</surname>
            ,
            <given-names>H. V.</given-names>
          </string-name>
          <string-name>
            <surname>Why</surname>
            <given-names>Not</given-names>
          </string-name>
          ?
          <source>Proc. ACM SIGMOD</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Cheney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiticariu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tan</surname>
          </string-name>
          , W. C. Provenance in Databases: Why,
          <string-name>
            <surname>How</surname>
          </string-name>
          , And Where. Foundations and Trends in Databases,
          <year>2009</year>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>379</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Chockler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>and</article-title>
          <string-name>
            <surname>Halpern</surname>
            ,
            <given-names>J. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Responsibility</surname>
          </string-name>
          and
          <string-name>
            <surname>Blame: A Structural-Model Approach</surname>
          </string-name>
          .
          <source>J. Artif. Intell. Res.</source>
          ,
          <year>2004</year>
          ,
          <volume>22</volume>
          :
          <fpage>93</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Cui</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wiener</surname>
            ,
            <given-names>J. L. Tracing</given-names>
          </string-name>
          <article-title>The Lineage of View Data in a Warehousing Environment</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <year>2000</year>
          ,
          <volume>25</volume>
          (
          <issue>2</issue>
          ):
          <fpage>179</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Gertz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Diagnosis and Repair of Constraint Violations in Database Systems</article-title>
          .
          <source>PhD Thesis</source>
          , Universitat Hannover,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Halpern</surname>
            ,
            <given-names>Y. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pearl</surname>
          </string-name>
          ,
          <source>J. Causes and Explanations: A Structural-Model Approach: Part 1 Proc. UAI</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Halpern</surname>
            ,
            <given-names>Y. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Pearl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Causes</surname>
          </string-name>
          and
          <article-title>Explanations: A Structural-Model Approach: Part 1</article-title>
          .
          <string-name>
            <surname>British</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <source>Philosophy of Science</source>
          ,
          <year>2005</year>
          ,
          <volume>56</volume>
          :
          <fpage>843</fpage>
          -
          <lpage>887</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Naughton</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          <string-name>
            <surname>On</surname>
          </string-name>
          <article-title>The Provenance of NonAnswers to Queries over Extracted Data</article-title>
          . PVLDB,
          <year>2008</year>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <volume>736</volume>
          {
          <fpage>747</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Karvounarakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Green</surname>
            ,
            <given-names>T. J.</given-names>
          </string-name>
          <string-name>
            <surname>Semiring-Annotated</surname>
            <given-names>Data</given-names>
          </string-name>
          : Queries and Provenance? SIGMOD Record,
          <year>2012</year>
          ,
          <volume>41</volume>
          (
          <issue>3</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Karvounarakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z. G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Querying Data Provenance</article-title>
          .
          <source>Proc. ACM SIGMOD</source>
          ,
          <year>2010</year>
          , pp.
          <volume>951</volume>
          {
          <fpage>962</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kimelfeld</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>A Dichotomy in the Complexity of Deletion Propagation with Functional Dependencies</article-title>
          .
          <source>Proc. ACM PODS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Kimelfeld</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vondrak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Maximizing Conjunctive Views in Deletion Propagation</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <year>2012</year>
          ,
          <volume>37</volume>
          (
          <issue>4</issue>
          ):
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Lopatenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bertossi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics</article-title>
          .
          <source>Proc. ICDT</source>
          ,
          <year>2007</year>
          , Springer LNCS 4353.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Meliou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatterbauer</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>K. F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>The Complexity of Causality and Responsibility for Query Answers and Non-Answers</article-title>
          .
          <source>Proc. VLDB</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Meliou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatterbauer</surname>
          </string-name>
          . W.,
          <string-name>
            <surname>Halpern</surname>
            ,
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            <given-names>K. F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Causality in Databases</article-title>
          .
          <source>IEEE Data Eng. Bull</source>
          ,
          <year>2010</year>
          ,
          <volume>33</volume>
          (
          <issue>3</issue>
          ):
          <fpage>59</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>A Theory of Diagnosis from First Principles</article-title>
          .
          <source>Arti cial Intelligence</source>
          ,
          <year>1987</year>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>57</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Towards</surname>
          </string-name>
          <article-title>a Logical Reconstruction of Relational Database Theory</article-title>
          . In On Conceptual Modelling,
          <string-name>
            <given-names>M.L.</given-names>
            <surname>Brodie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mylopoulos</surname>
          </string-name>
          and
          <string-name>
            <surname>J.W.</surname>
          </string-name>
          Schmidt (eds.), Springer,
          <year>1984</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Provenance Propagation in Complex Queries</article-title>
          . In Buneman Festschrift,
          <year>2013</year>
          , Springer LNCS 8000, pp.
          <fpage>483</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>