<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Provenance for Database Transformations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Val Tannen</string-name>
          <email>val@cis.upenn.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Pennsylvania</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract</title>
      <p>Database transformations (queries, views, mappings) take apart, filter, and
recombine source data in order to populate warehouses, materialize views, and
provide inputs to analysis tools. As they do so, applications often need to track
the relationship between parts and pieces of the sources and parts and pieces of
the transformations’ output. This relationship is what we call database
provenance.</p>
      <p>This tutorial presents an approach to database provenance that relies on two
observations. First, provenance is a kind of annotation, and we can develop a
general approach to annotation propagation that also covers other applications,
for example to uncertainty and access control. In fact, provenance turns out
to be the most general kind of such annotation, in a precise and practically
useful sense. Second, the propagation of annotation through a broad class of
transformations relies on just two operations: one when annotations are jointly
used and one when they are used alternatively. This leads to annotations forming
a specific algebraic structure, a commutative semiring.</p>
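      <p>The two operations can be illustrated with a minimal sketch, which is not
taken from the tutorial itself. It assumes a relation is a Python dictionary
from tuples to annotations, and the names Semiring, compose, var, and evaluate
are hypothetical helpers introduced only for this illustration. The query joins
R(a, b) with S(b, c) and projects onto (a, c): joint use multiplies annotations,
alternative use adds them. Annotating with polynomials over source-tuple
identifiers and then evaluating those polynomials in another semiring gives the
same answer as running the query in that semiring directly, which is one way to
read the claim that provenance is the most general annotation.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Semiring:
    zero: Any                          # annotation of an absent tuple
    one: Any                           # neutral annotation for joint use
    plus: Callable[[Any, Any], Any]    # alternative use (union, projection)
    times: Callable[[Any, Any], Any]   # joint use (join)


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
TROPICAL = Semiring(float("inf"), 0.0, min, lambda a, b: a + b)


# Polynomials with natural-number coefficients, stored as
# {monomial: coefficient}, where a monomial is a sorted tuple of variables.
def poly_plus(p, q):
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return out


def poly_times(p, q):
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(sorted(m1 + m2))
            out[m] = out.get(m, 0) + c1 * c2
    return out


POLYNOMIAL = Semiring({}, {(): 1}, poly_plus, poly_times)


def var(name):
    """Polynomial consisting of a single source-tuple identifier."""
    return {(name,): 1}


def compose(k, r, s):
    """Join r(a, b) with s(b, c) and project onto (a, c) in semiring k."""
    out = {}
    for (a, b), x in r.items():
        for (b2, c), y in s.items():
            if b == b2:
                key = (a, c)
                out[key] = k.plus(out.get(key, k.zero), k.times(x, y))
    return out


def evaluate(poly, k, valuation):
    """Evaluate a provenance polynomial in semiring k."""
    total = k.zero
    for monomial, coeff in poly.items():
        term = k.one
        for v in monomial:
            term = k.times(term, valuation[v])
        for _ in range(coeff):         # coefficient n means n-fold sum
            total = k.plus(total, term)
    return total


R = {("a", "b"): var("x1"), ("a", "c"): var("x2")}
S = {("b", "d"): var("y1"), ("c", "d"): var("y2")}
prov = compose(POLYNOMIAL, R, S)       # ("a", "d") gets x1*y1 + x2*y2

counts = {"x1": 2, "x2": 3, "y1": 1, "y2": 4}
multiplicity = evaluate(prov[("a", "d")], COUNTING, counts)   # 2*1 + 3*4
```
</preformat>
      <p>In this sketch, evaluating the same polynomial with TROPICAL and
per-tuple costs yields the cost of the cheapest derivation, and evaluating it
with BOOLEAN tells whether the output tuple survives when some source tuples
are deleted.</p>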
      <p>The semiring approach works for annotating tuples, field values and
attributes in standard relations, in nested relations (complex values), and for
annotating nodes in (unordered) XML. It works for transformations expressed in
the positive fragment of relational algebra, nested relational calculus, unordered
XQuery, as well as for Datalog, GLAV schema mappings, and tgd constraints.
Specific semirings correspond to earlier approaches to provenance, while others
correspond to forms of uncertainty, trust, cost, and access control.</p>
      <p>This is joint work with J.N. Foster, T.J. Green, Z. Ives, and G. Karvounarakis,
done in part within the frameworks of the Orchestra and pPOD projects.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>