<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Tanja Auge, supervised by Prof. Andreas Heuer, University of Rostock</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Research data management deals with the tracking and archiving of data collected during scientific projects, experiments, or observations. The path from data collection to publication should thus be kept comprehensible, reconstructable, and plausible. The continuous growth of data, frequent schema changes, and the varied evaluation of the data make storing every possible database state a complicated and lengthy task. With the help of data provenance, however, we can determine which part of the primary research data must be stored long-term in order to ensure the reproducibility of the evaluations. It should also be possible to recalculate changes to data and schemata so that old data records do not have to be archived completely. In addition, the stored data must not conflict with existing privacy guidelines.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>The presentation and publication of research results
increasingly requires the publication of the corresponding
research data, which ensures the findability, accessibility,
interoperability, and reusability of the data in the sense of the FAIR Data
Principles. In our research, we concentrate on
structured data, e.g. resulting from measurement series,
experiments, or always-on sensors, stored in a relational database.
The FAIR principles do not necessarily require that the
entire database of a project be stored and published,
but only that part of the database which is necessary for the
traceability and/or reproducibility of the respective
publication. The difficulty is to generate exactly this minimal
part of the original research database, called the sub-database.
The need for data reduction can be due to high costs
of collecting or evaluating the data (expensive or
elaborately produced), to privacy aspects when evaluating
personal data, or to the preservation of intellectual property. In our
case, the evaluation of the research database is restricted to a
relational query language, starting from conjunctive queries
and adding arithmetic or aggregation functions later on.</p>
      <p>
        Thus, our goal is not only to store the evaluation query
and the query result itself, but also the relevant source data.
However, if data and/or schema change frequently, the
original database must be "frozen" and saved after each
evaluation carried out on the dataset. To avoid this, we use
provenance management techniques [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ] to calculate the
sub-database required to reproduce the query result (highlighted
in green in Figure 1), either before the evolution (highlighted
in red) or after it (highlighted in blue).
      </p>
      <p>After the General Data Protection Regulation (GDPR)
came into effect, it became apparent that the storage of research
data, even without personal data, may fall under
the aspect of privacy. Reasons for the additional privacy
requirements are the high costs, the long time, and the great
effort required to collect the research data. This implies a
natural conflict of interest between publishing original data
(provenance) and protecting these data (privacy) for reasons
of competition.</p>
      <p>All in all, this results in three central research questions,
which are summarized in Figure 1:
(I.) How can we calculate the minimal part of the original
research database that has to be stored permanently to
achieve replicable research? (highlighted in red)
(II.) How can we unify the theories behind data provenance and
schema evolution? (highlighted in blue)
(III.) How can we combine data provenance and privacy aspects
in the case of query inversion? (locked tuple)</p>
    </sec>
    <sec id="sec-2">
      <title>PROBLEM DESCRIPTION</title>
      <p>
        Let us take a more detailed look at the different research
questions. A detailed description of problems (I.) and
(II.) can be found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the course of the PhD project,
the third question (III.) arose, which illustrates the practical
relevance of the conflict between provenance and privacy.
Calculation of a minimal Sub-database (Figure 2).
Provided that the query result (highlighted in green) and the
evaluation query are archived, one specific problem is to
determine the minimal (additional) information that is required
for the reconstruction of the sub-database (highlighted in red).
Occasionally, we have to save entire tuples or parts of the
database directly. Using provenance management, we can
specify this necessary information. The minimal sub-database
calculated in this way allows the reconstruction of the results of the
evaluation query under the following boundary conditions:
(1) the number of tuples of the original is retained, (2) the
sub-database can be homomorphically mapped to the
original database, and (3) the sub-database is an intensional
description of the original database.
      </p>
      <p>Unification of Provenance and Evolution (Figure 3).
Previous provenance queries have usually been processed
on a given, fixed database and an evaluation query. The
combination of data provenance with schema and data
evolution should enable the evaluation of provenance queries
under changing schemata. Under evolution, the new query
evaluation (dotted green) can be directly calculated as a
composition of the original query evaluation and the inverse
evolution (black). It is therefore sufficient to memorize one
of the two minimal sub-databases I (highlighted in red) or J
(highlighted in blue); the other sub-database can be calculated
with the help of the inverse.
Privacy in the case of Query Inversion (Figure 4). The
determination of a sub-database (highlighted in red) based on
the query result (highlighted in green) is not always possible or
permitted. For example, aggregated data cannot be inverted
without storing additional information. Personal data, on
the other hand, may not be published without a certain
degree of anonymization (see the locked tuple). It is therefore necessary
to generate a partial or generalized database that satisfies
the provenance criteria on the one hand and does not
contradict the privacy aspect on the other. Again, this implies the natural
conflict of interest between publishing original data
(provenance) and protecting these data (privacy).</p>
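The non-invertibility of aggregation can be made concrete with a small Python sketch (the values and the idea of storing SUM and COUNT as the additional information are invented for illustration, not taken from the paper):

```python
# Two different source instances that yield the same aggregated result:
# the query result alone cannot determine the original sub-database.

def avg_query(rows):
    """Evaluation query: AVG over a single numeric attribute."""
    return sum(rows) / len(rows)

source_a = [10, 20, 30]
source_b = [15, 25]  # a completely different instance ...

assert avg_query(source_a) == avg_query(source_b) == 20.0  # ... same result

# Storing additional information per result tuple (here: SUM and COUNT)
# makes the two instances distinguishable again.
extra_a = {"sum": sum(source_a), "count": len(source_a)}
extra_b = {"sum": sum(source_b), "count": len(source_b)}
assert extra_a != extra_b
```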
      <p>
        Dependencies. The best-known conditions are key
dependencies, functional dependencies (FDs), and join dependencies
(JDs). These can be extended to much more general
dependencies called (source-to-target) tuple-generating
dependencies ((s-t) tgds) and equality-generating dependencies (egds).
While s-t tgds are used as a kind of inter-database
dependency, tgds (an s-t tgd on only one database schema)
as well as egds can be seen as intra-database dependencies
representing integrity constraints within a database [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
CHASE. The CHASE is a procedure that modifies a given
object O by incorporating a parameter ∗. We represent
this by chase_∗(O) = O'. While the object O can represent
both queries and instances, we understand the parameter ∗
as a set of dependencies such as (s-t) tgds and/or egds. There
are already first approaches to generalizing the CHASE to
(arbitrary) objects and parameters [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In our use case, s-t tgds create new tuples and tgds/egds
clean the database by replacing null values until the CHASEd
database satisfies all given dependencies [
        <xref ref-type="bibr" rid="ref13 ref5">5, 13</xref>
        ]. The CHASE
on instances can be used for data exchange, data integration,
query answering on incomplete databases, or data cleaning,
among others.
      </p>
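As a rough illustration of these two roles, the following Python sketch applies a single s-t tgd and a simple egd-style cleaning step to a toy source relation (the relations, values, and the simplified null handling are invented; real CHASE implementations are considerably more involved):

```python
# Minimal, simplified CHASE sketch (invented toy relations, not the
# authors' tool). The s-t tgd  S(x, y) -> exists z: T(x, y, z)  creates
# target tuples with a labelled null for z; an egd-style step then
# replaces nulls by constants where a dependency forces it.

NULL_COUNT = 0
def fresh_null():
    global NULL_COUNT
    NULL_COUNT += 1
    return f"_N{NULL_COUNT}"  # labelled null

def chase_st_tgd(source):
    """Apply S(x, y) -> exists z: T(x, y, z) to every source tuple."""
    return [(x, y, fresh_null()) for (x, y) in source]

def chase_egd(target, known_z):
    """Cleaning step: replace the null in z by a known constant, if any."""
    return [(x, y, known_z.get((x, y), z)) for (x, y, z) in target]

S = [("a", 1), ("b", 2)]
T = chase_st_tgd(S)                 # tuples with labelled nulls
T = chase_egd(T, {("a", 1): "c"})   # the null of ("a", 1, _N1) becomes "c"
print(T)  # [('a', 1, 'c'), ('b', 2, '_N2')]
```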
      <p>
        CHASE-inverse. A CHASE-inverse is an inverse function
calculated via the CHASE algorithm. Weaker variants like
the relaxed CHASE-inverse [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], tuple-preserving relaxed and
result equivalent CHASE-inverse [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] do not guarantee an
exact and unique inverse.
      </p>
      <p>
        Data Provenance. Given a database instance I and an
evaluation query Q, data provenance describes (1) where a
result tuple r comes from (where-provenance), (2) why
and (3) how r exists in the result Q(I). Why-provenance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
specifies a witness basis that identifies the tuples involved in
the calculation of r. The question of how a result tuple r
is calculated is answered by how-provenance using
provenance polynomials. These polynomials give a concrete
calculation of r. They are defined over a commutative semiring
(N[X], +, ·, 0, 1) with + for union and projection and
· for natural join [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
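A toy Python sketch of how such polynomials arise (the relations, the tuple annotations r1, r2, s1, and the string encoding of the polynomial are invented for illustration):

```python
# How-provenance sketch over N[X]: each source tuple carries a provenance
# variable; join multiplies annotations, projection adds them.

# Annotated relations: tuple -> provenance variable (invented data)
R = {("alice", "db"): "r1", ("bob", "db"): "r2"}
S = {("db", "2020"): "s1"}

# Natural join on the shared attribute: annotations are multiplied.
join = {
    (a, b, c): f"{p}*{q}"
    for (a, b), p in R.items()
    for (b2, c), q in S.items()
    if b == b2
}
print(join)
# {('alice', 'db', '2020'): 'r1*s1', ('bob', 'db', '2020'): 'r2*s1'}

# Projection onto the last attribute: annotations of merged tuples are added.
projection = {}
for (a, b, c), p in join.items():
    projection[c] = p if c not in projection else f"{projection[c]} + {p}"
print(projection)  # {'2020': 'r1*s1 + r2*s1'}
```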
      <p>Why- and where-provenance can be derived from the result of
how-provenance. For this, we can define a reduction based
on the information content: where ⪯ why ⪯ how.
Therefore, we often concentrate only on how-provenance. When
including privacy aspects, however, why- and
where-provenance should not be neglected.</p>
      <p>
        Provenance under Schema Evolution. The description
of schema development using schema modification
operators (SMOs) such as CREATE table, ADD or DROP column
enables schema and corresponding data changes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. First
approaches to combining schema evolution and provenance are
given in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the latter supporting a total of three
types of provenance queries: (1) data provenance queries,
(2) schema provenance queries, and (3) statistics queries.
Privacy. Privacy refers to the protection of (personal) data
against unauthorized collection, storage, and publication.
Important criteria in this context are, for example, k-anonymity
and l-diversity [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. These are necessary, since a tuple can
often be uniquely identified by apparently harmless attributes,
so-called quasi-identifiers. The goal of our research is to
reconstruct the original database as accurately as possible.
This implies a natural conflict of interest between publishing
original data (provenance) and protecting these data
(privacy) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
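A k-anonymity check along these lines can be sketched in a few lines of Python (the table, the generalized attribute values, and the choice of quasi-identifiers are invented for illustration):

```python
# Sketch: checking k-anonymity over a set of quasi-identifier attributes.
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Every combination of quasi-identifier values must occur >= k times."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

table = [  # invented, already generalized data
    {"zip": "180**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "180**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "181**", "age": "30-39", "diagnosis": "flu"},
]

print(is_k_anonymous(table, ["zip", "age"], k=2))  # False: the last row is unique
```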
    </sec>
    <sec id="sec-3">
      <title>PREVIOUS RESULTS</title>
      <p>
        In order to unify the different theories, we represent
evaluation queries, provenance queries, and evolution functions as
s-t tgds, so that the CHASE algorithm can be applied as a
technique. The CHASE is thus a formalization of the
evaluation as well as the evolution of the research database. In a
second step, called the BACKCHASE process, we use the CHASE
again to generate a provenance query based on the result of
the evaluation query. Our theories of inverse functions can
be developed from the already existing work of Fagin et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Calculation of a minimal Sub-database. Let I be a
database instance, M a schema mapping defined by an s-t tgd, and
I' = chase_M'(chase_M(I)) the minimal sub-database
calculated by applying the CHASE twice (with M and an inverse
mapping M'), highlighted in red in
Figure 2. While an exact CHASE-inverse always reconstructs
the original database itself, weaker variants only require
data exchange equivalence and the existence of a
homomorphism between I and I'. To preserve the number of tuples,
we define the tuple-preserving relaxed CHASE-inverse
(tp-relaxed). To specify a CHASE-inverse for aggregate
functions as well, we define the result equivalent CHASE-inverse
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Both definitions are based on the theory of Fagin et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The result equivalent CHASE-inverse is therefore the
weakest CHASE-inverse. Overall, this results in the reduction
result equivalent ⪯ relaxed ⪯ tp-relaxed ⪯ exact,
which forms the sufficient conditions for the existence of
a CHASE-inverse. The necessary conditions, on the other
hand, refer to the existence of homomorphisms, an equal
number of tuples, as well as result equivalence [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      </p>
      <p>
        An exact (=), (tp-)relaxed (≈, ≈tp), or result equivalent (↔)
CHASE-inverse can be specified for each basic operation
such as projection, join, or AVG. Adding provenance information such as
provenance polynomials [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and (minimal) witness bases [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
allows the specification of stronger CHASE-inverse schema
mappings than without. Thus, in the case of a
projection, formalized as R(a, b, c) → S(a, c), the inverse function
S(a, c) → ∃d: R(a, d, c) is tp-relaxed instead of relaxed. For
other operations such as selection, on the other hand, the
inverse type cannot be improved despite additional
information [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
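The projection case can be sketched as follows (toy relations and values; the labelled-null handling is simplified for illustration):

```python
# Sketch of the projection R(a, b, c) -> S(a, c) and its tp-relaxed
# CHASE-inverse S(a, c) -> exists d: R(a, d, c).

NULLS = iter(f"_N{i}" for i in range(1, 100))

def project(R):
    """Evaluation query: project out the middle attribute."""
    return [(a, c) for (a, b, c) in R]

def inverse(S):
    """tp-relaxed inverse: reintroduce b as a fresh labelled null per tuple."""
    return [(a, next(NULLS), c) for (a, c) in S]

R = [(1, "x", 10), (2, "y", 20)]
S = project(R)
R_reconstructed = inverse(S)
print(R_reconstructed)  # [(1, '_N1', 10), (2, '_N2', 20)]

# Tuple-preserving: the tuple count is retained, and mapping each null
# back to the original b-value is a homomorphism onto R.
assert len(R_reconstructed) == len(R)
```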
      <p>
        Unification of Provenance and Evolution. Given a
database instance I and its evolution J, we can differentiate
between 15 schema modifications like CREATE table, ADD or
DROP column. These modifications can be formalized
using the schema modification operators defined in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We
examined the most common schema modification operators
with reference to their CHASE-inverses and extended them
with why- and how-provenance as well as additional
annotations. The most common operators are DECOMPOSE, JOIN,
and MERGE Table as well as MERGE Column. Currently, we
are evaluating the remaining 11 operators for their
CHASE-inverse functions with and without using data provenance.
      </p>
      <p>
        Among other things, we found that in some research
institutions the SMOs defined by Zaniolo et al. are not
sufficient [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In cooperation with the Leibniz Institute for
Baltic Sea Research Warnemünde, we were able to
determine that their schema modifications contain a lot of
merging and splitting operations. We therefore define the two
operators MERGE Column and SPLIT Column as sequences of ADD
and DROP operations. The merge function can be
formalized as R(a, b, c) → S(b, f(a, c)) with an inverse function
S(b, f(a, c)) → ∃d, e: R(d, b, e). Depending on whether
provenance information is used, we can generate a
tp-relaxed or exact CHASE-inverse, resulting in a better
reconstructed sub-database.
      </p>
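A sketch of this merge operator and its two inverses in Python (the merge function f, chosen here as an invertible tagged concatenation, and all data are invented for illustration):

```python
# MERGE Column, R(a, b, c) -> S(b, f(a, c)), and its inverses.

SEP = "|"
def f(a, c):                         # merge function (tagged concatenation)
    return f"{a}{SEP}{c}"

def merge_column(R):
    return [(b, f(a, c)) for (a, b, c) in R]

# Without provenance: only a weaker inverse with labelled nulls,
# S(b, f(a, c)) -> exists d, e: R(d, b, e)  (fresh nulls per tuple
# omitted for brevity).
def inverse_without_provenance(S):
    return [("_N1", b, "_N2") for (b, _) in S]

# With provenance information about f (invertible here): exact inverse.
def inverse_with_provenance(S):
    return [(m.split(SEP)[0], b, m.split(SEP)[1]) for (b, m) in S]

R = [("a1", "b1", "c1")]
S = merge_column(R)
assert inverse_with_provenance(S) == R   # exact reconstruction
print(inverse_without_provenance(S))     # [('_N1', 'b1', '_N2')]
```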
      <p>Privacy in the case of Query Inversion. For us, the term
privacy goes beyond the notion of (usually personal) data
protection. Rather, we refer to the protection of research data
in general. Reasons for protecting research data include
economic (company protection), personal (personal data), or
financial aspects. The creation of such data is often
time-consuming and expensive. The identification of personal or
internal company information should also be strictly
prevented.</p>
      <p>
        When reconstructing the minimal sub-database, only those
tuples may be reconstructed which do not contradict privacy
aspects. Depending on the selected provenance, different
data protection problems have to be considered [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: (1)
Using relation names as where-provenance, there is generally
not enough data worth protecting, and reproducibility of the
data is not guaranteed. Data protection aspects are
therefore negligible. (2) In the case of why-provenance, we may encounter
privacy problems if the variance of the distribution of
attribute values is equal to zero. However, this only applies
to special cases not known to the user interpreting the
results of the provenance queries. (3) How-provenance often
computes too much recoverable information, so that privacy
aspects are likely to be a major problem with this approach.
(4) If we interpret where-provenance as tuple names and save not
only the schema but the tuple itself, this can lead to major
privacy problems. This second where approach is the
subject of our current work.
      </p>
      <p>To solve the problems generated by the different
provenance queries, different approaches such as generalization
and suppression, permutation of attribute values,
differential privacy, and intensional (instead of extensional) answers
have been developed. The next step is to examine
them for their compatibility with where-, why-, and how-provenance.</p>
      <p>
        Generalization of the CHASE. We are currently working
on adapting the CHASE variant presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to
(arbitrary) objects and parameters ∗. Initial approaches to
this are described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The parameter ∗ is always
represented as an intra- or inter-database dependency, and the
predefined hierarchy of dependencies JD ⊆ tgd ⊆ s-t tgd
and FD ⊆ egd allows all dependencies to be expressed as s-t tgds
or egds, respectively. Views can also be expressed as such
dependencies. This allows the usage of a general parameter
for all of today's relevant CHASE applications such as semantic
optimization, answering queries using views, data exchange
and data cleaning, query rewriting, and many more.
      </p>
      <p>
        The CHASE object is either a query Q or a database
instance I. In both cases, variables/null values can be replaced
by other variables/null values or constants. The variable
substitution depends on certain conditions shown in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our
goal here is to develop a tool that executes multiple CHASE
applications. To the best of our knowledge, all currently
existing tools are designed for only one use case.
      </p>
    </sec>
    <sec id="sec-4">
      <title>FUTURE WORK</title>
      <p>Both in the calculation of a minimal sub-database and in
the unification of provenance and evolution, there are still
open questions to be answered. First, a concrete
BACKCHASE process must be defined using provenance
polynomials and witness bases. Second, we have to evaluate the
remaining 11 SMOs for their CHASE-inverse functions
with and without the use of data provenance. Further, we need
to examine the different privacy approaches for their
compatibility with where-, why-, and how-provenance.</p>
      <p>In addition to these specific questions, we also want to
know whether our results are applicable to use cases
other than research data management. And of course, we hope to
further develop the theory of the generalized CHASE.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>We presented the current status of and plans to extend our
work in the field of provenance management considering
evolution and privacy aspects. We defined the tp-relaxed
and result equivalent CHASE-inverse, examined the different
evaluation and evolution operators for their inverse types
with and without using data provenance, and started to
critically study the correlation between provenance and privacy.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>Special thanks go to my PhD supervisor Andreas Heuer
and my mentor Goetz Graefe, as well as to my colleagues
from the database chair of the University of Rostock, for
their support during my PhD studies so far. Thanks to the
Leibniz Institute for Baltic Sea Research Warnemünde for
providing their research data, and thanks also to my
students for the many interesting discussions about privacy,
provenance, evolution, and the CHASE algorithm.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Auge</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Heuer</surname>
          </string-name>
          .
          <article-title>Combining Provenance Management and Schema Evolution</article-title>
          . In IPAW, volume
          <volume>11017</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>222</fpage>
          –
          <lpage>225</lpage>
          . Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Auge</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Heuer</surname>
          </string-name>
          .
          <article-title>The Theory behind Minimizing Research Data – Result equivalent CHASE-inverse Mappings</article-title>
          .
          <source>In LWDA</source>
          , volume
          <volume>2191</volume>
          <source>of CEUR Workshop Proceedings</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>12</lpage>
          . CEUR-WS.org,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Auge</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Heuer</surname>
          </string-name>
          .
          <article-title>ProSA – Using the CHASE for Provenance Management</article-title>
          .
          <source>In ADBIS</source>
          , volume
          <volume>11695</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>357</fpage>
          –
          <lpage>372</lpage>
          . Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Auge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Scharlau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Heuer</surname>
          </string-name>
          .
          <article-title>Privacy Aspects of Provenance Queries</article-title>
          . Accepted for
          <source>ProvenanceWeek</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Benedikt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mecca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Santoro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Tsamoura</surname>
          </string-name>
          .
          <article-title>Benchmarking the Chase</article-title>
          .
          <source>In PODS</source>
          , pages
          <fpage>37</fpage>
          –
          <lpage>52</lpage>
          . ACM,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buneman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Why and Where: A Characterization of Data Provenance</article-title>
          .
          <source>In ICDT</source>
          , volume
          <volume>1973</volume>
          , pages
          <fpage>316</fpage>
          –
          <lpage>330</lpage>
          . Springer,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cheney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiticariu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Provenance in Databases: Why, How, and Where</article-title>
          .
          <source>Foundations and Trends in Databases</source>
          ,
          <volume>1</volume>
          (
          <issue>4</issue>
          ):
          <fpage>379</fpage>
          –
          <lpage>474</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Curino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deutsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Update rewriting and integrity constraint maintenance in a schema evolution support system: PRISM++</article-title>
          .
          <source>Proc. VLDB Endow.</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>117</fpage>
          –
          <lpage>128</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          .
          <article-title>Data Exchange: Semantics and Query Answering</article-title>
          .
          <source>Theor. Comput. Sci.</source>
          ,
          <volume>336</volume>
          (
          <issue>1</issue>
          ):
          <fpage>89</fpage>
          –
          <lpage>124</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Schema Mapping Evolution Through Composition and Inversion</article-title>
          .
          <source>In Schema Matching and Mapping</source>
          , pages
          <fpage>191</fpage>
          –
          <lpage>222</lpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gao</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>Provenance Management in Databases Under Schema Evolution</article-title>
          . In TaPP.
          <source>USENIX Association</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Glavic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Haas</surname>
          </string-name>
          .
          <article-title>TRAMP: Understanding the Behavior of Schema Mappings through Provenance</article-title>
          .
          <source>Proc. VLDB Endow.</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1314</fpage>
          –
          <lpage>1325</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Molinaro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Spezzano</surname>
          </string-name>
          .
          <article-title>Incomplete Data and Data Dependencies in Relational Databases</article-title>
          .
          <source>Synthesis Lectures on Data Management</source>
          . Morgan &amp; Claypool Publishers,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karvounarakis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tannen</surname>
          </string-name>
          .
          <article-title>Provenance semirings</article-title>
          .
          <source>In PODS</source>
          , pages
          <fpage>31</fpage>
          –
          <lpage>40</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Samarati</surname>
          </string-name>
          .
          <article-title>Protecting Respondents' Identities in Microdata Release</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng.</source>
          ,
          <volume>13</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1010</fpage>
          –
          <lpage>1027</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>