<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GDE: General Data Exchange with Schema and Data Level Mappings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rana Awada</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iluju Kiringa</string-name>
          <email>kiringa@site.uottawa.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Ottawa</institution>
          ,
          <addr-line>SITE</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Data exchange (DE) [5, 3] and data coordination [1, 2, 6] are two important settings that were introduced previously in the literature to resolve the problem of integrating information that resides in di erent sources. A DE setting moves data residing in independent applications, which refer to the same object using the same name, and accesses it through a new target schema. However, a data coordination setting allows the access of data residing in independent sources and that possibly belong to di erent sets of vocabularies, without necessarily exchanging it and while maintaining autonomy. Although a data coordination setting provides users with an amalgamated view of related information, this solution is not enough for applications that require a view of related information using a uni ed set of vocabularies for periodic reporting and decision making. We introduce a general data exchange (GDE) setting that extends DE settings to allow collaboration at the instance level, using a mapping table M , that speci es for each constant value in the source, the set of related (or corresponding) constant values in the target.1 We show in this paper that a GDE setting can be formalized using the knowledge exchange framework introduced in [4]. It allows us to store a target knowledge base (KB) which consists of a subset of the explicit data exchanged that is necessary to infer the full set of exchanged information using a set t of FO sentences. We identify in our work the class of \best" KBs to materialize and we de ne the set of certain answers. A (DE) setting [5, 3] is a tuple S = (S; T; st), where S is a source schema, T is a target schema, S and T do not have predicate symbols in common, and st consists of a set of source-to-target tuple-generating dependencies (st-tgds) that establish the relationship between source and target schemas. A st-tgd is a FOsentence of the form: 8x8y ( (x; y) ! 9z (x; z)); where (x; y) and (x; z) are conjunctions of relational atoms over S and T respectively. Let Const and Var be in nite and disjoint sets of constants and nulls, respectively. We consider in our</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>1 We consider in this work a particular interpretation of related data in a mapping table; that is,
a source element is always uniquely identi ed by at least one target element.
work \complete" source instances I of S, where it holds that dom(I) Const
and do not contain missing data in the form of nulls. However, a target instance
J of T, is allowed to contain null values, and it holds that dom(J ) Const [ Var;</p>
      <p>
        A knowledge base [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] over a schema R is a pair (K; ), where K is an instance
of R (the explicit data) and is a set of logical sentences over R (the implicit
data). The set of models of (K; ), denoted by Mod(K; ), is de ned as the set
of instances of R that contain the explicit data in K and the implicit data in ;
that is, Mod(K; ) corresponds to the set fK0 j K0 is an instance of R, K K0
and K0 j= g. From now on, KR0 denotes the restriction of instance K to a
subset R0 of its schema R.
      </p>
      <p>
        Mapping tables [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are mechanisms that establish how values from di erent
domains correspond. In its simplest form, given two domains D1 and D2, not
necessarily disjoint, a mapping table over (D1; D2) is a subset of D1 D2. Let
ConstS and ConstT be the sets of source and target constants respectively. We
consider in our work mapping tables with the following property: for each value
a 2 ConstS \ dom(M ), there exists at least a single target value a0 2 ConstT \
dom(M ) such that M (a; a0) holds, and there does not exist a source value b 2
ConstS \ dom(M ), where b is di erent than a and M (b; a0) holds. We say a0
uniquely identi es a in M . We de ne C as the set of values in dom(M ) \ ConstT
that uniquely identify source values mapped in M .
3
      </p>
    </sec>
    <sec id="sec-2">
      <title>GDE a Knowledge Exchange System</title>
      <p>A GDE setting S = (S; T; M; st) extends a DE setting with (1) a binary
relation symbol M that appears neither in S nor in T, and that is called a
source-to-target mapping; and (2) st that consists of a set of mapping st-tgds,
which are FO sentences of the form: 8x8y8z ( (x; y) ^ (x; z) ! 9w (z; w));
where (a) (x; y) and (z; w) are conjunctions of relation symbols over S and T
respectively, and (b) (x; z) is a conjunction of relation symbols that only use
the st-mapping relation symbol M. We denote st-mapping tables by M .</p>
      <p>
        In a GDE setting, source KBs are of the form ((I [ fM g); s = ;), which
correspond to data in the source instance I and the st-mapping table M . On
the other hand, the target KBs are of the form ((J [ fM g); t) where t is a
set of FO sentences, of type f ull tgds (which are tgds that do not use
existential quantication). We formalize the notion of a (universal) GDE KB-solution,
extending the notion of knowledge exchange (universal) solution in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to allow
coordinating the source and target information provided by M , as follows:
1. J is a GDE KB-solution for I and M under S, if for every K 2 Mod((J [
fM g); t) there is K0 2 Mod((I [ fM g); s = ;)) such that the following
hold: (a) KM0 KM , and (b) ((KS0 [ KM0 ); KT) st.
2. Also, J is a universal GDE KB-solution (UGDE) for I and M under S, if J is
a GDE KB-solution, and for every K0 2 Mod((I [fM g); s = ;) there is K 2
Mod((J [fM g; t) such that (a) KM KM0 , and (b) ((KS0[KM0 ); KT) st.
      </p>
      <p>Intuitively, in a GDE setting S, C is the sole set of target values that can
capture correctly the set of source values exchanged to a target instance.
Therefore, intuitively a GDE KB-solution J in S has a domain dom(J ) C [ Var.
We de ne t as the following set of full tgds over a schema T [ fM; Relatedg,
where Related is a fresh binary table:
1. For each T 2 T [ fMg of arity n and 1 i n:</p>
      <p>8x1 8xn(T (x1; : : : ; xi; : : : ; xn) ! Related(xi; xi)).
2. 8x8y8z(M(z; x) ^ M(z; y) ^ C(x) ! Related(x; y)).
3. For each T 2 T of arity n:
8x1; y1 8xn; yn (T (x1; : : : ; xn) ^ Vin=1 Related(xi; yi) ! T (y1; : : : ; yn)).</p>
      <p>
        In a GDE setting, we de ne \best" solutions formally following [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as: Let
S be a GDE setting, I be a source instance, M an st-mapping table, and J a
UGDE solution for I and M under S. Then J is a minimal UGDE solution, if
(1) there is no proper subset J 0 of J such that J 0 is a UGDE solution for I and
M under S, and (2) there is no UGDE solution J 0 such that dom(J 0) \ ConstT
is properly contained in dom(J ) \ ConstT . Also, given a xed GDE setting,
generating UGDE solutions and minimal UGDE solutions is in Logspace.
4
      </p>
    </sec>
    <sec id="sec-3">
      <title>Query Answering</title>
      <p>We adapt the notion of a certain answer in the usual DE setting to the GDE
setting. Formally, let S be a GDE setting, I a source instance, M an st-mapping
table, and Q a conjunctive query over T. The set of certain answers of Q over I
and M and under S, denoted certainS((I [ fM g); Q), corresponds to the set of
tuples of constants that belong to the evaluation of Q over KT, for each GDE
KB-solution J for I and M and K 2 Mod((J [ fMg); t). Finally, generating
certainS((I [ fM g); Q) is in Logspace.
5</p>
    </sec>
    <sec id="sec-4">
      <title>Future Work</title>
      <p>An interesting extension for this work would be de ning a GDE setting with
a target that contains egds and tgds constraints. Also, investigating GDE in a
peer-to-peer setting might add interesting challenges to the problem.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pottinger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staub-French</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Data Coordination: Supporting Contingent Updates</article-title>
          . In: VLDB (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Philip</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Bernstein</surname>
          </string-name>
          , Fausto Giunchiglia , Anastasios Kementsietsidis , John Mylopoulos , Luciano Sera ni , llya Zaihrayeu :
          <article-title>Data Management for Peer-to-Peer Computing: A Vision</article-title>
          . pp.
          <volume>89</volume>
          {
          <issue>94</issue>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barcelo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Libkin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murlak</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Relational and XML Data Exchange</article-title>
          . Morgan &amp; Claypool Publishers, (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reutter</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data exchange beyond complete data</article-title>
          .
          <source>In : PODS</source>
          , pp.
          <volume>83</volume>
          {
          <issue>94</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data exchange: semantics and query answering</article-title>
          .
          <source>In: Theoretical Computer Science</source>
          , pp.
          <volume>89</volume>
          {
          <issue>124</issue>
          (
          <year>2005</year>
          )
          <volume>31</volume>
          (
          <issue>4</issue>
          ), pp.
          <volume>761</volume>
          {
          <issue>791</issue>
          (
          <year>1984</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kementsietsidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          :
          <article-title>Mapping data in peer-to-peer systems: Semantics and algorithmic issues</article-title>
          .
          <source>In: SIGMOD</source>
          , pp.
          <volume>325</volume>
          {
          <issue>336</issue>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>