<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">GDE: General Data Exchange with Schema and Data Level Mappings</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rana</forename><surname>Awada</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">SITE</orgName>
								<orgName type="institution">University of Ottawa</orgName>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Iluju</forename><surname>Kiringa</surname></persName>
							<email>kiringa@site.uottawa.ca</email>
							<affiliation key="aff0">
								<orgName type="department">SITE</orgName>
								<orgName type="institution">University of Ottawa</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">GDE: General Data Exchange with Schema and Data Level Mappings</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DAD991C4BA05BFBCB2FA0F5FA847E9E5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:41+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Data exchange (DE) <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b2">3]</ref> and data coordination <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b5">6]</ref> are two important settings that were introduced previously in the literature to resolve the problem of integrating information that resides in different sources. A DE setting moves data residing in independent applications, which refer to the same object using the same name, and accesses it through a new target schema. However, a data coordination setting allows the access of data residing in independent sources and that possibly belong to different sets of vocabularies, without necessarily exchanging it and while maintaining autonomy.</p><p>Although a data coordination setting provides users with an amalgamated view of related information, this solution is not enough for applications that require a view of related information using a unified set of vocabularies for periodic reporting and decision making. We introduce a general data exchange (GDE) setting that extends DE settings to allow collaboration at the instance level, using a mapping table M , that specifies for each constant value in the source, the set of related (or corresponding) constant values in the target. <ref type="foot" target="#foot_0">1</ref>We show in this paper that a GDE setting can be formalized using the knowledge exchange framework introduced in <ref type="bibr" target="#b3">[4]</ref>. It allows us to store a target knowledge base (KB) which consists of a subset of the explicit data exchanged that is necessary to infer the full set of exchanged information using a set Σ t of FO sentences. We identify in our work the class of "best" KBs to materialize and we define the set of certain answers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Preliminaries</head><p>A (DE) setting <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b2">3]</ref> is a tuple S = (S, T, Σ st ), where S is a source schema, T is a target schema, S and T do not have predicate symbols in common, and Σ st consists of a set of source-to-target tuple-generating dependencies (st-tgds) that establish the relationship between source and target schemas. A st-tgd is a FOsentence of the form: ∀x∀ȳ (φ(x, ȳ) → ∃z ψ(x, z)), where φ(x, ȳ) and ψ(x, z) are conjunctions of relational atoms over S and T respectively. Let Const and Var be infinite and disjoint sets of constants and nulls, respectively. We consider in our work "complete" source instances I of S, where it holds that dom(I) ⊆ Const and do not contain missing data in the form of nulls. However, a target instance J of T, is allowed to contain null values, and it holds that dom(J) ⊆ Const ∪ Var;</p><p>A knowledge base <ref type="bibr" target="#b3">[4]</ref> over a schema R is a pair (K, Σ), where K is an instance of R (the explicit data) and Σ is a set of logical sentences over R (the implicit data). The set of models of (K, Σ), denoted by Mod(K, Σ), is defined as the set of instances of R that contain the explicit data in K and the implicit data in Σ; that is, Mod(K, Σ) corresponds to the set {K | K is an instance of R, K ⊆ K and K |= Σ }. From now on, K R denotes the restriction of instance K to a subset R of its schema R.</p><p>Mapping tables <ref type="bibr" target="#b5">[6]</ref> are mechanisms that establish how values from different domains correspond. In its simplest form, given two domains D 1 and D 2 , not necessarily disjoint, a mapping table over (D 1 , D 2 ) is a subset of D 1 × D 2 . Let Const S and Const T be the sets of source and target constants respectively. We consider in our work mapping tables with the following property: for each value a ∈ Const S ∩ dom(M ), there exists at least a single target value a ∈ Const T ∩ dom(M ) such that M (a, a ) holds, and there does not exist a source value b ∈ Const S ∩ dom(M ), where b is different than a and M (b, a ) holds. We say a uniquely identifies a in M . We define C as the set of values in dom(M ) ∩ Const T that uniquely identify source values mapped in M .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">GDE a Knowledge Exchange System</head><p>A GDE setting S = (S, T, M, Σ st ) extends a DE setting with (1) a binary relation symbol M that appears neither in S nor in T, and that is called a source-to-target mapping; and (2) Σ st that consists of a set of mapping st-tgds, which are FO sentences of the form: ∀x∀ȳ∀z (φ(x, ȳ) ∧ µ(x, z) → ∃ w ψ(z, w)), where (a) φ(x, ȳ) and ψ(z, w) are conjunctions of relation symbols over S and T respectively, and (b) µ(x, z) is a conjunction of relation symbols that only use the st-mapping relation symbol M. We denote st-mapping tables by M .</p><p>In a GDE setting, source KBs are of the form ((I ∪ {M }), Σ s = ∅), which correspond to data in the source instance I and the st-mapping table M . On the other hand, the target KBs are of the form ((J ∪ {M }), Σ t ) where Σ t is a set of FO sentences, of type f ull tgds (which are tgds that do not use existential quantication). We formalize the notion of a (universal) GDE KB-solution, extending the notion of knowledge exchange (universal) solution in <ref type="bibr" target="#b3">[4]</ref> to allow coordinating the source and target information provided by M , as follows:</p><formula xml:id="formula_0">1. J is a GDE KB-solution for I and M under S, if for every K ∈ Mod((J ∪ {M }), Σ t ) there is K ∈ Mod((I ∪ {M }), Σ s = ∅)) such that the following hold: (a) K M ⊆ K M , and (b) ((K S ∪ K M ), K T ) Σ st . 2.</formula><p>Also, J is a universal GDE KB-solution (UGDE) for I and M under S, if J is a GDE KB-solution, and for every K ∈ Mod((</p><formula xml:id="formula_1">I ∪{M }), Σ s = ∅) there is K ∈ Mod((J ∪{M }, Σ t ) such that (a) K M ⊆ K M , and (b) ((K S ∪K M ), K T ) Σ st .</formula><p>Intuitively, in a GDE setting S, C is the sole set of target values that can capture correctly the set of source values exchanged to a target instance. There-fore, intuitively a GDE KB-solution J in S has a domain dom(J) ⊆ C ∪ Var. We define Σ t as the following set of full tgds over a schema T ∪ {M, Related}, where Related is a fresh binary table <ref type="table">:</ref> 1. For each T ∈ T ∪ {M} of arity n and 1 ≤ i ≤ n:</p><formula xml:id="formula_2">∀x 1 • • • ∀x n (T (x 1 , . . . , x i , . . . , x n ) → Related(x i , x i )). 2. ∀x∀y∀z(M(z, x) ∧ M(z, y) ∧ C(x) → Related(x, y)). 3. For each T ∈ T of arity n: ∀x 1 , y 1 • • • ∀x n , y n (T (x 1 , . . . , x n ) ∧ n i=1 Related(x i , y i ) → T (y 1 , . . . , y n ))</formula><p>. In a GDE setting, we define "best" solutions formally following <ref type="bibr" target="#b3">[4]</ref> as: Let S be a GDE setting, I be a source instance, M an st-mapping table, and J a UGDE solution for I and M under S. Then J is a minimal UGDE solution, if (1) there is no proper subset J of J such that J is a UGDE solution for I and M under S, and (2) there is no UGDE solution J such that dom(J ) ∩ Const T is properly contained in dom(J) ∩ Const T . Also, given a fixed GDE setting, generating UGDE solutions and minimal UGDE solutions is in Logspace.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Query Answering</head><p>We adapt the notion of a certain answer in the usual DE setting to the GDE setting. Formally, let S be a GDE setting, I a source instance, M an st-mapping table, and Q a conjunctive query over T. The set of certain answers of Q over I and M and under S, denoted certain S ((I ∪ {M }), Q), corresponds to the set of tuples of constants that belong to the evaluation of Q over K T , for each GDE KB-solution J for I and M and K ∈ Mod((J ∪ {M}), Σ t ). Finally, generating certain S ((I ∪ {M }), Q) is in Logspace.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Future Work</head><p>An interesting extension for this work would be defining a GDE setting with a target that contains egds and tgds constraints. Also, investigating GDE in a peer-to-peer setting might add interesting challenges to the problem.</p></div>			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">We consider in this work a particular interpretation of related data in a mapping table; that is, a source element is always uniquely identified by at least one target element.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Data Coordination: Supporting Contingent Updates</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lawrence</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pottinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Staub-French</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>VLDB</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Data Management for Peer-to-Peer Computing: A Vision</title>
		<author>
			<persName><forename type="first">Philip</forename><forename type="middle">A</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fausto</forename><surname>Giunchiglia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anastasios</forename><surname>Kementsietsidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Mylopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luciano</forename><surname>Serafini</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="89" to="94" />
		</imprint>
	</monogr>
	<note>llya Zaihrayeu</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Relational and XML Data Exchange</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barceló</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Libkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Murlak</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>Morgan &amp; Claypool Publishers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Data exchange beyond complete data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Arenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Reutter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>PODS</publisher>
			<biblScope unit="page" from="83" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Data exchange: semantics and query answering</title>
		<author>
			<persName><forename type="first">R</forename><surname>Fagin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Kolaitis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Popa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Theoretical Computer Science</title>
		<imprint>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="761" to="791" />
			<date type="published" when="1984">2005. 1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Mapping data in peer-to-peer systems: Semantics and algorithmic issues</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kementsietsidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Arenas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGMOD</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="325" to="336" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
