<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Inter Model Data Exchange of Type Information via a Common Type Hierarchy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrew Smith</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter McBrien</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. Computing, Imperial College London</institution>
          ,
          <addr-line>London SW7 2AZ</addr-line>
        </aff>
      </contrib-group>
      <fpage>307</fpage>
      <lpage>321</lpage>
      <abstract>
        <p>Data exchange between heterogeneous schemas is a di cult problem that becomes more acute if the source and target schemas are from di erent data models. The data type of the objects to be exchanged can be useful information that should be exploited to help the data exchange process. So far little has been done to take advantage of this in inter model data exchange. Using a common data model has been shown to be e ective in data exchange in general. This work aims to show how the common data model approach can be useful speci cally in exchanging type information by use of a common type hierarchy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Fagin et al. de ne data exchange as `the problem of taking data structured
under a source schema and creating an instance of a target schema that
re ects the source data as accurately as possible' [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Data exchange may occur
within a single data model, for example when building a relational data
warehouse from a number of relational databases. However, it is often necessary to
perform inter model data exchange in which two or more di erent data models
are involved, such as when XML is used as data exchange format between a
number of databases, which may use relational, XML, or other models for their
internal storage of data.
      </p>
      <p>
        One of the most challenging aspects of data exchange is how to transform
constraints on the source schema to the target schema. This is particularly
difcult if the source and target schemas come from di erent data models. Type
information forms part of these constraints [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The inter model transformation
of integrity constraints, like primary and foreign keys, has been investigated [
        <xref ref-type="bibr" rid="ref3 ref4">3,
4</xref>
        ] and data types have received attention in the eld of database programming
languages [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], but little has been written about how simple types, i.e. integer,
oat, string etc., should be transformed between models in a data exchange
environment.
      </p>
      <p>
        The solution we present is a graph based logical type hierarchy that can
represent the data types of multiple data models and that links similar data
types from di erent models by de ning bi-directional mappings from the high
level data types to a common, extensible hierarchy of data types. Cardelli
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] points out that the purpose of type systems is to prevent execution errors in
the running of a program. We present a system that attempts to prevent data
errors during data exchange.
      </p>
      <p>The contributions this paper makes are as follows:
{ O er a way of improving the expressiveness and safety of inter model
transformations in a data exchange environment by adding type information.
{ Provide a way of transforming data between schemas whose data types
support di erent values.
{ Show when transformations are illegal, when they need to be checked for
data errors and when they do not, based on the data types of the source and
target constructs.</p>
      <p>The rest of the paper is organised as follows. Section 2 provides motivation
by describing some of the di culties inherent in data type exchange. Section
3 describes the formal de nition of our type system. Section 4 introduces the
AutoMed automatic mediation system which we have used to test our approach.
Section 5 describes how the new type system can be added to AutoMed along
with the de nition of a number of operators that can be applied to the data types
of objects in AutoMed. Section 6 describes a data exchange scenario in AutoMed
using the new type system. Section 7 describes some other approaches to
transformation of data types in a data exchange system. Finally some conclusions are
drawn and some of our on-going work is described.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>
        In an inter model data exchange system with a typed target schema it is
necessary to transform the data type of source schema objects to the target schema.
Current solutions make use of simple one to one matching between data types
in di erent models [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. This approach can lead to some problems:
{ The data type of the source object may allow a greater range of values
than that of the target object. This could lead to run-time errors when the
materialised target schema is populated with data from the source schema.
{ Types representing an identical concept may support a di erent range of
values. For example, a boolean in one data modelling language may represent
true as 1 while another may represent it as T. Again populating the target
schema would cause errors. Some way of mapping the data values between
the source and target is required.
      </p>
      <p>
        Our solution is to represent the data types in a data exchange system as a
directed acyclic graph (DAG) and to de ne a number of operations on the
graph that can be used to overcome the problems described above. Hierarchies
exist in some well known type systems, for example, Figure 1 is a portion of
the XML Schema [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] type hierarchy. However, for data exchange purposes, the
familiar tree hierarchy of Figure 1 has several short comings due to the hierarchy
not fully modelling the domain of data values in XML. In particular:
{ Figure 1 does not distinguish between types which are disjoint in their
extent, such as positiveInteger (representing all positive integer values) and
negativeInteger (representing all negative integer values), and those which
overlap, such as int (representing 231::231 1) and unsignedInt
(representing 0::232 1).
      </p>
      <p>In data exchange, a mapping between source and target objects where the
types are disjoint should be ruled illegal, unless an explicit conversion has
been de ned, but a mapping between objects that overlap should be allowed,
with runtime range checking.
{ Figure 1 fails to identify all isa associations in the hierarchy, shown as dashed
lines in the gure. For example, all unsignedShort values (which are in the
range 0::216 1) are a subset of int values and all values in XML can be
represented as string.</p>
      <p>In data exchange, if the source object type is a subset of the target object
type, then the values may be cast without runtime range checking, since the
cast will never fail.</p>
      <p>string
nonPositive</p>
      <p>Integer
negative
Integer</p>
      <p>A particular problem in data exchange is found when transferring between
di erent type systems. Figure 2 shows a portion of a type hierarchy built for
Postgres, showing some of the built-in types, and following the built-in casting
rules, giving some isa relationships. When translating between the types of XML
and Postgres the type systems need to be uni ed. Two di culties arise from the
fact that the modelling of the type systems might be quite di erent:
int
integer
int4
smallint
int2
{ The types might not be named consistently. For example the Postgres integer
(a 4-byte integer in the range 231::231 1) semantically corresponds to the
XML int type, and not the XML integer type. Also the types might not be
stored consistently: in XML all data is held as strings, whilst in Postgres the
integer will be held as a twos complement binary number.
{ There may be no equivalent type in the target model as compared with a
source model type. For example, the bit(n) type of Postgres has no direct
equivalent in XML.</p>
      <p>Hence in the following sections we introduce our own de nition of a type
hierarchy that is suited to data exchange applications. In particular, it builds
hierarchies solely on the range of data values a type may take. We then
demonstrate how mapping rules between schemas in di erent modelling languages may
be made type safe.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Data Exchange Type Hierarchy</title>
      <p>To facilitate the precise representation of types in a data exchange environment,
we introduce in De nition 1 a logical type hierarchy to be used to describe the
types of a single or collection data modelling languages in a manner that allows
us to address the problems discussed in the previous section. We will show that
this type hierarchy can also be used to capture some semantics of types that are
speci c to a particular schema.</p>
      <sec id="sec-3-1">
        <title>De nition 1. Type Hierarchy</title>
        <p>A type hierarchy T Hx is a tuple hT ypesx; Extx; =t x; x; 6 \x; M appingxi where:
{ A nite set of type names T ypesx. These will make up the nodes of the
graph.
{ An Extx function that returns a subset of the set of all possible data values
anyT ypex of the hierarchy that is consistent with a type:
t 2 T ypesx ! Extx(t) Extx(anyT ypex)
The type anyT ypex represents the set of data values that a particular
language (or collection of languages) may handle, and has the property that
Extx(anyT ypex) C where C is the set of all data values that may be
handled by the data exchange system.
{ An equality relation, =t x, such that for t; t0 2 T ypesx</p>
        <p>t =t x t0 () Extx(t) = Extx(t0)
{ A partial ordering relation x, such that for t; t0 2 T ypesx
t x t0 () Extx(t) Extx(t0)
When all types t; t0 2 T ypesx that have t =t x t0 are treated as a single node,
the x relation builds the types into a connected directed acyclic graph.
=t x and x together make up the edges of the graph.
{ A disjoint operator, 6 \x, such that for t; t0 2 T ypesx
t 6 \x t0 () Extx(t) \ Extx(t0) = . This implies there is no pathway from
t to t0 in the graph and so we cannot cast between t and t0. If t 6 \x t0 then
any subtype of t will also be disjoint from t0.
{ A set M appingx of mapping tables hta; tb; fhs1a; sb1i; : : : ; hsan; sbnigi, where
ta; tb 2 T ypesx, s1a; : : : ; san are subsets of Ext(ta) and are disjoint. Similarly
sb1; : : : ; sbn are subsets of Ext(tb) and are disjoint.</p>
        <p>We overload M appingx to be used as a function with the following de nition:
M appingx(ta; tb; va) = First(sbn) j
hta; tb; mapi 2 M appingx ^ hsan; sbni 2 map ^ va 2 sa
n
M appingx 1(ta; tb; va) = First(sbn) j</p>
        <p>htb; ta; mapi 2 M appingx ^ hsbn; sani 2 map ^ va 2 san
where First returns the rst element of a set according to a sort order that is
xed for the system. We use M appingx(typea; typeb) to denote the speci c
mapping table that maps typea to typeb.</p>
        <p>Examples 1 and 2 illustrate the values for the type hierarchy for the XML
and Postgres type models.</p>
        <p>Example 1. The type hierarchy for the XML Schema data modelling language,
T Hxml, can be derived from Figure 1 as follows:</p>
        <p>T ypesxml = fanyT ypexml; shortxml; intxml; nonP ositiveIntegerxml;
negativeIntegerxml; stringxml; : : : g
Extxml = fbooleanxml ! f0; 1; true; f alseg;</p>
        <p>shortxml ! f-32768, . . . ,32767g; : : : g
=xml = fanyT ypexml =t stringxmlg
t
xml = fshortxml intxml; unsignedShortxml intxml;
intxml longxml; longxml integerxml;
integerxml stringxml : : : g
6 \xml = fnegativeIntegerxml 6 \ positiveIntegerxml; : : : g
M appingxml = fg
The Postgres type hierarchy in Example 2 is quite di erent. The integer and
varchar branches of the hierarchy are quite distinct, since converting an integer
to a varchar in Postgres requires a mapping function to be used.
Example 2. From the Postgres RDBMS type system illustrated in Figure 2, we
derive the type hierarchy T Hpg as:
T ypespg = fanyT ypepg; booleanpg; boolpg; integerpg; intpg; int4pg;
smallintpg; int2pg; char(n)pg; varchar(n)pg; textpg : : : g
Extpg = fbooleanpg ! f`0',`1',`y',`n',`yes',`no',`t',`f',true,falseg;</p>
        <p>smallintpg ! f-32768; : : : ; 32767g; : : : g
=pg = fbooleanpg =t boolpg; intpg =t integerpg; intpg =t int4pg; : : : g
t
pg = fsmallintpg integerpg; integerpg bigintpg;
bigintpg anyT ype;
varchar(1)pg varchar(2)pg; charpg varcharpg;
varcharpg textpg; textpg anyT ypepg; : : : g
6 \pg = fintegerpg 6 \ varcharpg; booleanpg 6 \ varchar; : : : g
M appingpg = fg</p>
        <sec id="sec-3-1-1">
          <title>Example 3. Type Mapping for Postgres</title>
          <p>During a single model data exchange, it might be found that the boolean
type in one Postgres database should map to 0 or 1 of type smallint in another
Postgres database. This can be achieved by including the mapping table shown
below:
M appingpg(booleanpg; smallintpg) = fhbooleanpg; int2pg;
fhf`0',`n',`no',`f',falseg; f0gi;
hf`1',`y',`yes',`t',trueg; f1gigig
3.1</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>The Common Type Hierarchy</title>
          <p>If the source and target schemas are de ned in di erent data modelling languages
we need a way of linking the type hierarchy de ned for the source model, the
source type hierarchy, to that de ned for the target model, the target type
hierarchy. To do this we introduce the extensible common type hierarchy
(CTH).</p>
          <p>Using the common hierarchy as an intermediary means that we only need to
create one set of associations between the high level data types of a given model
and those of the CTH. We do not need to de ne new associations each time we
are faced with a new target model.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>De nition 2. Common type hierarchy</title>
        <p>T ypesc = fanyT ypec; stringc; charc; integerc; shortc;</p>
        <p>f loatc; realc; booleancg
Extc = fbooleanc ! ftrue; f alseg; integerc ! f</p>
        <p>shortc ! f 32768; : : : ; 32767g; : : : g
t
=c = fg
c = fshortc integerc; integerc anyT ypec; realc
f loatc anyT ypec; charc stringc;
stringc anyT ypec; booleanc anyT ypecg
6 \c = fintegerc 6 \ booleanc; integerc 6 \ f loatc; : : : g
M appingc = fg
231; : : : ; 231</p>
        <p>1g;
f loatc;</p>
        <p>During an inter model data exchange the source and target type hierarchies
will be merged with the CTH to form an inter model type hierarchy. De nition 3
gives a procedure for building such an inter model type hierarchy T Hxc from
a data model's type hierarchy T Hx, and existing inter model type hierarchy
T Him, and a (possibly extended) CTH T Hc.</p>
      </sec>
      <sec id="sec-3-3">
        <title>De nition 3. Inter model type hierarchy</title>
        <p>Merge(T Hx,T Him,T Hc,CMxc)</p>
        <p>T ypesxc := T ypesx [ T ypesim
Extxc := Extx [ Extim
6 \xc := 6 \x [ 6 \im</p>
        <p>xc := x [ im
=xc := =t x [ =im
t t
M appingxc := M appingx [ M appingim [ CMxc
for each tx 2 T ypesx
txc := anyT ypec
for each tc 2 T ypesc
if Ext(tx) Ext(tc) ^ Ext(tc)</p>
        <p>Ext(txc) then txc := tc endif
end
end
if Ext(tx) Ext(txc)
t0c := NewType(tx)
T ypesc := T ypesc [ ft0cg</p>
        <p>c := c [ft0c txcg
txc = t0</p>
        <p>c
for each tc 2 T ypesc</p>
        <p>if Ext(tc) 6 \Ext(t0c) then 6 \c := 6 \c [ ftc 6 \ t0cg endif
end
end
=xc := =t xc [ ftx =t xc txcg
t
return hT ypesxc; Extxc; 6 \xc; =t xc; xc; M appingxci
The function NewType(t) generates a new type t0 such that Ext(t0) = Ext(t).</p>
        <sec id="sec-3-3-1">
          <title>Example 4. Constructing an inter model type hierarchy</title>
          <p>Assume that XML Schema is our source data modelling language, Postgres is
our target modelling language and we use the CTH in De nition 2. We will also
assume the following reduced type hierarchies for XML and Postgres:
T Hxml = hfanyT ypexml; booleanxml; integerxml; negativeIntegerxmlg;
fintegerxml ! f 231; : : : ; 231 1g;
negativeIntegerxml ! f 231; : : : ; 0g;
booleanxml ! f0; 1; true; f alsegg; fg;
fbooleanxml anyT ypexml; integerxml anyT ypexml;
negativeIntegerxml integerxmlg; fg; fgi
T Hpg = hfanyT ypepg; booleanpg; int4pg; smallintpgg;
fint4pg ! f 231; : : : ; 231 1g; smallintpg ! f 32768; : : : ; 32767g;
booleanpg ! f`0',`1',`y',`n',`yes',`no',`t',`f',true,falsegg; fg;
fbooleanpg anyT ypepg; int4pg anyT ypepg; smallintpg int4pgg;
fbooleanpg 6 \ int4pgg; fgi
Using De nition 3 we rst extend the CTH with the XML types to get a inter
model type hierarchy T Him</p>
          <p>T Him = Merge(T Hxml; T Hc; T Hc;</p>
          <p>hbooleanxml; booleanc; fhf0,falseg; ffalsegi; hf1,trueg; ftruegigi)
As a side e ect, T Hc has the following changes:</p>
          <p>T ypesc := T ypesc [ fnegativeIntegercg</p>
          <p>c:= c [fnegativeIntegerc integercg
and the following inter model equalities are found in =t im:
=im = fintegerxml = integerc; negativeIntegerxml =t negativeIntegerc; : : : g
t t
No additional disjoint relations are added. We can now add in the Postgres type
hierarchy to expand the inter model hierarchy to T Him0 :</p>
          <p>T Him0 = Merge(T Hpg; T Him; T Hc; hbooleanpg; booleanc;</p>
          <p>fhf`0',`n',`no',`f',falseg; ffalsegi; hf`1',`y',`yes',`t',trueg; ftruegigi)
No new changes to T Hc occur, but the following additional inter model equalities
are =foiumn0d=infi=nttim4p0g: =t integerc; smallintpg =t shortc; : : : g</p>
          <p>t
Note that the two mapping tables in T Him0 , M appingim0 (booleanxml; booleanc)
and M appingim10 (booleanpg; booleanc) allow us to map from a boolean in XML
to a boolean Postgres.
3.2</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>Reducing constraint checking</title>
          <p>The inter model type hierarchy described above can help us decide whether or
not to add constraints. For example shortxml and smallintpg are both
equivalent to shortc. We can say with con dence that any value from a construct of
type shortxml can be stored in a construct of type smallintpg. If we later
expanded our system to include schemas from Microsoft SQL Server and discover
that smallintms also maps to shortc we would know that values from all three
modelling languages could be safely interchanged without checking.</p>
          <p>This method can also highlight when checking is needed. Any time it is
necessary to move down the merged inter model hierarchy to transform from one high
level type to another we know that checking is necessary. For example assume
we have an XML source schema that contains an element of type integerxml,
equivalent to integerc and a target of a Postgres column of type smallintpg,
equivalent to shortc. We know from De nition 2 that shortc integerc. To get
from integerxml to smallintpg we need to go via integerc and then down the
hierarchy to shortc. The checking is necessary here because we know from the
de nition that Ext(integerc) Ext(shortc).</p>
          <p>It may be necessary to nd a common ancestor in the merged inter model
hierarchy to be able to move between certain source and target data types.
In Example 4 we added negativeIntegerc to the hierarchy. To transform from
negativeIntegerc to shortc we need to go via a type in the merged hierarchy
that is an ancestor of both types, in this case integerc. If no such ancestor, other
than the root node anyT ypec, can be found the type transformation is invalid.
As with the previous example, the step from integerc to shortc is down the
hierarchy so a constraint is generated.</p>
          <p>Finally we can identify illegal castings. If our pathway from source to
target includes data types t and t0 such that t 6 \ t0 then the cast is illegal from
De nition 1.
4</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>AutoMed</title>
      <p>
        AutoMed [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a data integration and exchange system developed as a joint
project between Imperial College London and Birkbeck College. It has been
used successfully for data integration of schemas from a number of di erent
data models including both XML and relational schemas.
      </p>
      <p>
        AutoMed handles multiple data models de ning the constructs of a higher
level modelling language such as the relational model or XML in a lower level
hypergraph data model (HDM) [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>De nition 4. HDM Schema</title>
        <p>Given a set of N ames that we may use for modelling the real world, an HDM
schema, S, is a triple hN odes; Edges; Consi where:
{ N odes fhhnnii j nn 2 N amesg
i.e. N odes is a set of nodes in the graph, each denoted by its name enclosed
in double chevron marks.
{ Schemes = N odes [ Edges
{ Edges fhhne; s1; : : : ; snii j</p>
        <p>ne 2 N ames[f g^s1 2 Schemes^: : :^sn 2 Schemesg
i.e. Edges is a set of edges in the graph where each edge is denoted by its
name, together with the list of nodes/edges that the edge connects, enclosed
in double chevron marks.
{ Cons fc(s1; : : : ; sn) j c 2 F uncs ^ s1 2 Schemes ^ : : : ^ sn 2 Schemesg
i.e. Cons is a set of boolean-valued functions (i.e. constraints) whose
variables are members of Schemes and where the set of functions F uncs forms
the HDM constraint language.</p>
        <p>The extent of a node or edge is returned by the function ExtS;I where I
is a speci c instance of the schema S. Thus ExtS;I (hhnii) returns the values
associated with node hhnii.</p>
        <p>
          As detailed in [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ], the HDM may be used to model a wide range of higher
level modelling languages, and we develop the techniques in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to handle types
in translating between one higher level modelling language and another by using
the HDM as a common data model (CDM).
        </p>
        <p>
          Using the simple HDM constructs de ned above, the extensional constructs
of higher level modelling languages can be de ned, based on the HDM. In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ],
higher level modelling language constructs were classi ed into (1) nodal
constructs, which can exist independently, and map to nodes in the HDM, (2)
linknodal constructs, which must be attached to other constructs, but have some
additional data, which map to a node and an edge in the HDM, and (3)
linking constructs, which associate data values in existing constructs, and map onto
edges in the HDM. For example, in an ER model, entities are nodal constructs,
attributes are link-nodal, and relationships linking constructs. Once the
constructs of the new modelling language have been de ned in the HDM, we can
de ne transformations that allow us to add, delete and rename these constructs.
4.1
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Data Exchange in AutoMed</title>
        <p>In AutoMed, data exchange is done in two phases: rst the transformation
pathways are set up, and second the data values are transformed using those
pathways. This is roughly analogous to compile and run time for a conventional
program. Setting up the pathways is a static operation whose complexity is
linear in the number of constructs to be mapped. It can be done before data from
the data source is imported into the system. The complexity of transforming the
data values during run-time depends on the data involved and is less easy to
quantify.
4.2</p>
      </sec>
      <sec id="sec-4-3">
        <title>Example of Data Exchange without Type Information</title>
        <p>On the face of it the transformation is very simple. The values in the complete
element in the XML schema will be added to those in the complete column in the
relational table, and those in the cash element will be added to the cashIn column.
However, this example highlights some problems we may face if we ignore type
information when transferring data between XML and SQL.
CREATE TABLE money (
cashIn smallint,
complete boolean
)</p>
        <p>The XML Schema element cash with type negativeInteger can contain any
negative integer whereas the largest negative number the Postgres column cashIn
with type smallint is -32678. If the XML le had a value less than -32678 in it,
attempting to put that into the Postgres table would cause an error. Similarly
if we try to put the boolean value 0 from the complete element into the boolean
column complete the operation will fail, since there are XML boolean values that
cannot be stored in a Postgres boolean column and we have not speci ed how to
map the XML boolean values into Postgres. These transformations are not type
safe.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Adding the Type System to AutoMed</title>
      <p>To add types to the AutoMed system, we extend the De nition 4 by adding
a type hierarchy T H = hT ypes; Ext; =t ; ; 6 \; M appingi from De nition 1 to
make a typed HDM schema be a tuple hN odes; Edges; Constraints; T Hi. The
scheme of each node will have an extra component, t 2 T ypes added, making
the scheme of a node be hhn; tii The extent of a node can now be de ned as a
subset of the values represented by the node's type so for x 2 hhn; tii ! x 2
Ext(t), i.e. hhn; tii 2 Ext(t). Note that edges connect other nodes and edges, and
therefore their type can always be inferred from the base nodes they connect.</p>
      <p>
        The primitive node operations described in [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] are modi ed and
expanded on to incorporate the new type information as follows.
1. A new operation getNodeType(q) returns the type, t, of a query q.
2. The primitive transformations add and delete are adapted slightly:
{ addNode(hhn; tii; q) returns a new schema by adding a node n of type t to
the schema, where getNodeType(q) =t t. The extent of the node is given
by the value of the IQL (Intermediate Query Language) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] query q.
{ deleteNode(hhn; tii; q) returns a new schema by deleting a node from the
schema, where the value of the node may be recovered from q, and
getNodeType(q) =t t
      </p>
      <p>It may be necessary to change the data type of a node during a
transformation. We present two new primitive operations to allow this. These changes are
done on the inter model type hierarchy.
De nition 5. changeNodeType(hhn; toldii; hhn; tnewii)
Condition: Ext(told) \ Ext(tnew) 6= , returns a new schema which di ers only
in that node hhnii has a di erent type. If told 6 tnew this operation will generate
the following constraint used during query processing:
8hxi 2 hhn; toldii:x 2 Ext(tnew).</p>
      <p>De nition 6. convertNodeType(hhn; toldii; hhn; tnewii; M appingim(told; tnew))
denes a bidirectional type conversion from a node of type told to type tnew. Each
value in the extent of hhn; toldii is mapped to a value in the extent of tnew using
the mapping tables and function from De nition 1.</p>
      <p>This section has shown how our formalism can be applied to the AutoMed
system by the addition of primitive operations to manipulate the types of the
nodes in a schema. However, the method in Section 3 could equally well be
applied to other data exchange systems.
6</p>
    </sec>
    <sec id="sec-6">
      <title>AutoMed Transformation with Data Types</title>
      <p>We can now de ne an AutoMed transformation pathway between the XML and
SQL schemas from Section 4.2 using the operators de ned above. The pathway
is shown in Example 5.</p>
      <p>The initial data type of each node matches the data type of the corresponding
data source object. For example if we add a node representing the Postgres
column complete from Figure 4 with type boolean, a node hhcomplete; booleanpgii
will be created using the addNode operator.</p>
      <p>
        The extents of the high level model types are the set of allowable values as
de ned by the data source. For example the extent of shortxml as de ned by the
XML Schema standard in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] will be the integers from -32768 to 32767.
      </p>
      <p>We merge T Hpg of Example 2 and T Hxml of Example 1 with the CTH as
shown in Example 4 to form T Him0 .</p>
      <p>Example 5. Type safe transformation pathway
1 changeNodeType(hhcashxml; negativeIntegerxmlii; hhcashxml; negativeIntegercii)
2 convertNodeType(hhcompletexml; booleanxmlii;</p>
      <p>hhcompletepg; booleancii; Mappingim0(booleanxml; booleanc))
3 addNode(hhcashInpg; negativeIntegercii; hhcashxml; negativeIntegercii)
4 addNode(hhcompletepg; booleancii; hhcompletexml; booleancii)
5 deleteNode(hhcompletexml; booleancii; hhcompletepg; booleancii)
6 deleteNode(hhcashxml; negativeIntegercii; hhcashInpg; negativeIntegercii)
7 convertNodeType(hhcompletepg; booleancii;</p>
      <p>hhcompletepg; booleanpgii; Mappingim0(booleanpg; booleanc))
8 changeNodeType(hhcashInpg; negativeIntegercii; hhcashInpg; smallintpgii)</p>
      <p>Two nodes hhcashxml; negativeIntegerxmlii and hhcompletexml; booleanxmlii
representing the XML Schema elements along with their data types are created by
the wrapping process. Before adding nodes to represent the Postgres columns
we change the type of the XML node to its equivalent in the CTH 1 because
we have negativeIntegerxml =t negativeIntegerc. 2 uses the mapping de ned
in Example 4.</p>
      <p>In transformations 7 and 8 we convert the data types of the nodes from the
HDM to the Postgres model. In 7 we change the type of hhcompletepg; booleancii
using the convertNodeType operation. We again use the mapping de ned in
Example 4, but this time the inverse of the function to map the values of the CTH
boolean to the Postgres boolean. This overcomes the incompatibility of the XML
Schema boolean and Postgres boolean data types and solves the second problem
from the example in Section 4.2.</p>
      <p>Finally in 8 we change the type of node hhcashInpg; negativeIntegercii using
changeNodeType. smallintpg =t shortc but shortc 6 negativeIntegerc so we need
to nd a common ancestor. The common ancestor is integerc. Since we have to
move down the hierarchy to get from integerc to shortc the constraint:
8x 2 hhcashxml; negativeIntegerxmlii:x 2 Ext(smallintpg)
is generated from De nition 5. The transformation is legal because there are
no disjoint pairs of types in the pathway. We can check the constraint as we
transfer data into the SQL table and so be guaranteed that we will not get any
errors from the database. This solves the rst problem we had in the example
in Section 4.2.</p>
      <p>After transformations 1 { 8 we have the nodes hhcashInpg; smallintpgii and
hhcompletepg; booleanpgii, that correspond to the columns in the Postgres table
in Figure 4.</p>
      <p>Using the type hierarchy has allowed us to identify a mapping between the
XML and Postgres boolean data types. This was done automatically at
compiletime, based on the types of the nodes involved. This could not have been done
without using the type information. We have also identi ed a potential type
casting problem that will need to be checked during run-time. Without the
explicit identi cation of the latter problem a system would either need to check
every transformation for type safety during the data value exchange or adopt a
no-checking policy that could lead to unexpected problems.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Related Work</title>
      <p>
        A number of papers have shown how to transform schemas from one model
to another, most commonly XML to relational [15{17]. There are also systems
that will exchange data between a number of di erent models [
        <xref ref-type="bibr" rid="ref12 ref18">18, 12</xref>
        ]. AutoMed
falls into this category. Transforming integrity constraints from one model to
another has also received some attention [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There does not, however, seem to
have been much written about exchanging primitive type information between
di erent models and in particular manipulating that information during the
transformation process.
      </p>
      <p>
        Some methods ignore the problem [
        <xref ref-type="bibr" rid="ref15 ref7">15, 7</xref>
        ] and make no explicit mention of
how data type from one model are transformed into the other model. Rahm
and Bernstein [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] suggest using a special synonym table to match data types
between di erent models to each other, an approach they adopt in Cupid [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
This method is e ective when mapping between two speci c models but does
not scale well to a system like AutoMed that supports multiple models.
      </p>
      <p>
        A number of systems provide support for data types. TSIMMIS [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] allows
data types to be stored as part of their Object-Exchange Model [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] but there
is no mechanism to manipulate the data types. WOL [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] is another language
for database transformations that stores type information, however, the
language is only able to describe transformations in relational and object-relational
databases, not inter model transformations. The Clio system [
        <xref ref-type="bibr" rid="ref1 ref23">1, 23</xref>
        ] de nes a
number of value based source-to-target dependencies that specify how and
what source data should appear in the target. This data-centric approach does
not make use of type information in the source schema to help map to the target
schema. In ignoring type information these systems risk losing expressiveness
during the transformations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and allowing type-incompatible transformations
to be written. We have shown that there are times when data types are signi cant
and should not be ignored.
8
      </p>
    </sec>
    <sec id="sec-8">
      <title>Conclusions and Future Work</title>
      <p>We have presented a method of improving the expressiveness of inter model
data exchange between models that have constructs with associated data types.
The method relies on converting the types from the source into a common type
hierarchy capable of representing types from any data source, and from there
into types from the target schema.</p>
      <p>Types can be cast from one type to another and mappings can be de ned
between disjoint types. A formal de nition of the type system has been provided
and it has been shown how this can be included in the existing AutoMed system.
An example inter model transformation with and without using the type system
was presented to show the advantages of using the type system.</p>
      <p>The type hierarchy described above has been implemented in AutoMed and
has been used for data exchange from source schemas in a number of di erent
models to an empty target schema in the relational and XML Schema models.</p>
      <p>In the future we will extend the use of the mapping function to single model
data exchange where for instance one Postgres database represents boolean
values as single characters `t' and `f' of type char(1) while another database may
use a eld of type boolean. We also hope to increase the e ciency of the method
by de ning direct transformation rules between certain well-known data models,
bypassing the need to transform each data type into its HDM equivalent rst.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data exchange: Semantics and query answering</article-title>
          .
          <source>In: ICDT</source>
          . (
          <year>2003</year>
          )
          <volume>207</volume>
          {
          <fpage>224</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Atkinson</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buneman</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Types and persistence in database programming languages</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>19</volume>
          (
          <issue>2</issue>
          ) (
          <year>1987</year>
          )
          <volume>105</volume>
          {
          <fpage>190</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hara</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
          </string-name>
          , J.:
          <article-title>Propagating XML Constraints to Relations</article-title>
          . In: ICDE. (
          <year>2003</year>
          )
          <volume>543</volume>
          {
          <fpage>554</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data integration under integrity constraints</article-title>
          .
          <source>Inf. Syst</source>
          .
          <volume>29</volume>
          (
          <issue>2</issue>
          ) (
          <year>2004</year>
          )
          <volume>147</volume>
          {
          <fpage>163</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cardelli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Type systems</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>28</volume>
          (
          <issue>1</issue>
          ) (
          <year>1996</year>
          )
          <volume>263</volume>
          {
          <fpage>264</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>A survey of approaches to automatic schema matching</article-title>
          .
          <source>VLDB J</source>
          .
          <volume>10</volume>
          (
          <issue>4</issue>
          ) (
          <year>2001</year>
          )
          <volume>334</volume>
          {
          <fpage>350</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.L.</given-names>
            ,
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.:</surname>
          </string-name>
          <article-title>A virtual XML database engine for relational databases</article-title>
          .
          <source>XSYM</source>
          <year>2003</year>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Henry</surname>
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Thompson</surname>
          </string-name>
          et al. Eds:
          <article-title>XML Schema part 1: Structures</article-title>
          . http://www.w3.org/TR/xmlschema-1 (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Paul</surname>
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Biron</surname>
          </string-name>
          et al. Eds:
          <article-title>XML Schema part 2: Datatypes second edition</article-title>
          . http://www.w3.org/TR/xmlschema-2 (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>M. Boyd</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Kittivoravitkul</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Lazanitis</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          <string-name>
            <surname>McBrien</surname>
            and
            <given-names>N.</given-names>
          </string-name>
          <article-title>Rizopoulos: AutoMed: A BAV Data Integration System for Heterogeneous Data Sources</article-title>
          .
          <source>In: CAiSE04</source>
          . Volume
          <volume>3084</volume>
          of LNCS., Springer Verlag (
          <year>2004</year>
          )
          <volume>82</volume>
          {
          <fpage>97</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>McBrien</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poulovassilis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A general formal framework for schema transformation</article-title>
          .
          <source>In: Data and Knowledge Engineering</source>
          . Volume
          <volume>28</volume>
          . (
          <year>1998</year>
          )
          <volume>47</volume>
          {
          <fpage>71</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Boyd</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McBrien</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Comparing and transforming between data models via an intermediate hypergraph data model</article-title>
          .
          <source>J. Data Semantics IV</source>
          (
          <year>2005</year>
          )
          <volume>69</volume>
          {
          <fpage>109</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>McBrien</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poulovassilis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A semantic approach to integrating XML and structured data sources</article-title>
          .
          <source>In: Advanced Information Systems Engineering</source>
          . Volume 2068 of LNCS., Springer Verlag (
          <year>2001</year>
          )
          <volume>330</volume>
          {
          <fpage>345</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Poulovassilis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A tutorial on the IQL query language</article-title>
          .
          <source>AutoMed Technical Report</source>
          http://www.doc.ic.ac.uk/automed/ (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Phil</surname>
          </string-name>
          Bohannon et al.:
          <article-title>From XML Schema to relations: A cost-based approach to XML storage</article-title>
          .
          <source>In: Proc. of Intl. Conf. on Data Engineering (ICDE)</source>
          .
          <article-title>(</article-title>
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ferandez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Silkroute:
          <article-title>Trading between relations and XML</article-title>
          .
          <source>In: Proceedings of the Ninth International World Wide Web Conference</source>
          . (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jayavel</surname>
          </string-name>
          Shanmugasundaram et al.:
          <article-title>A general techniques for querying XML documents using a relational database system</article-title>
          .
          <source>SIGMOD Record</source>
          <volume>30</volume>
          (
          <issue>3</issue>
          ) (
          <year>2001</year>
          )
          <volume>20</volume>
          {
          <fpage>26</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolaitis</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data exchange: getting to the core</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .
          <volume>30</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          )
          <volume>174</volume>
          {
          <fpage>210</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <article-title>Generic schema matching with cupid</article-title>
          .
          <source>In: VLDB</source>
          . (
          <year>2001</year>
          )
          <volume>49</volume>
          {
          <fpage>58</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papakonstantinou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quass</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rajaraman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sagiv</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassalos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The tsimmis approach to mediation: Data models and languages</article-title>
          .
          <source>J. Intell. Inf. Syst</source>
          .
          <volume>8</volume>
          (
          <issue>2</issue>
          ) (
          <year>1997</year>
          )
          <volume>117</volume>
          {
          <fpage>132</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Papakonstantinou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
          </string-name>
          , J.:
          <article-title>Object exchange across heterogeneous information sources</article-title>
          . In Yu, P.S.,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A.L.P</given-names>
          </string-name>
          ., eds.
          <source>: 11th Conference on Data Engineering</source>
          , Taipei, Taiwan, IEEE Computer Society (
          <year>1995</year>
          )
          <volume>251</volume>
          {
          <fpage>260</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Susan</given-names>
            <surname>Davidson</surname>
          </string-name>
          and
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Kosky: WOL: A Language for Database Transformations and Constraints</article-title>
          .
          <source>In: Proceedings of the International Conference of Data Engineering</source>
          . (
          <year>1997</year>
          )
          <volume>55</volume>
          {
          <fpage>65</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haas</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernandez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Schema mapping as query discovery</article-title>
          .
          <source>In: VLDB</source>
          . (
          <year>2000</year>
          )
          <volume>77</volume>
          {
          <fpage>88</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>