<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Information Sharing for the Semantic Web: a Schema Transformation Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucas Zamboulis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandra Poulovassilis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Information Systems, Birkbeck College, University of London</institution>
          ,
          <addr-line>London WC1E 7HX</addr-line>
        </aff>
      </contrib-group>
      <fpage>275</fpage>
      <lpage>289</lpage>
      <abstract>
        <p>This paper proposes a framework for transforming and integrating heterogeneous XML data sources, making use of known correspondences between them and ontologies expressed in the form of RDFS schemas. The paper first illustrates how correspondences to a single ontology can be exploited. The approach is then extended to the case where correspondences may refer to multiple ontologies, themselves interconnected via schema transformation rules. The contribution of this research is an XML-specific approach to the automatic transformation and integration of XML data, making use of RDFS ontologies as a 'semantic bridge'.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper proposes a framework for the automatic transformation and
integration of heterogeneous XML data sources by exploiting known correspondences
between them and ontologies expressed as RDFS schemas. Our algorithms
generate schema transformation rules implemented in the AutoMed heterogeneous
data integration system (http://www.doc.ic.ac.uk/automed/). These rules
can be used to transform an XML data source into a target format, or to
integrate a set of heterogeneous XML data sources into a common format. The
transformation/integration may be virtual or materialised.</p>
      <p>There are several advantages of our approach compared with, say,
constructing pairwise mappings between the XML data sources, or between each data
source and some known global XML format: known semantic correspondences
between data sources and domain and other ontologies can be utilised for
transforming or integrating the data sources; the correspondences from the data
sources to the ontology do not need to perform a complete mapping of the
data sources; and changes in a data source, or addition or removal of a data
source, do not affect the other sets of correspondences.</p>
      <p>Paper outline: Section 2 compares our approach with related work.
Section 3 gives an overview of AutoMed to the level of detail necessary for the
purposes of this paper. Section 4 presents the process of transforming and
integrating XML data sources which are linked to the same ontology, while Section 5
extends this to the more general case of data sources being linked to different
ontologies. Section 6 gives our concluding remarks and plans for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] also undertakes data integration through the use of ontologies.
However, this is by transforming the source data into a common RDF format,
in contrast to our integration approach in which the common format is an XML
schema. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], mappings from DTDs to RDF ontologies are used in order to
reformulate path queries expressed over a global ontology into equivalent queries
over the XML data sources. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an ontology is used as a global virtual schema for
heterogeneous XML data sources using LAV mapping rules. SWIM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses mappings
from various data models (including XML and relational) to RDF, in order to
integrate data sources modelled in different modelling languages. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], XML
Schema constructs are mapped to OWL constructs and evaluation of queries on
the virtual OWL global schema is supported.
      </p>
      <p>In contrast to all of these approaches, we use RDFS schemas merely as a
`semantic bridge' for transforming/integrating XML data, and the target/global
schema is in all cases an XML schema.</p>
      <p>
        Other approaches to transforming or integrating XML data which do not
make use of RDF/S or OWL include [16, 18-20, 23]. Our own earlier work in
[
        <xref ref-type="bibr" rid="ref24 ref25">24, 25</xref>
        ] also discussed the transformation and integration of XML data sources.
However, this work was not able to make use of correspondences between the data
sources and ontologies. The approach we present here is able to use information
that identifies an element/attribute in one data source to be equivalent to, a
superclass of, or a subclass of an element/attribute in another data source. This
information is generated from the correspondences between the data sources and
ontologies. This allows more semantic relationships to be inferred between the
data sources, and hence more information to be retained from a data source
when it is transformed into a target format.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Overview of AutoMed</title>
      <p>AutoMed is a heterogeneous data transformation and integration system which
offers the capability to handle virtual, materialised and hybrid data
integration across multiple data models. It supports a low-level hypergraph-based
data model (HDM) and provides facilities for specifying higher-level
modelling languages in terms of this HDM. An HDM schema consists of a set of
nodes, edges and constraints, and each modelling construct of a higher-level
modelling language is specified as some combination of HDM nodes, edges and
constraints. For any modelling language M specified in this way (via the API
of AutoMed's Model Definitions Repository), AutoMed provides a set of
primitive schema transformations that can be applied to schema constructs expressed
in M. In particular, for every construct of M there is an add and a delete
primitive transformation which add to/delete from a schema an instance of that
construct. For those constructs of M which have textual names, there is also a
rename primitive transformation.</p>
      <p>Instances of modelling constructs within a particular schema are identified by
means of their scheme enclosed within double chevrons ⟨⟨...⟩⟩. AutoMed schemas
can be incrementally transformed by applying to them a sequence of primitive
transformations, each adding, deleting or renaming just one schema construct
(thus, in general, AutoMed schemas may contain constructs of more than one
modelling language). A sequence of primitive transformations from one schema
S1 to another schema S2 is termed a pathway from S1 to S2 and denoted by
S1 → S2. All source, intermediate, and integrated schemas, and the pathways
between them, are stored in AutoMed's Schemas &amp; Transformations Repository.</p>
      <p>Each add and delete transformation is accompanied by a query specifying
the extent of the added or deleted construct in terms of the rest of the constructs
in the schema. This query is expressed in a functional query language, IQL, and
we will see some examples of IQL queries in Section 4. Also available are extend
and contract primitive transformations which behave in the same way as add
and delete except that they state that the extent of the new/removed construct
cannot be precisely derived from the rest of the constructs. Each extend and
contract transformation takes a pair of queries that specify a lower and an
upper bound on the extent of the construct. These bounds may be Void or
Any, which respectively indicate no known information about the lower or upper
bound of the extent of the new construct.</p>
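      <p>To make the five kinds of primitive transformation concrete, the following is a minimal sketch of our own (not the actual AutoMed API); the construct and query strings are placeholders standing in for real schema constructs and IQL queries:</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transformation:
    kind: str                       # "add", "delete", "extend", "contract" or "rename"
    construct: str                  # e.g. "Element ⟨⟨Staff⟩⟩" (placeholder notation)
    query: Optional[str] = None     # extent query for add/delete
    lower: Optional[str] = None     # lower-bound query for extend/contract
    upper: Optional[str] = None     # upper-bound query; "Void"/"Any" mark no information
    new_name: Optional[str] = None  # target name for rename

# A pathway is simply a sequence of primitive transformations.
pathway = [
    Transformation("rename", "Element ⟨⟨academic⟩⟩", new_name="AcademicStaff"),
    Transformation("add", "Element ⟨⟨Staff⟩⟩", query="extent of ⟨⟨AcademicStaff⟩⟩"),
    Transformation("extend", "Element ⟨⟨College⟩⟩", lower="Void", upper="Any"),
]
```

      <p>Note how the extend step carries the Void/Any bounds described above, indicating that nothing is known about the lower or upper bound of the new construct's extent.</p>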
      <p>
        The queries supplied with primitive transformations can be used to translate
queries or data along a transformation pathway S1 → S2 by means of query
unfolding: for translating a query on S1 to a query on S2 the delete, contract
and rename steps are used, while for translating data from S1 to data on S2 the
add, extend and rename steps are used; we refer the reader to [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for details.
      </p>
      <p>
        The queries supplied with primitive transformations also provide the
necessary information for these transformations to be automatically reversible, in that
each add/extend transformation is reversed by a delete/contract
transformation with the same arguments, while each rename is reversed by a rename with
the two arguments swapped. As discussed in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], this means that AutoMed is
a both-as-view (BAV) data integration system: the add/extend steps in a
transformation pathway correspond to Global-As-View (GAV) rules while the
delete and contract steps correspond to Local-As-View (LAV) rules. If a GAV
view is derived from solely add steps it will be exact in the terminology of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. If,
in addition, it is derived from one or more extend steps using their lower-bound
(upper-bound) queries, then the GAV view will be sound (complete) in the
terminology of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Similarly for LAV views. An in-depth comparison of BAV with
the GAV, LAV and GLAV [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ] approaches to data integration can be found
in [
        <xref ref-type="bibr" rid="ref15 ref9">15, 9</xref>
        ], while [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] discusses the use of BAV in a peer-to-peer data integration
setting.
      </p>
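      <p>The automatic reversibility described above can be sketched as follows; the tuple encoding of transformations is our own simplification, not AutoMed's representation:</p>

```python
def reverse_pathway(pathway):
    """Reverse a BAV pathway: each add/extend is undone by a delete/contract
    with the same arguments, and each rename by a rename with its two
    name arguments swapped."""
    inverses = {"add": "delete", "delete": "add",
                "extend": "contract", "contract": "extend"}
    reversed_steps = []
    for step in reversed(pathway):
        kind, args = step[0], step[1:]
        if kind == "rename":
            old_name, new_name = args
            reversed_steps.append(("rename", new_name, old_name))
        else:
            reversed_steps.append((inverses[kind],) + args)
    return reversed_steps

# Hypothetical pathway: an add followed by a rename.
forward = [("add", "Staff", "q1"), ("rename", "academic", "AcademicStaff")]
backward = reverse_pathway(forward)
# backward == [("rename", "AcademicStaff", "academic"), ("delete", "Staff", "q1")]
```

      <p>Reversing the reversed pathway yields the original pathway again, which is what makes the forward and reverse views of a BAV pathway interchangeable.</p>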
      <sec id="sec-3-1">
        <title>Representing XML schemas in AutoMed</title>
        <p>
          The standard schema definition languages for XML are DTD [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] and XML
Schema [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Both of these provide grammars to which conforming documents
adhere, and do not abstract the tree structure of the actual documents. In our
schema transformation and integration context, knowing the actual structure
facilitates schema traversal, structural comparison between a source and a target
schema, and restructuring of the source schema(s) that are to be transformed
and/or integrated. Moreover, such a schema type means that the queries supplied
with the AutoMed primitive transformations are essentially path queries, which
are easily generated and easily translated into XPath/XQuery for interaction
with the XML data sources. In addition, it may not be the case that all data
sources have an accompanying DTD or XML Schema to which they conform.
        </p>
        <p>We have therefore defined a simple modelling language called XML
DataSource Schema (XMLDSS) which summarises the structure of an XML
document. XMLDSS schemas consist of four kinds of constructs:
Element: Elements, e, are identified by a scheme ⟨⟨e⟩⟩ and are represented by
nodes in the HDM.</p>
        <p>Attribute: Attributes, a, belonging to elements, e, are identified by a scheme
⟨⟨e, a⟩⟩. They are represented by a node in the HDM, representing the
attribute; an edge between this node and the node representing the element
e; and a cardinality constraint stating that an instance of e can have at
most one instance of a associated with it, and that an instance of a can be
associated with one or more instances of e.</p>
        <p>NestList: NestLists are parent-child relationships between two elements ep
and ec and are identified by a scheme ⟨⟨ep, ec, i⟩⟩, where i is the position of
ec within the list of children of ep. In the HDM, they are represented by an
edge between the nodes representing ep and ec; and a cardinality constraint
that states that each instance of ep is associated with zero or more instances
of ec, and each instance of ec is associated with precisely one instance of ep.1
PCData: In any XMLDSS schema there is one construct with scheme ⟨⟨PCData⟩⟩,
representing all the instances of PCData within an XML document.</p>
        <p>In an XML document there may be elements with the same name
occurring at different positions in the tree. In XMLDSS schemas we therefore use
an identifier of the form elementName$count for each element in the schema,
where count is a counter incremented every time the same elementName is
encountered in a depth-first traversal of the schema. If the suffix $count is
omitted from an element name, then the suffix $1 is assumed. For the XML
documents themselves, our XML wrapper generates a unique identifier of the form
elementName$count&amp;instanceCount for each element, where instanceCount is
a counter identifying each instance of elementName$count in the document.</p>
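        <p>The $count identifier scheme can be sketched as follows; the nested-tuple encoding of an XMLDSS tree is a hypothetical one of our own, not the wrapper's actual data structure:</p>

```python
from collections import defaultdict

def label_with_counts(tree):
    """Assign elementName$count identifiers in a depth-first traversal,
    incrementing the count each time the same element name recurs.
    `tree` is a (name, children) nested tuple."""
    counts = defaultdict(int)
    def visit(node):
        name, children = node
        counts[name] += 1
        return (f"{name}${counts[name]}", [visit(c) for c in children])
    return visit(tree)

# A schema where "name" occurs both under "academic" and under "university".
schema = ("university", [("school", [("academic", [("name", []), ("office", [])])]),
                         ("name", [])])
labeled = label_with_counts(schema)
# The second occurrence of "name" in depth-first order becomes "name$2".
```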
        <p>
          The XMLDSS schema, S, of an XML document, D, is derived by our XML
wrapper by means of a depth-first traversal of D and is equivalent to the tree
resulting as an intermediate step in the creation of a minimal dataguide [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
However, unlike dataguides, we do not merge common sub-trees and the schema
remains a tree rather than a DAG.
        </p>
        <p>To illustrate XMLDSS schemas, consider the following XML document:
&lt;university&gt;
  &lt;school name="School of Law"&gt;
    &lt;academic&gt;
      &lt;name&gt;Dr. G. Grigoriadis&lt;/name&gt;
      &lt;office&gt;123&lt;/office&gt;
    &lt;/academic&gt;
    &lt;academic&gt;
      &lt;name&gt;Prof. A. Karakassis&lt;/name&gt;
      &lt;office&gt;111&lt;/office&gt;
    &lt;/academic&gt;
  &lt;/school&gt;
  &lt;school name="School of Medicine"&gt;
    &lt;academic&gt;
      &lt;name&gt;Dr. A. Papas&lt;/name&gt;
      &lt;office&gt;321&lt;/office&gt;
    &lt;/academic&gt;
  &lt;/school&gt;
&lt;/university&gt;
1 Here, the fact that IQL is inherently list-based means that the ordering of child
instances of ec under parent instances of ep is preserved within the extent of the
NestList ⟨⟨ep, ec, i⟩⟩.
The XMLDSS schema extracted from this document is S1 in Figure 1. Note
that a new root element r is generated for each XMLDSS schema, populated by
a unique instance r&amp;1. This is useful in adopting a more uniform approach to
schema restructuring and schema integration by not having to consider whether
schemas have the same or different roots.</p>
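        <p>The extraction of an XMLDSS summary can be roughly sketched as follows. This is our own approximation: it collapses repeated same-tag siblings but, unlike the full algorithm, does not merge the substructure of all same-tag siblings, and it builds the example document programmatically rather than parsing it:</p>

```python
import xml.etree.ElementTree as ET

def xmldss(elem):
    """Return a (tag, children) tuple summarising the structure under `elem`,
    collapsing repeated sibling elements with the same tag."""
    summary, order = {}, []
    for child in elem:
        if child.tag not in summary:
            summary[child.tag] = xmldss(child)
            order.append(child.tag)
        # repeated siblings with the same tag are collapsed into one entry
    return (elem.tag, [summary[t] for t in order])

# Build the example university document of the text programmatically.
uni = ET.Element("university")
for school_name, staff in [("School of Law", ["G. Grigoriadis", "A. Karakassis"]),
                           ("School of Medicine", ["A. Papas"])]:
    school = ET.SubElement(uni, "school", name=school_name)
    for person in staff:
        academic = ET.SubElement(school, "academic")
        ET.SubElement(academic, "name").text = person
        ET.SubElement(academic, "office").text = "123"

# Wrap the summary under the synthetic root r, as the text describes.
root = ("r", [xmldss(uni)])
```

        <p>For the example document this yields a tree r, university, school, academic, with name and office below academic, mirroring schema S1.</p>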
        <p>As mentioned earlier, after a modelling language has been specified in terms
of the HDM, AutoMed automatically makes available a set of primitive
transformations for transforming schemas defined in that modelling language. Thus, for
XMLDSS schemas there are transformations addElement(⟨⟨e⟩⟩, query),
addAttribute(⟨⟨e, a⟩⟩, query), addNestList(⟨⟨ep, ec, i⟩⟩, query), and similar transformations
for the extend, delete, contract and rename of Element, Attribute and
NestList constructs.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Transforming and Integrating XML Data Sources</title>
      <p>
        In this section we consider first a scenario in which two XMLDSS schemas S1
and S2 are each semantically linked to an RDFS schema by means of a set of
correspondences. These correspondences may be de¯ned by a domain expert or
extracted by a process of schema matching from the XMLDSS schemas and/or
underlying XML data, e.g. using the techniques described in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Each
correspondence maps an XMLDSS Element or Attribute construct to an IQL query
over the RDFS schema (so correspondences are LAV mappings).
      </p>
      <p>In Section 4.1 we show how these correspondences can be used to
generate a transformation pathway from S1 to an intermediate schema IS1, and a
pathway from S2 to an intermediate schema IS2. The schemas IS1 and IS2
are `conformed' in the sense that they use the same terms for the same RDFS
concepts.</p>
      <p>Due to the bidirectionality of BAV, from these two pathways S1 → IS1 and
S2 → IS2 the reverse pathways IS1 → S1 and IS2 → S2 can be automatically
derived.</p>
      <p>In Section 4.2 we show how a transformation pathway from IS1 to IS2
can then be automatically generated. An overall transformation pathway from
S1 to S2 can finally be obtained by composing the three pathways S1 → IS1,
IS1 → IS2 and IS2 → S2.</p>
      <p>This pathway can subsequently be used to automatically translate queries
expressed on S2 to operate on S1, using AutoMed's XML Wrapper over source
S1 to return the query results. Or the pathway can be used to automatically
transform data that is structured according to S1 to be structured according to
S2, and an XML document structured according to S2 can be output.</p>
      <p>In Section 4.3 we discuss the automatic integration of a number of XML data
sources described by XMLDSS schemas S1, ..., Sn, each semantically linked to
a single RDFS schema by a set of correspondences. This process extends the
approach of Sections 4.1 and 4.2 to integrate a set of schemas into a single
global XMLDSS schema.</p>
      <p>[Figure 1: the XMLDSS schemas S1 and S2, the RDFS schema R1, and the conformed schemas IS1 and IS2]</p>
      <p>
        In our approach, a correspondence defines an Element or Attribute of an
XMLDSS schema by means of an IQL path query over an RDFS schema2. In
particular, an Element e may map either to a Class c, or to a path ending
with a class-valued property of the form ⟨⟨p, c1, c2⟩⟩, or to a path ending with
a literal-valued property of the form ⟨⟨p, c, Literal⟩⟩; additionally, the
correspondence may state that the instances of a class are constrained by membership in
some subclass. An Attribute may map either to a literal-valued property or to
a path ending with a literal-valued property. Our correspondences are similar to
path-path correspondences in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], in the sense that a path from the root of an
XMLDSS schema to a node corresponds to a path in the RDFS schema.
      </p>
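      <p>The three forms of correspondence target can be sketched by a simple classification; label lists stand in here for the actual IQL path queries, an encoding of our own for illustration:</p>

```python
def correspondence_kind(path):
    """Classify the target of a correspondence, where `path` lists the labels
    of the RDFS path: a single Class label maps an Element directly to a
    class; a path ending in "Literal" is a literal-valued property path;
    otherwise it is a class-valued property path."""
    if len(path) == 1:
        return "class"
    if path[-1] == "Literal":
        return "literal-valued path"
    return "class-valued path"

kinds = [correspondence_kind(["University"]),
         correspondence_kind(["belongs", "School", "College"]),
         correspondence_kind(["name", "Staff", "Literal"])]
# kinds == ["class", "class-valued path", "literal-valued path"]
```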
      <p>For example, Tables 1 and 2 show the correspondences between the XMLDSS
schemas S1 and S2 and the RDFS schema R1 (Figure 1). In Table 1 the 1st
correspondence maps element ⟨⟨university⟩⟩ to class ⟨⟨University⟩⟩. The 2nd
correspondence states that the extent of element ⟨⟨school⟩⟩ corresponds to the instances
of class School derived from the join of properties ⟨⟨belongs, College, University⟩⟩
and ⟨⟨belongs, School, College⟩⟩ on their common class construct, College.3 In the
4th correspondence, element ⟨⟨academic⟩⟩ corresponds to the instances of class
Staff derived from the specified path expression that are also members of
AcademicStaff. In the 5th correspondence, the IQL function generateElemUID
generates as many instances for element ⟨⟨name⟩⟩ as specified by its second
argument, i.e. the number of instances of the property ⟨⟨name, Staff, Literal⟩⟩ in the
path expression specified as the argument to the count function. The remaining
correspondences in Tables 1 and 2 are similar.</p>
      <p>
        2 An RDFS schema can be represented in the HDM using five kinds of constructs:
Class, Property, subClassOf, subPropertyOf, Literal. See [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] for details.
      </p>
      <p>The conformance of a pair of XMLDSS schemas S1 and S2 to equivalent
XMLDSS schemas IS1 and IS2 that represent the same concepts in the same
way is achieved by renaming the constructs of S1 and S2 using the sets of
correspondences from these schemas to a common ontology.</p>
      <p>
        For every correspondence i in the set of correspondences between an XMLDSS
schema S and an ontology R, a rename AutoMed transformation is generated,
as follows:
1. If i concerns an Element e:
(a) If e maps directly to a Class c, rename e to c. If the instances of c are
constrained by membership in a subclass csub of c, rename e to csub.
(b) Else, if e maps to a path in R ending with a class-valued Property,
rename e to s, where s is the concatenation of the labels of the Class
and Property constructs of the path, separated by '.'. If the instances
of a Class c in this path are constrained by membership in a subclass,
then the label of the subclass is used instead within s.
(c) Else, if e maps to a path in R ending with a literal-valued Property
⟨⟨p, c, Literal⟩⟩, rename e as in step 1b, but without appending the label
Literal to s.
2. If i concerns an Attribute a, then a must map to a path in R ending with
a literal-valued Property ⟨⟨p, c, Literal⟩⟩, and it is renamed as Element e in
step 1c.
3 The IQL query defining this correspondence may be read as "return all values s such
that the pair of values {c, u} is in the extent of construct ⟨⟨belongs, College, University⟩⟩
and the pair of values {s, c} is in the extent of construct ⟨⟨belongs, School, College⟩⟩".
IQL is a comprehensions-based language and we refer the reader to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for details of
its syntax, semantics and implementation. Such languages subsume query languages
such as SQL-92 and OQL in expressiveness [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. There are AutoMed wrappers for
SQL and OQL data sources and these translate fragments of IQL into SQL or OQL.
Translating between fragments of IQL and XPath/XQuery is also straightforward;
see Section 4.4 below.
      </p>
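      <p>The renaming rules above can be sketched as follows. This is our own simplification: a correspondence is given as the list of Class/Property labels along the RDFS path, and a hypothetical `subclass_of` mapping supplies any subclass constraints:</p>

```python
def rename_target(path, subclass_of=None):
    """Compute the conformed name for an XMLDSS construct (steps 1-2 above):
    concatenate the Class and Property labels of the path with '.',
    dropping the trailing "Literal" label (steps 1c/2) and substituting the
    constraining subclass for a class wherever one is given (steps 1a/1b)."""
    subclass_of = subclass_of or {}
    labels = [lbl for lbl in path if lbl != "Literal"]
    return ".".join(subclass_of.get(lbl, lbl) for lbl in labels)

# Step 1a: an Element mapped directly to a class constrained to a subclass.
direct = rename_target(["Staff"], {"Staff": "AcademicStaff"})
# direct == "AcademicStaff"

# Step 1b: an Element mapped to a path ending with a class-valued property.
path = rename_target(["University", "belongs", "College"])
# path == "University.belongs.College"
```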
      <p>Note that not all the constructs of S1 and S2 need be mapped by correspondences
to the ontology. Such constructs are not affected and are treated as-is by the
subsequent schema restructuring phase.</p>
      <p>Figure 1 shows the schemas IS1 and IS2 produced by the application of the
renamings to S1 and S2 arising from the sets of correspondences in Tables 1 and
2.</p>
      <sec id="sec-4-1">
        <title>Schema restructuring</title>
        <p>
          In order to next transform schema IS1 to have the same structure as schema
IS2, we have developed a schema restructuring algorithm that, given a source
XMLDSS schema S and a target XMLDSS schema T, automatically transforms
S to the structure of T, given that S and T have been previously conformed.
This algorithm is able to use information that identifies an element/attribute
in S to be equivalent to, a superclass of, or a subclass of an element/attribute
in T. This information may be produced by, for example, a schema matching
tool or, in our context here, via correspondences to an RDFS ontology. We note
that this algorithm is an extension of our earlier schema restructuring algorithm
described in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], which could only handle equivalence information between
elements/attributes and could not exploit superclass and subclass information. The
extended algorithm allows more semantic relationships to be inferred between S
and T, and hence more information to be retained from S when it is transformed
into T. The restructuring algorithm consists of a "growing phase", where T is
traversed in a depth-first fashion and S is augmented with any constructs from
T that it is missing, followed by a "shrinking phase", where the augmented S is
traversed in a depth-first fashion and any construct present in S but not in T is
removed.
        </p>
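        <p>A minimal sketch of the two-phase algorithm, with schemas reduced to lists of construct names (the real algorithm works over Element, Attribute and NestList constructs and emits add/extend and delete/contract transformations with extent queries):</p>

```python
def restructure(source, target):
    """Growing phase: add to `source` every construct of `target` it lacks.
    Shrinking phase: then remove every construct not present in `target`.
    Returns the emitted (kind, construct) steps and the final schema."""
    steps, current = [], list(source)
    for c in target:              # growing phase (target traversed first)
        if c not in current:
            steps.append(("add", c))
            current.append(c)
    for c in list(current):       # shrinking phase
        if c not in target:
            steps.append(("delete", c))
            current.remove(c)
    return steps, current

steps, result = restructure(["r", "AcademicStaff"], ["r", "Staff"])
# steps == [("add", "Staff"), ("delete", "AcademicStaff")]
# result == ["r", "Staff"]
```

        <p>Performing all additions before any removals mirrors the completeness argument made later in this section: every construct needed to define the extent of a new construct is still present when that construct is added.</p>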
        <p>The AutoMed transformations generated by the schema restructuring
algorithm for transforming schema IS1 to schema IS2 are illustrated in Table 3. In
the growing phase, the first three transformations concern the element ⟨⟨Staff⟩⟩
of IS2. This element is inserted in IS1 using Element ⟨⟨AcademicStaff⟩⟩, which
corresponds to a class that is a subclass of the class ⟨⟨Staff⟩⟩ corresponds to in
the RDFS ontology; the ren IQL function is used here to rename the instances
of Element ⟨⟨AcademicStaff⟩⟩ appropriately. After that, a NestList is inserted,
linking ⟨⟨Staff⟩⟩ to its parent, which is the root r, using the path from r to
AcademicStaff. ⟨⟨Staff⟩⟩ in T is not linked to the PCData construct, and
therefore its attribute is handled next. The addAttribute transformation performs
an element-to-attribute transformation by inserting Attribute ⟨⟨Staff, name⟩⟩
using the extents of ⟨⟨AcademicStaff, name⟩⟩ and ⟨⟨name, PCData⟩⟩. The following
three transformations insert Element ⟨⟨Staff.office⟩⟩ along with its incoming and
outgoing NestList constructs in a similar manner. Then the last two
transformations insert Element ⟨⟨College⟩⟩ along with its Attribute and its incoming
NestList. Since there is no information relevant to the extents of these
constructs in S, extend transformations are used, with Void as the lower-bound
query. Note however that the upper-bound query generates a synthetic extent
for both the ⟨⟨College⟩⟩ Element and its incoming NestList (for the latter, the
IQL function generateNestLists is used4); this is to make sure that if any
following transformations attach other constructs to ⟨⟨College⟩⟩, their extent is not
lost (assuming that these constructs are not themselves inserted with extend
transformations and the constants Void and Any as the lower-bound and
upper-bound queries). At the end of the growing phase, the transformations applied to
schema IS1 result in the intermediate schema shown in Figure 2.
4 Generally, function generateNestLists either accepts Element schemes ⟨⟨a⟩⟩ and ⟨⟨b⟩⟩,
with equal size of extents, and generates the extent of NestList construct ⟨⟨a, b⟩⟩;
or, it accepts Element schemes ⟨⟨a⟩⟩ and ⟨⟨b⟩⟩, where the extent of ⟨⟨a⟩⟩ is a single
instance, and generates the extent of NestList construct ⟨⟨a, b⟩⟩.</p>
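        <p>The behaviour of generateNestLists described in footnote 4 can be sketched as follows, with extents as plain Python lists and invented instance identifiers:</p>

```python
def generate_nest_lists(a, b):
    """Given the extents of Elements a and b, produce the extent of the
    NestList between them: either by pairwise association when the extents
    have equal size, or by nesting every instance of b under the single
    instance of a."""
    if len(a) == 1:
        return [(a[0], x) for x in b]
    assert len(a) == len(b), "extents must have equal size"
    return list(zip(a, b))

pairs = generate_nest_lists(["college1"], ["school1", "school2"])
# pairs == [("college1", "school1"), ("college1", "school2")]
```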
        <p>The shrinking phase operates similarly. The transformations removing
⟨⟨AcademicStaff, AcademicStaff.name⟩⟩, ⟨⟨AcademicStaff, PCData⟩⟩ and
⟨⟨AcademicStaff.name⟩⟩ specify the inverse of the element-to-attribute
transformation of the growing phase. To support attribute-to-element transformations,
the IQL function skolemiseEdge is used; it takes as input a NestList ⟨⟨ep, ec⟩⟩
and an Element ⟨⟨e⟩⟩, which have the same extent size, and for each pair of
instances e of ⟨⟨e⟩⟩ and {ep, ec} of ⟨⟨ep, ec⟩⟩ generates a tuple {ep, e, ec}.</p>
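        <p>The behaviour of skolemiseEdge can be sketched as follows, again with extents as plain lists and invented instance identifiers:</p>

```python
def skolemise_edge(nestlist, element):
    """For each (parent, child) instance of the NestList and the aligned
    instance e of the Element, produce the tuple (parent, e, child).
    Assumes the two extents have the same size and aligned ordering."""
    assert len(nestlist) == len(element), "extents must have equal size"
    return [(parent, e, child) for (parent, child), e in zip(nestlist, element)]

triples = skolemise_edge([("staff1", "pcd1")], ["name1"])
# triples == [("staff1", "name1", "pcd1")]
```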
        <p>
          The result of applying the transformations of Table 3 to schema IS1 is IS2,
illustrated in Figure 1. There now exists a transformation pathway S1 → IS1 →
IS2 → S2, which can be used to query S2 by obtaining data from the data
source corresponding to schema S1. For example, if this is the XML document
of Section 3.1, the IQL query
returns the following result:
[{'Dr. G. Grigoriadis', '123'}, {'Prof. A. Karakassis', '111'}, {'Dr. A. Papas', '321'}]
We could also use the pathway S1 → IS1 → IS2 → S2 to materialise S2 using
the data from the data source corresponding to S1; see [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] for details of this
process.
        </p>
        <p>The separation of the growing phase from the shrinking phase ensures the
completeness of the restructuring algorithm: the growing phase considers in turn
each node in the target schema T and generates if necessary a query defining this
node in terms of the source schema S; conversely, the shrinking phase considers in
turn each node of S and generates if necessary a query defining this node in terms
of T. Inserting new target schema constructs before removing any redundant
source schema constructs ensures that the constructs needed to define the extent
of any construct are always present in the current schema.</p>
        <p>Consider now the integration of a set of XMLDSS schemas S1, ..., Sn, all
conforming to some ontology R, into a global XMLDSS schema. The renaming
algorithm of Section 4.1 can first be used to produce intermediate XMLDSS schemas
IS1, ..., ISn. The initial global schema, GS1, is IS1. IS2 is then integrated with
GS1, producing GS2. The integration of ISi with GSi-1 to produce GSi
proceeds until i = n. This integration consists of first an expansion of GSi-1 with
the constructs from ISi that it is missing (again via a growing and a shrinking
phase) and then a restructuring of ISi with the resulting schema GSi, using the
algorithm of Section 4.2.</p>
        <p>In our framework, XML data sources are accessed using an XMLDSS
wrapper. This has SAX and DOM versions for XML files, supporting a subset of
XPath. There is also a wrapper over the eXist XML repository which translates
IQL queries representing (possibly nested) select-project-join-union queries into
(possibly nested) XQuery FLWR expressions.</p>
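        <p>The incremental integration loop described above can be sketched as a simple fold; `merge` stands in here for the expand-and-restructure step of Sections 4.1 and 4.2, and the schemas are reduced to sets of construct names for illustration:</p>

```python
def integrate(intermediate_schemas, merge):
    """Fold the conformed schemas IS1..ISn into a global schema:
    GS1 = IS1, and each GSi is produced by merging ISi into GSi-1."""
    gs = intermediate_schemas[0]
    for isi in intermediate_schemas[1:]:
        gs = merge(gs, isi)
    return gs

# With schemas as sets of construct names and merge as set union:
gs = integrate([{"r", "Staff"}, {"r", "College"}, {"r", "School"}],
               lambda g, s: g | s)
# gs == {"r", "Staff", "College", "School"}
```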
        <p>The XML wrapper can be used in three different settings: (i) When a source
XMLDSS schema S1 has been transformed into a target XMLDSS schema S2,
the resulting pathway S1 → S2 can be used to translate an IQL query expressed
on S2 to an IQL query on S1, and the XML wrapper of the XML data source
corresponding to S1 can be used to retrieve the necessary data for answering the
query. (ii) In the integration of multiple data sources with schemas S1, ..., Sn
under a virtual global schema GS, AutoMed's Global Query Processor can
process an IQL query expressed on GS in cooperation with the XML wrappers for
the data sources corresponding to the Si. (iii) In a materialised data
transformation or data integration setting, the XML wrapper(s) of the data source(s)
retrieve the data and the XML wrapper of the target schema materialises the
data into the target schema format.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Handling Multiple Ontologies</title>
      <p>We now discuss how our approach can also handle XMLDSS schemas that are
linked to different ontologies. These may be connected either directly via an
AutoMed transformation pathway, or via another ontology (e.g. an 'upper'
ontology) to which both ontologies are connected by an AutoMed pathway.</p>
      <p>Consider in particular two XMLDSS schemas S1 and S2 that are semantically
linked by two sets of correspondences C1 and C2 to two ontologies R1 and R2.
Suppose that there is an articulation between R1 and R2, in the form of an
AutoMed pathway between them. This may be a direct pathway R1 → R2.
Alternatively, there may be two pathways R1 → RGeneric and R2 → RGeneric
linking R1 and R2 to a more general ontology RGeneric, from which we can derive
a pathway R1 → RGeneric → R2 (due to the reversibility of pathways). In both
cases, the pathway R1 → R2 can be used to transform the correspondences C1
expressed w.r.t. R1 into a set of correspondences C1′ expressed on R2. This is
done using the query translation algorithm mentioned in Section 3, which performs
query unfolding using the delete, contract and rename steps in R1 → R2.</p>
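<p>The composition of correspondences with an inter-ontology pathway can be pictured with a small sketch, under the simplifying (and hypothetical) assumption that both the correspondences and the pathway reduce to term-level mappings; the actual algorithm performs query unfolding over the full delete, contract and rename steps:</p>

```python
# Illustrative sketch, hypothetical names: correspondences map XMLDSS
# schema constructs (here, paths) to ontology terms, and the pathway
# R1 -> R2 is reduced to a term-level mapping. Composing the two yields
# correspondences C1' from S1 directly into R2's vocabulary.

c1 = {"//staff/name": "r1:StaffName", "//staff": "r1:Staff"}       # S1 -> R1
r1_to_r2 = {"r1:Staff": "r2:Employee", "r1:StaffName": "r2:name"}  # R1 -> R2

def compose(correspondences, term_map):
    """Rewrite each correspondence's ontology term through the pathway.
    Terms with no image in R2 are dropped: they cannot be expressed there."""
    return {path: term_map[term]
            for path, term in correspondences.items()
            if term in term_map}

c1_prime = compose(c1, r1_to_r2)
# c1_prime now links S1 to R2: {"//staff/name": "r2:name", "//staff": "r2:Employee"}
```

With C1′ and C2 now expressed against the same ontology R2, the single-ontology machinery of Section 4 applies unchanged.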
      <p>The result is two XMLDSS schemas S1 and S2 that are semantically linked by
two sets of correspondences C1′ and C2 to the same ontology R2. Our approach
described for a single ontology in Section 4 can now be applied. There is a
proviso here: the new correspondences C1′ must conform syntactically to
the correspondences accepted as input by the schema conformance process of
Section 4.1, i.e. their syntax is as described in the first paragraph of Section 4.1.
Determining necessary conditions for this to hold, and extending our approach
to handle a more expressive set of correspondences, are areas of future work.</p>
    </sec>
    <sec id="sec-6">
      <title>Concluding Remarks</title>
      <p>This paper has discussed the automatic transformation and integration of XML
data sources, making use of known correspondences between them and one or
more ontologies expressed as RDFS schemas. The novelty of our approach lies
in the use of XML-specific graph restructuring techniques in combination with
correspondences from XML schemas to the same or different ontologies. The
approach promotes the reuse of correspondences to ontologies and of mappings
between ontologies. It is applicable to any XML data source, be it an XML
document or an XML database. The data source does not need to have an
accompanying DTD or XML Schema, although if one is available it is straightforward
to translate such a schema into our XMLDSS schema type.</p>
      <p>The schema conformance algorithm handles 1-1 mappings between XMLDSS
and RDFS constructs, enriched with containment relationships through the use
of subclass/superclass and subproperty/superproperty RDFS constraints. This
semantic reconciliation of the data source schemas is followed by their
structural reconciliation by the schema restructuring algorithm, which handles 1-1
mappings between XMLDSS schemas, utilising the constraints defined in the
correspondences. Extending our approach to be capable of utilising 1:n, n:1 and
more complex mappings is a matter of ongoing work. To this end, and at the
same time aiming to maintain the current separation of semantic and
structural schema reconciliation, we are currently extending the schema conformance
algorithm.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>B.</given-names>
            <surname>Amann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Beeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fundulaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Scholl</surname>
          </string-name>
          .
          <article-title>Ontology-based integration of XML web resources</article-title>
          .
          <source>In Proc. International Semantic Web Conference</source>
          , pages
          <fpage>117</fpage>
          –
          <lpage>131</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Buneman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tannen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>Comprehension syntax</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ):
          <fpage>87</fpage>
          –
          <lpage>96</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>V.</given-names>
            <surname>Christophides</surname>
          </string-name>
          et al.
          <article-title>The ICS-FORTH SWIM: A powerful Semantic Web integration middleware</article-title>
          .
          <source>In Proc. SWDB'03</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>Using a layered approach for interoperability on the Semantic Web</article-title>
          .
          <source>In Proc. WISE'03</source>
          , pages
          <fpage>221</fpage>
          –
          <lpage>231</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hsu</surname>
          </string-name>
          .
          <article-title>An ontology-based framework for XML semantic integration</article-title>
          .
          <source>In Proc. IDEAS'04</source>
          , pages
          <fpage>217</fpage>
          –
          <lpage>226</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Millstein</surname>
          </string-name>
          .
          <article-title>Navigational plans for data integration</article-title>
          .
          <source>In Proc. of the 16th National Conference on Artificial Intelligence</source>
          , pages
          <fpage>67</fpage>
          –
          <lpage>73</lpage>
          . AAAI,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R.</given-names>
            <surname>Goldman</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>DataGuides: enabling query formulation and optimization in semistructured databases</article-title>
          .
          <source>In Proc. VLDB'97</source>
          , pages
          <fpage>436</fpage>
          –
          <lpage>445</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>E.</given-names>
            <surname>Jasper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zamboulis</surname>
          </string-name>
          .
          <article-title>Processing IQL queries and migrating data in the AutoMed toolkit</article-title>
          .
          <source>AutoMed Tech. Rep. 20</source>
          ,
          <year>June 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>E.</given-names>
            <surname>Jasper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McBrien</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          .
          <article-title>View generation and optimisation in the AutoMed data integration framework</article-title>
          .
          <source>In Proc. 6th International Baltic Conference on Databases &amp; Information Systems</source>
          , Riga, Latvia,
          <year>June 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>L. V. S.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Sadri</surname>
          </string-name>
          .
          <article-title>XML interoperability</article-title>
          .
          <source>In Proc. WebDB'03</source>
          , pages
          <fpage>19</fpage>
          –
          <lpage>24</lpage>
          ,
          <year>June 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>P.</given-names>
            <surname>Lehti</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Fankhauser</surname>
          </string-name>
          .
          <article-title>XML data integration with OWL: Experiences and challenges</article-title>
          .
          <source>In Proc. Symposium on Applications and the Internet (SAINT 2004)</source>
          , Tokyo,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Data integration: A theoretical perspective</article-title>
          .
          <source>In Proc. PODS'02</source>
          , pages
          <fpage>233</fpage>
          –
          <lpage>246</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <article-title>Composing mappings among data sources</article-title>
          .
          <source>In Proc. VLDB'03</source>
          , pages
          <fpage>572</fpage>
          –
          <lpage>583</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>P.</given-names>
            <surname>McBrien</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          .
          <article-title>Defining peer-to-peer data integration using both-as-view rules</article-title>
          .
          <source>In Proc. Workshop on Databases, Information Systems and Peer-toPeer Computing (at VLDB'03)</source>
          , Berlin,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>P.</given-names>
            <surname>McBrien</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          .
          <article-title>Data integration by bi-directional schema transformation rules</article-title>
          .
          <source>In Proc. ICDE'03</source>
          ,
          <year>March 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hernandez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          .
          <article-title>Translating web data</article-title>
          .
          <source>In Proc. VLDB'02</source>
          , pages
          <fpage>598</fpage>
          –
          <lpage>609</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>A survey of approaches to automatic schema matching</article-title>
          .
          <source>VLDB Journal</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>334</fpage>
          –
          <lpage>350</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>C.</given-names>
            <surname>Reynaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sirot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vodislav</surname>
          </string-name>
          .
          <article-title>Semantic integration of XML heterogeneous data sources</article-title>
          .
          <source>In Proc. IDEAS</source>
          , pages
          <fpage>199</fpage>
          –
          <lpage>208</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>P.</given-names>
            <surname>Rodriguez-Gianolli</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mylopoulos</surname>
          </string-name>
          .
          <article-title>A semantic approach to XML-based data integration</article-title>
          .
          <source>In Proc. ER'01</source>
          , pages
          <fpage>117</fpage>
          –
          <lpage>132</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kuno</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Rudensteiner</surname>
          </string-name>
          .
          <article-title>Automating the transformation of XML documents</article-title>
          .
          <source>In Proc. WIDM'01</source>
          , pages
          <fpage>68</fpage>
          –
          <lpage>75</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21. W3C.
          <article-title>Guide to the W3C XML specification ("XMLspec") DTD, version 2.1</article-title>
          , June
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          W3C.
          <article-title>XML Schema specification</article-title>
          . http://www.w3.org/XML/Schema, May
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.W.</given-names>
            <surname>Ling</surname>
          </string-name>
          .
          <article-title>Resolving structural conflicts in the integration of XML schemas: A semantic approach</article-title>
          .
          <source>In Proc. ER'03</source>
          , pages
          <fpage>520</fpage>
          –
          <lpage>533</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>L.</given-names>
            <surname>Zamboulis</surname>
          </string-name>
          .
          <article-title>XML data integration by graph restructuring</article-title>
          .
          <source>In Proc. BNCOD'04, LNCS 3112</source>
          , pages
          <fpage>57</fpage>
          –
          <lpage>71</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>L.</given-names>
            <surname>Zamboulis</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          .
          <article-title>Using AutoMed for XML data transformation and integration</article-title>
          .
          <source>In Proc. DIWeb'04</source>
          (at CAiSE'04), Riga, Latvia,
          <year>June 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>L.</given-names>
            <surname>Zamboulis</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          .
          <article-title>Information sharing for the Semantic Web – a schema transformation approach</article-title>
          .
          <source>AutoMed Tech. Rep. 31</source>
          ,
          <year>February 2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>