SPARQL Query Mediation over RDF Data Sources with Disparate Contexts

Xiaoqing Zheng
School of Computer Science, Fudan University
Shanghai, 201203, China
zhengxq@fudan.edu.cn

Stuart E. Madnick
Sloan School of Management, Massachusetts Institute of Technology
Cambridge, MA 02142, USA
smadnick@mit.edu

Xitong Li
Sloan School of Management, Massachusetts Institute of Technology
Cambridge, MA 02142, USA
xitongli@mit.edu


ABSTRACT
Many Semantic Web applications require the integration of data from distributed and autonomous RDF data sources. However, the values in RDF triples are frequently recorded simply as literals, and additional contextual information such as unit and format is often omitted, relying on a consistent understanding of the context. In the wider context of the Web, it is generally not safe to make this assumption. The Context Interchange strategy provides a systematic approach for mediated data access in which semantic conflicts among heterogeneous data sources are automatically detected and reconciled by a context mediator. In this paper, we show that SPARQL queries that involve multiple RDF graphs originating from different contexts can be mediated in this way using the Context Interchange (COIN) Framework.

Categories and Subject Descriptors
H.2.5 [Database Management]: Heterogeneous Databases – data translation; H.2.4 [Database Management]: Systems – query processing; H.3.5 [Information Storage and Retrieval]: Online Information Services – data sharing.

General Terms
Algorithms, Design

Keywords
Semantic heterogeneity, semantic interoperability, query mediator, data integration

1. INTRODUCTION
An increasing amount of data is published in RDF format due to the activity of the linked data community. The Web of linked data that is emerging by integrating data from different sources via URIs can be considered as a single, globally distributed dataspace [1]. With SPARQL [2], a W3C Recommendation for a query language for RDF, data from different sources can be aggregated and applications can pose complex queries over the RDF dataset, which was not possible before. The presence on the Web of such a huge number of distributed and autonomous RDF sources poses a great challenge when it comes to achieving semantic interoperability among heterogeneous sources and receivers.

RDF is a data model for expressing the information that needs to be processed by applications, so that it can be exchanged without loss of meaning. The data do not need to be stored in RDF but can be created on the fly from relational databases [3] or other non-RDF sources. We expect that more and more content providers will make their data available via SPARQL endpoints. However, much existing data on the Web takes the form of simple values for properties such as weights, costs, etc., and contextually dependent information such as unit and format is often omitted. Due to the openness of the Web, it is generally not safe to assume that anyone accessing the value of a property will understand the units being used. For example, a U.K. site might give a height value in feet, but someone accessing that data outside the U.K. might assume that heights are given in meters. Another example is that a gallon in the U.K. (the so-called Imperial gallon) is approximately 4546 ml, while in the U.S. the "same" gallon (the so-called Winchester gallon) is 3785 ml, roughly three quarters of a liter less. The principle that such simple values are often insufficient to adequately describe the underlying quantities is an important one. If data originating from different contexts are brought together and we pose queries on the whole dataset, many semantic conflicts can happen (see the motivational example in Section 2 for more detail). Proper interpretation of RDF data would depend on information that is not explicitly provided in the RDF dataset, and hence such information may not be available to other applications that need to interpret this data.

With the above observations in mind, the goal of this paper is to illustrate the novel features of the Context Interchange mediation strategy, and to describe how the semantic conflicts in RDF data sources can be automatically detected and reconciled by a query rewriting technique. Specifically, the paper makes the following contributions:

•     We describe how to use the Context Interchange strategy to achieve semantic interoperability among heterogeneous RDF sources and receivers by rewriting the user SPARQL query to a mediated query. The mediated query can return the answer collated and presented in the receiver context.
•     We propose a formal and logical COIN framework to model contexts, i.e., the factual statements present in a data source are true relative to the context associated with the source but not necessarily so in a different context. With the framework, the users are not burdened with the diverse data semantics in sources, all of which are declared in the context representation components and can be automatically taken into consideration by the mediator.
•     A SPARQL query rewriting algorithm is described and tested with real data from the CIA Factbook¹ and DBpedia². The
Copyright is held by the author/owner(s).
LDOW2012, April 16, 2012, Lyon, France.

¹ http://www4.wiwiss.fu-berlin.de/factbook/
CONTEXT: RECEIVER

• Currency is USD with a scale-factor of 1;
• Datetime is expressed in US style;
• Locations are expressed as city names.

NAÏVE SPARQL QUERY

    SELECT ?airline1 ?airline2 ?total

    FROM NAMED <http://usairline.com/flights>
    FROM NAMED <http://japanairline.com/flights>

    WHERE {
    GRAPH ?graph1
      { ?airline1 fts:depDateTime ?depDateTime1 ;
                  fts:arrDateTime ?arrDateTime1 ;
                  fts:depCity "Boston" ;
                  fts:arrCity "Tokyo" ;
                  fts:price ?price1 . }
    GRAPH ?graph2
      { ?airline2 fts:depDateTime ?depDateTime2 ;
                  fts:arrDateTime ?arrDateTime2 ;
                  fts:depCity "Tokyo" ;
                  fts:arrCity "Shanghai" ;
                  fts:price ?price2 . }

    FILTER ( ?arrDateTime1 < ?depDateTime2 ) .
    FILTER ( ?depDateTime1 >= "9:30 AM 02/09/2011" ) .
    FILTER ( ?arrDateTime2 <= "11:30 PM 02/10/2011" ) .

    LET ( ?total := ?price1 + ?price2 ) }

    ORDER BY ASC(?total)
    LIMIT 1

CONTEXT: SOURCE 1

• Currency is USD with a scale-factor of 1;
• Datetime is expressed in US style;
• Locations are expressed as IATA airport codes.

    # Named graph: http://usairline.com/flights
    @prefix : <http://usairline.com/flights#> .

    :us339 fts:depDateTime "12:30 PM 02/09/2011" .
    :us339 fts:arrDateTime "7:25 AM 02/10/2011" .
    :us339 fts:depCity "BOS" .
    :us339 fts:arrCity "TYO" .
    :us339 fts:price 950 .

    :us512 fts:depDateTime "9:45 AM 02/10/2011" .
    :us512 fts:arrDateTime "10:30 PM 02/10/2011" .
    :us512 fts:depCity "TYO" .
    :us512 fts:arrCity "SHA" .
    :us512 fts:price 380 .

CONTEXT: SOURCE 2

• Currency is JPY with a scale-factor of 1000;
• Datetime is expressed in xsd:dateTime type;
• Locations are expressed as city names.

    # Named graph: http://japanairline.com/flights
    @prefix : <http://japanairline.com/flights#> .

    :jp241 fts:depDateTime "2011-02-10T09:25:00Z"^^xsd:dateTime .
    :jp241 fts:arrDateTime "2011-02-10T22:05:00Z"^^xsd:dateTime .
    :jp241 fts:depCity "Tokyo" .
    :jp241 fts:arrCity "Shanghai" .
    :jp241 fts:price 25 .

Figure 1. Example scenario
      results show that the approach is promising and effective. The source code and some test cases can be downloaded from the web site http://homepage.fudan.edu.cn/zhengxq/coin/.

² http://dbpedia.org/

The rest of the paper is organized as follows. Following this introduction, a motivational example is presented to highlight the Context Interchange strategy toward semantic interoperability. The COIN framework is described by introducing the formalism in Section 3. SPARQL query mediation via a query rewriting technique is explained in Section 4. A preliminary implementation is introduced in Section 5. Section 6 presents a brief overview of related work. The conclusions are summarized in Section 7.

2. MOTIVATIONAL EXAMPLE
Consider the scenario of finding cheap airfare on the Web shown in Figure 1, deliberately kept simple for didactic reasons. In this paper, examples assume the namespace prefix bindings given in Appendix A unless otherwise stated. Data on scheduled-service flights are available in two autonomously administered data sources. We assume that the flights are described by terms from shared vocabularies, in order to highlight the data-level conflicts. Suppose a user looks for a one-way ticket from Boston to Shanghai with one stop in Tokyo. She wants to leave Boston after 9:30 a.m., Feb 9th, 2011 and arrive in Shanghai before 11:30 p.m., Feb 10th, 2011. This query can be formulated on the schema of the two sources as the naïve query shown in Figure 1. The query will return an empty answer if it is executed over the given dataset without any mediation.

The query, however, does not take into account the fact that sources and receivers may operate with different contexts, i.e., they may have different assumptions on how the property values should be interpreted. Specifically, the user operates with city names and US style datetimes, while source 1 records locations using IATA airport codes and source 2 assumes the xsd:dateTime format. Certain constraints typed by the user must therefore be transformed to comply with the assumptions in the contexts of the data sources (for example, from "Boston" to "BOS"; from "9:30 AM 02/09/2011" to "2011-02-09T09:30:00Z"). Besides, she works with US dollars with a scale-factor of 1, whereas source 2 reports all ticket prices in Japanese Yen with a scale-factor of 1000, which shows that the data might vary in two or more aspects (in this case, currency and scale), so more than one conversion of the data is needed. Even if these specific differences were carefully dealt with by writing a new query with appropriate datetime formats, currencies and city codes for each individual source (which might be a significant challenge for the user, especially if unfamiliar with the details of each of the multiple sources involved), the result still
would be misleading. For source 1, the graph pattern of the naïve query needs to be rewritten to the following one:
   GRAPH ?graph1
      { ?airline1 fts:depDateTime ?depDateTime1;
                  fts:arrDateTime ?arrDateTime1;
                  fts:depCity "BOS";
                  fts:arrCity "TYO";
                  fts:price ?price1 . }
   GRAPH ?graph2
      { ?airline2 fts:depDateTime ?depDateTime2;
                  fts:arrDateTime ?arrDateTime2;
                  fts:depCity "TYO";
                  fts:arrCity "SHA";
                  fts:price ?price2 . }
But the result below is still incorrect because it is not the cheapest airfare:
         airline1             airline2             total
           us339               us512              1330.00
The above result is a solution corresponding to the way in which the query's pattern matches the RDF data, all from source 1. For source 2, the result returned is empty because it holds only one flight, from "Tokyo" to "Shanghai". SPARQL, however, can be used to express queries across diverse data sources, and a SPARQL query can match different parts of the query pattern against different graphs. In this example, it does not work to rewrite the naïve query into two separate queries, one executed over source 1 and one over source 2, and to combine the results. A solution is required that makes these context conversions dynamically, depending on which data sources are involved. Further, the answers should be transformed so that they conform to the context of the user.

In the Context Interchange system, the semantics of the data present in a source, or expected by a receiver, can be explicitly represented in the form of a context theory and a set of context statements with reference to a context ontology (see Section 3). Queries submitted to the system are rewritten into mediated queries by a Context Mediator, in which the semantic conflicts between the sources and the receiver are automatically recognized and reconciled.

The naïve query can be transformed to the mediated query by the rewriting algorithm described in Section 4. This mediated query considers all potential conflicts between the sources and the receiver when matching, restricting and testing values. Moreover, the answers returned can be transformed so that they conform to the context of the receiver. The mediated query, when executed, returns the "correct" answer below, which helps the user to make the right choice.
         airline1            airline2              total
           us339               jp241              1255.59
                             Exchange rate: 100 USD = 8181 JPY
In the COIN system, query mediation and query answering are separated, as shown in the above example. Since the mediated queries encode all the necessary data transformations, they can be executed by existing query engines such as ARQ³, AllegroGraph⁴, and OpenRDF Sesame⁵, taking advantage of their sophisticated query optimizers. We have shown only one user in this scenario. However, other users with different contexts could also issue their queries and get the results in their own contexts by simply declaring or choosing their contexts. Similarly, other RDF data sources can be added to the system with the declaration of their contexts, and queries over multiple sources with different contexts could be mediated in a similar way.

3. CONTEXT REPRESENTATION
The purpose of knowledge representation in COIN is to provide a formal way of making explicit disparate assumptions about data. Knowledge about the source and user contexts is declared under a formal, logical COIN framework consisting of the following four core components (see Figure 2):

•   O = <C, P>, the Context Ontology, is a description of generic concepts C that would be interpreted differently across data sources and receivers (e.g. "MonetaryValue"), and the conflict dimensions (or modifiers) that are defined as the properties P for these concepts (e.g. "hasCurrency").
•   A, the Context Pool, is a set of instantiations of the context ontology. Value assignments are made for each modifier to explicate the meaning of a concept in a data source or receiver (e.g. ctx:JP_yen coin:scale "1000"^^xsd:integer).
•   M, the Context Mappings, defines a set of context statements that associate the sources or receivers with their contexts. A context statement can identify the correspondences between a property used in the sources and a context instance defined in A (e.g. fts:price coin:monetaryValue ctx:JP_yen).
•   F, the Conversion Function Bindings, specifies which functions should be used to make data transforms. Conversion functions are defined to achieve conversions between different contexts. For each modifier at least one conversion function will be defined to transform a value in one (source) context into a corresponding value in another (target) context (e.g. fn:cvt_currency("JPY", "USD", 25000), which converts 25000 Japanese Yen to the equivalent US dollars).

Figure 2. An illustration of the COIN framework

In the remaining subsections, we describe each of the components in turn with examples. RDF and OWL have been used to describe context knowledge in the framework. The adoption of RDF and OWL provides us with greater flexibility in representing, reusing and exchanging data semantics captured in different contexts.

³ http://jena.sourceforge.net/ARQ/
⁴ http://www.franz.com/
⁵ http://www.openrdf.org/
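As a rough illustration of how the four components fit together, the following Python sketch encodes a tiny context pool (A), context mappings (M) and conversion function bindings (F) as plain dictionaries, and converts the prices of flights us339 and jp241 from Figure 1 into the receiver context. It is a sketch under stated assumptions, not the COIN implementation: the dictionary layout, the flattening of M to (graph, property) keys, the "JPY" currency value for ctx:JP_yen and the helper names (cvt_scale, cvt_currency, to_receiver) are all illustrative. The scale formula is the one given in Section 3.4, and the fixed rate is the one quoted with the mediated result in Section 2 (100 USD = 8181 JPY).

```python
# Illustrative sketch only: plain-Python stand-ins for the COIN components.
# All names and the dictionary layout are assumptions made for this example.

# A: context pool, modifier values per context instance.
CONTEXT_POOL = {
    "ctx:US_dollar": {"hasScale": 1, "hasCurrency": "USD"},
    "ctx:JP_yen": {"hasScale": 1000, "hasCurrency": "JPY"},
}

# M: context mappings, simplified to (source graph, property) -> context instance.
CONTEXT_MAPPINGS = {
    ("http://usairline.com/flights", "fts:price"): "ctx:US_dollar",
    ("http://japanairline.com/flights", "fts:price"): "ctx:JP_yen",
}

# Exchange rate quoted with the mediated result: 100 USD = 8181 JPY.
JPY_PER_USD = 81.81


def cvt_scale(mvs, mvt, vs):
    """Scale conversion: vt = vs * mvs / mvt."""
    return vs * mvs / mvt


def cvt_currency(mvs, mvt, vs):
    """Currency conversion between USD and JPY using the fixed rate above."""
    if mvs == mvt:
        return vs
    if (mvs, mvt) == ("JPY", "USD"):
        return vs / JPY_PER_USD
    if (mvs, mvt) == ("USD", "JPY"):
        return vs * JPY_PER_USD
    raise ValueError(f"no rate for {mvs} -> {mvt}")


# F: conversion function bindings, modifier -> conversion function.
CONVERSIONS = {"hasScale": cvt_scale, "hasCurrency": cvt_currency}


def to_receiver(value, source_ctx, receiver_ctx):
    """Apply one conversion for each modifier whose values differ."""
    src, tgt = CONTEXT_POOL[source_ctx], CONTEXT_POOL[receiver_ctx]
    for modifier, convert in CONVERSIONS.items():
        if src[modifier] != tgt[modifier]:
            value = convert(src[modifier], tgt[modifier], value)
    return value


us_ctx = CONTEXT_MAPPINGS[("http://usairline.com/flights", "fts:price")]
jp_ctx = CONTEXT_MAPPINGS[("http://japanairline.com/flights", "fts:price")]

us339 = to_receiver(950, us_ctx, "ctx:US_dollar")  # already in receiver context
jp241 = to_receiver(25, jp_ctx, "ctx:US_dollar")   # scale first, then currency
total = round(us339 + jp241, 2)
print(total)  # 1255.59
```

Two conversions are applied to the jp241 price (scale, then currency) because its context differs from the receiver context on two modifiers, while the us339 price passes through unchanged; the sum reproduces the total of 1255.59 reported for the mediated query.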
3.1 Context Ontology
A context ontology is a collection of generic concepts (or classes in the OWL language), which provides a common type system for describing data semantics exchanged between disparate systems. A context ontology corresponding to the motivational example in Section 2 can be seen in Figure 2. Concepts, denoted by C, are depicted by ellipses, and "Context" is the special concept from which all other concepts inherit. Concepts may have properties, called modifiers and denoted by P, which serve as annotations that make explicit the semantics of data in different contexts.

Contexts are the actual specializations of the concepts subject to multiple meanings across sources and receivers. For sources, the contexts are defined as the specializations used for the underlying data values. For receivers, on the other hand, the contexts are defined as the specializations assumed in viewing the data values. These specializations could be the representation of dates or the number scale being used.

Modifiers, as properties, are inherited by sub-concepts from their ancestors. A concept can have multiple modifiers, each of which indicates an orthogonal dimension of variation. For example, the concept "MonetaryValue" has four modifiers, two of which are "hasCurrency", introduced by "MonetaryValue", and "hasScale", inherited from "Number", which indicates that its instances could be interpreted according to two dimensions: scale factor and money currency. All the concepts and properties of the context ontology are defined in the namespace bound to the prefix "coin".

3.2 Context Pool
The context pool contains a set of instantiations of the concepts in the context ontology. As we mentioned above, modifiers are special properties that affect the interpretation of data values. The context ontology defines what types of modifiers apply to which concepts. A context instance or individual is defined by a set of RDF statements that determine the values of modifiers. Such statements are defined over the domain A × P × (I ∪ L), where I is a set of IRIs, and L is a set of RDF literals. If {s, p, o} is a statement about a context instance s, and s belongs to a concept c ∈ C, then the property p ∈ P should be applicable to the class c, and the value o must be in the range of the property p.

For example, a context instance "ctx:US_dollar" can be described with the following statements:
  ctx:US_dollar coin:hasDataType xsd:long .
  ctx:US_dollar coin:hasScale "1"^^xsd:integer .
  ctx:US_dollar coin:hasCurrency "USD" .
These modifier assignments explicitly indicate that any data value associated with "ctx:US_dollar" is in US dollars with a scale-factor of 1 and is represented by the typed literal "xsd:long". All the context instances are defined in the namespace bound to the prefix "ctx". The objects (xsd:long, "1", and "USD") in the above RDF statements are called modifier values.

We can declare new context instances or reuse those already defined in the pool. For each concept of the context ontology that is interpreted differently by sources and receivers, modifiers are introduced to explicate those differences. The advantage of this approach is that it allows conflicts between sources and receivers to be introduced gradually as they are discovered. Many conflicts emerge later in the integration process as more sources and users are incorporated in the system. If all sources and receivers hold a common meaning for a given concept, no modifier is required at that time. When that situation changes at a later time, modifiers can be introduced to handle the variations.

3.3 Context Mappings
Context mappings provide the articulation of the data semantics for the sources and receivers, which are often implicit in the given contexts. For each concept in the context ontology, a predicate with the same name as the concept is defined (for example, the predicate coin:monetaryValue needs to be defined for the concept "MonetaryValue"). These predicates are used to associate the properties used in the sources with the corresponding context instances in order to make explicit the data semantics of the values of the properties. The statements of the context mappings are defined over the domain (R ∪ D) × T × A, where R is a set of IRIs that represent the properties appearing in the data sources, D is a set of IRIs used to identify specific applications, and T is a predefined vocabulary whose terms have the same names as the concepts defined in the context ontology, but begin with a lower case letter. T is used to indicate the types of the context instances assigned in the context mapping statements.

For example, the fact that the values of the "fts:price" property are reported in US dollars using a scale-factor of 1 is made explicit by the following statement:
   fts:price coin:monetaryValue ctx:US_dollar .
The "ctx:US_dollar" is a context instance of the "MonetaryValue" concept, and was defined in the example of Section 3.2. The data may also be structured in the form subject-predicate-[ ]-predicate-object, using an intermediate blank node, as in the following definition:
  item hasPrice _:blanknode
  _:blanknode price "100"
  _:blanknode currency "USD"
In that case, the context mapping will be defined by associating the "price" property with the appropriate context instance to make explicit the data semantics of the value "100" of that property. For the sources, the context mappings are defined by attaching the context instances to the properties, which could be considered as extensions to the RDF Schema vocabulary. The extensions support the description of context information indicating how the values of a given property are interpreted.

We cannot assume that the users have intimate knowledge of the data sources being accessed, since this assumption is generally untenable when the number of systems involved is large and when changes are frequent. The users should remain isolated from the underlying semantic heterogeneity, i.e., they are not required to be sufficiently familiar with the properties in different schemas (so as to construct a query). There are some graphical tools (Gruff⁶ for example) that assist users in creating SPARQL queries. COIN enables the users to identify their contexts independently of the data sources. A receiver's contexts can be declared by assigning the instances in A to D with the aid of the vocabulary in T. For example, a user could declare that she uses US dollars by the following statement:
   :flight coin:monetaryValue ctx:US_dollar .

⁶ http://www.franz.com/agraph/gruff/
The ":flight" is used to indicate an application domain, so a          Conversion function bindings, denoted by F, are defined by a set
meaningful name is recommended. A user is allowed to make              of RDF triples that determine which functions can be used to
different context definitions for different applications.              make necessary data transforms for the modifiers of the concepts.
All context statements for a data source or a receiver should be       Such statements are defined over the domain C × P × N, where N
made in a separate namespace. The correspondences between the          is a set of IRIs used to identify and retrieve conversion functions.
sources or receivers and their context definitions then need to be
further identified. We assume that all context statements about the    4. SPARQL QUERY MEDIATION
source 1 were described at <http://coin.mit.edu/sources/usairline>.    The goal of the COIN framework is to provide a formal, logical
The following triple asserts that the context of the source 1 is       basis that allows for the automatic mediation of queries such as
defined in the RDF graph <http://coin.mit.edu/sources/usairline>,      those described in Section 2. The semantic conflicts would happen
where the source 1 is identified by <http://usairline.com/flights>:    when the RDF literals typed by a receiver are attempted to match
  <http://usairline.com/flights> coin:hasContext                       against source graphs or the RDF terms from different graphs are
  <http://coin.mit.edu/sources/usairline> .                            compared. In this section, we describe the process of rewriting a
Such RDF statements, called context bindings, are defined over         naïve SPARQL query (i.e. query ignoring semantic differences
                                                                       between sources and/or receivers) to a mediated query with all the
the domain I × coin:hasContext × I. The context bindings,
                                                                       semantic conflicts reconciled and the query results transformed
denoted by B, will be added in the mediated query to retrieve the
                                                                       appropriately according to user expectation.
context mappings of the data sources and/or receivers (more about
these in the section 4).
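To make the chain from context bindings (B) through context mappings (M) to the context pool (A) concrete, the Python sketch below follows it for the fts:price property and lists the modifiers on which a source and the receiver disagree, which is the information a mediator needs before it can choose conversion functions. This is a minimal sketch under stated assumptions: triples are held as Python tuples rather than in an RDF store, the function names are illustrative, and the IRIs for the context graph of source 2 and for the receiver are hypothetical (only the usairline context graph is named in the text).

```python
# Illustrative sketch; triples are (subject, predicate, object) tuples.

# B: context bindings, source or receiver IRI -> IRI of its context graph.
BINDINGS = [
    ("http://usairline.com/flights", "coin:hasContext",
     "http://coin.mit.edu/sources/usairline"),
    # Hypothetical IRIs below: the text names only the usairline graph.
    ("http://japanairline.com/flights", "coin:hasContext",
     "http://coin.mit.edu/sources/japanairline"),
    ("http://example.org/receiver", "coin:hasContext",
     "http://example.org/contexts/receiver"),
]

# M: context mappings, kept per context graph.
MAPPINGS = {
    "http://coin.mit.edu/sources/usairline": [
        ("fts:price", "coin:monetaryValue", "ctx:US_dollar")],
    "http://coin.mit.edu/sources/japanairline": [
        ("fts:price", "coin:monetaryValue", "ctx:JP_yen")],
    "http://example.org/contexts/receiver": [
        (":flight", "coin:monetaryValue", "ctx:US_dollar")],
}

# A: context pool (the JPY currency value is assumed from Figure 1).
POOL = [
    ("ctx:US_dollar", "coin:hasScale", 1),
    ("ctx:US_dollar", "coin:hasCurrency", "USD"),
    ("ctx:JP_yen", "coin:hasScale", 1000),
    ("ctx:JP_yen", "coin:hasCurrency", "JPY"),
]


def context_graph(iri):
    """Follow coin:hasContext from a source or receiver IRI."""
    for s, p, o in BINDINGS:
        if s == iri and p == "coin:hasContext":
            return o
    raise KeyError(iri)


def context_instance(iri, subject, concept_predicate):
    """Find the context instance mapped to a property or application."""
    for s, p, o in MAPPINGS[context_graph(iri)]:
        if s == subject and p == concept_predicate:
            return o
    raise KeyError((iri, subject, concept_predicate))


def modifiers(instance):
    """Collect the modifier values declared for a context instance."""
    return {p: o for s, p, o in POOL if s == instance}


def conflicts(source_iri, prop, receiver_iri, app, concept_predicate):
    """Return {modifier: (source value, receiver value)} where they differ."""
    src = modifiers(context_instance(source_iri, prop, concept_predicate))
    rcv = modifiers(context_instance(receiver_iri, app, concept_predicate))
    return {m: (src[m], rcv[m]) for m in src if m in rcv and src[m] != rcv[m]}


jp = conflicts("http://japanairline.com/flights", "fts:price",
               "http://example.org/receiver", ":flight", "coin:monetaryValue")
us = conflicts("http://usairline.com/flights", "fts:price",
               "http://example.org/receiver", ":flight", "coin:monetaryValue")
print(jp)  # scale and currency differ for source 2
print(us)  # no conflicts for source 1
```

For source 2 the lookup reports two differing modifiers (scale and currency), so two conversions are needed; for source 1 it reports none for the price, although that source would still differ from the receiver on other concepts such as locations.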
                                                                       4.1 Well-Formed Query
                                                                       Given a naïve SPARQL query, context mediation is bootstrapped
3.4 Conversion Function Bindings                                       by transforming this user query into a logically equivalent query.
The preceding statements are not yet sufficient for resolving          The mediation process starts by converting the naïve query into
conflicts of data present in disparate contexts, since we have to      its well-formed query that must satisfy the following requirements:
define how values in one (source) context are to be reported in
                                                                       • All the data sources are introduced as named graphs by using
another (target) context with different assumptions (i.e., modifier
                                                                            the FROM NAMED clause.
values). This is accomplished via the introduction of conversion
                                                                       • All the graph patterns (i.e., basic or group graph patterns) are
functions that are defined for each modifier between two different
                                                                            defined inside the scope of GRAPH keywords.
modifier values. A general representation of conversion functions
is shown as follows:                                                   • No variable is used in two different GRAPH
                                                                            clauses.
   fn:cvt_modifier(mvs, mvt, vs)
                                                                       The translation to meet the first two requirements is obviously a
where mvs and mvt are two distinct values of the modifier in the       trivial exercise. Note that the GRAPH keyword could be followed
source and target contexts respectively. The function returns the      by a variable or an IRI. If a variable is provided, it is not possible
equivalent value vt that complies with assumptions in the target       to know in advance which named graph will be matched since the
context for the source value vs. For example, a scale conversion       variable will range over the IRIs of all the named graphs in the
fn:cvt_scale could be defined by multiplying a given value with        query's RDF dataset. Query variables in SPARQL queries have
the appropriate ratio as shown below:                                  global scope and use of a given variable name anywhere in a
   fn:cvt_scale: vt = vs * mvs / mvt                                   query identifies the same variable. However, a variable shared by
Note that the conversion function will return vs unchanged if mvs or   might be bound to the logically-equivalent
mvt is undefined.                                                      term, but with different representations in different contexts. So
                                                                       the last requirement is necessary and can always be guaranteed by
In some cases, ancillary data sources may be used for defining         renaming variables and adding the corresponding FILTERs to the
appropriate functions. For instance, currency conversions need to      WHERE clause. For example, if a ?var variable is used across two
be supported by external data sources that provide the exchange        different GRAPH clauses, one variable would be renamed ?nvar,
rate between two different currencies. Atomic conversions can be       and a FILTER(sameTerm(?var, ?nvar)) constraint will be created.
composed to construct composite conversions. As we mentioned,
each modifier captures one aspect of interpretation that may vary
across contexts. After invoking an atomic conversion, the source       4.2 Semantic Conflicts Detection
value is transformed to a (intermediate) context that is the same as   The semantic conflicts are detected by a context mediator through
the target context in terms of this aspect; by invoking the atomic     the comparison of context statements corresponding to the sources
conversions consecutively, the value is converted through a series     and receivers engaged in query patterns. The algorithm 1 shows
of intermediate contexts, each having one aspect being different       how to detect the potential conflicts among heterogeneous sources
from the target context; it reaches the target context in the end      and receivers for a given pair of properties, where a three-place
when no aspect is different from the target context. Thus, in the      notation Tripleg(s, p, o) is used to represent a typical RDF triple
case of "MonetaryValue" that has two modifiers, currency and           <s, p, o> and if the subscript g is given it indicates that the triple
scale, we may have:                                                    is defined under the named graph g. The subscript g could be an
                                                                       IRI or a variable. In this paper, Tripleg(s, p, o) is also used to
   fn:cvt_scale(1000, 1, fn:cvt_currency("JPY", "USD", 25))
                                                                       denote a triple pattern.
Hence, if the function for currency returns the value 0.30559, it
will be rewritten to 305.59 by the scale conversion function. All      Note that the properties P1 and P2 are not required to be
the COIN components for the motivational example are given in          associated with the same concept in C, which makes it possible to
Appendix B except the context ontology that has already been shown     express the constraints like FILTER (?price < ?weight). If the
in Figure 2.                                                           properties of ?price and ?weight have a common modifier,
Algorithm 1. Detection of semantic conflicts between sources
and/or receivers for a given property
Input:  K : COIN components, K = < O, A, M, F > including B
        S1 : a data source IRI (could be a variable)
        S2 : another data source IRI (could be a variable) or an
             IRI used to identify a receiver
        P1 : a property IRI used in S1
        P2 : a property IRI used in S2 or an IRI used to identify an
             application domain if S2 is a receiver identifier
Output: SC : a set of semantic conflicts detected

1:  SC ← ∅
2:  def1 = { def | Triple(S1, coin:hasContext, def) ∈ B }
3:  def2 = { def | Triple(S2, coin:hasContext, def) ∈ B }
4:  ctx1 = { ctx | Tripledef1(P1, concept, ctx) ∈ M }
5:  C1 = { concept | Tripledef1(P1, concept, ctx1) ∈ M ∧ concept ∈ T }
6:  if S2 is a data source identifier then
7:       ctx2 = { ctx | Tripledef2(P2, concept, ctx) ∈ M }
8:       C2 = { concept | Tripledef2(P2, concept, ctx2) ∈ M ∧
                              concept ∈ T }
9:  else ctx2 ← { ctx | Tripledef2(P2, C1, ctx) ∈ M }
10:      C2 = C1
11: for each modifier ∈ { p | Triple(ctx1, p, mv) ∈ A ∧ p ∈ P } do
12:      mv1 ← { mv | Triple(ctx1, modifier, mv) ∈ A }
13:      mv2 ← { mv | Triple(ctx2, modifier, mv) ∈ A }
14:      if mv1 ≠ mv2 and mv1 ≠ NULL and mv2 ≠ NULL then
15:          SC ← SC ∪ { S1, S2, P1, P2, C1, C2, modifier, mv1, mv2 }
16: return SC

   SELECT ?concept1 ?concept2 ?modifier ?mv1 ?mv2
   WHERE {
     S1 coin:hasContext ?def1 .
     S2 coin:hasContext ?def2 .
     GRAPH ?def1 { P1 ?concept1 ?ctx1 }
     GRAPH ?def2 { P2 ?concept2 ?ctx2 }
     ?ctx1 ?modifier ?mv1 .
     ?ctx2 ?modifier ?mv2 .
     FILTER(!sameTerm(?mv1, ?mv2)) }

   Figure 3. The equivalent SPARQL query for Algorithm 1

shown in Table 1. Note that we just need to detect the conflicts
that are required to make necessary conversions for queries.

Table 1. The semantic conflicts detected in the motivational
example for the property fts:price

   modifier             mv1      mv2
   coin:hasScale        1        1000
   coin:hasCurrency     USD      JPY

For the same reason, in some cases it is not possible to know in
advance how to interpret the value of a property because we
cannot know in advance the sources of the value. However, in
most cases, we are able to know with which concept the property
was associated by looking up the context mappings. With the
concept name, we can get all the modifiers applied to the concept,
access the values of the modifiers, and then use them to construct
the appropriate conversion functions. An OPTIONAL keyword
                                                                          will be used when we cannot know in a static way whether certain
"hasScale" for example, and different modifier values, this
                                                                          modifiers or modifier values are defined in the context statements.
conflict will also be detected and be used to construct the               Notice that if a concept has no modifier, there is no conversion
conversion function (see Section 4.3). Recall that the users are not      function defined for the concept, so the values are assumed not to
required to be sufficiently familiar with the underlying schemas of       vary across any context.
data sources, and their contexts are declared by assigning the
context instances to the concepts C via T. If S2 identifies a receiver,   Algorithm 2. Conversion function construction
the context instance ctx2 could be retrieved directly by looking up       Input:  K : COIN components, K = < O, A, M, F> including B
M for the concept C1 as shown in line 9 of the algorithm 1.                       SC : a set of semantic conflicts detected by the algorithm 1
                                                                                  MV : a data value or a variable in source context
The algorithm 1 can only be used to statically detect the semantic                D : a boolean variable to indicate conversion direction
conflicts between sources and/or the receiver. Sometimes it is            Output: CF : a composite conversion (default value is NULL)
impossible to know in advance which sources the query pattern
                                                                          1:  for each { S1, S2, P1, P2, C1, C2, modifier, mv1, mv2 } SC do
will be matched until the query is executed, not to mention the
                                                                          2:      if    CF = = NULL then
contexts of the sources. For example, the following group graph
                                                                          3:            function { f | Triple(C, modifier, f ) F C C
pattern in the query of the motivational example can be matched                                  ( C = C1 C is a closest super-concept of C1 ) }
against both source 1 and 2 (assuming the city names, "Shanghai"          4:            if     D = = TRUE
and "Tokyo", are automatically transformed so that they conform           5:                   CF function(mv1, mv2, MV)
to the contexts of the corresponding sources).                            6:            else CF function(mv2, mv1, MV)
   GRAPH ?graph2                                                          7:      else function { f | Triple(C, modifier, f ) F C C
      { ?airline2 fts:depDateTime ?depDateTime2 ;                                                ( C = C1 C is a closest super-concept of C1 ) }
                   fts:arrDateTime ?arrDateTime2 ;                        8:            if     D = = TRUE
                   fts:depCity "Tokyo" ;                                  9:                   CF function(mv1, mv2, CF)
                   fts:arrCity "Shanghai" ;                               10:           else CF function(mv2, mv1, CF)
                   fts:price ?price2 . }                                  11: return CF
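As a rough illustration of how the algorithms 1 and 2 fit together, the following Python sketch compares the modifier values of two context instances and then applies one atomic conversion per detected conflict. The dictionary layout, the context names ctx_us and ctx_jp, and the fixed exchange rate (back-derived from the 25 to 0.30559 example in Section 3.4) are assumptions made for illustration only; in the framework itself these components are RDF graphs resolved through SPARQL. Passing the value through unchanged when a modifier is undefined is our reading of the note in Section 3.4.

```python
# Context pool A as plain dictionaries (hypothetical layout and names).
CONTEXT_POOL = {
    "ctx_us": {"coin:hasScale": 1, "coin:hasCurrency": "USD"},
    "ctx_jp": {"coin:hasScale": 1000, "coin:hasCurrency": "JPY"},
}

# Assumed fixed rate, back-derived from the Section 3.4 example.
JPY_TO_USD = 0.0122236
RATES = {("JPY", "USD"): JPY_TO_USD, ("USD", "JPY"): 1 / JPY_TO_USD}


def cvt_scale(mvs, mvt, vs):
    """vt = vs * mvs / mvt; pass the value through if a modifier is undefined."""
    if mvs is None or mvt is None:
        return vs
    return vs * mvs / mvt


def cvt_currency(mvs, mvt, vs):
    """Currency conversion backed by the assumed RATES table."""
    if mvs is None or mvt is None or mvs == mvt:
        return vs
    return vs * RATES[(mvs, mvt)]


# Conversion function bindings F: one function per modifier.
CONVERSIONS = {"coin:hasScale": cvt_scale, "coin:hasCurrency": cvt_currency}


def detect_conflicts(ctx1, ctx2):
    """Algorithm 1, lines 11-15: keep modifiers whose values differ."""
    conflicts = []
    for modifier, mv1 in CONTEXT_POOL[ctx1].items():
        mv2 = CONTEXT_POOL[ctx2].get(modifier)
        if mv1 is not None and mv2 is not None and mv1 != mv2:
            conflicts.append((modifier, mv1, mv2))
    return conflicts


def build_conversion(conflicts, forward=True):
    """Algorithm 2: apply one atomic conversion per conflict, in order."""
    def composite(value):
        for modifier, mv1, mv2 in conflicts:
            convert = CONVERSIONS[modifier]
            if forward:
                value = convert(mv1, mv2, value)   # D = TRUE
            else:
                value = convert(mv2, mv1, value)   # D = FALSE
        return value
    return composite


conflicts = detect_conflicts("ctx_us", "ctx_jp")
to_source2 = build_conversion(conflicts, forward=True)
to_source1 = build_conversion(conflicts, forward=False)
```

Here conflicts holds the two rows of Table 1. Calling to_source1(25) applies the scale conversion before the currency conversion, whereas the composite call in Section 3.4 nests them the other way round; for orthogonal modifiers such as these the result is the same.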
The trick is that we are able to access the graph name by making a
variable (i.e., ?graph2 used in the above pattern) bound to IRIs of       4.3 Conversion Function Construction
the graph being matched, and then use the graph name to obtain            The conversion functions are introduced to define how values of a
its context definition via context bindings (see Section 3.3).            given concept are transformed between different contexts. In the
SPARQL queries can be used to find all the potential semantic             COIN framework, an "ontology-based" representation is adopted
conflicts instead of the algorithm 1, and the algorithm can be            where conversion functions are attached to concepts in different
trivially translated to the equivalent SPARQL query shown in              contexts. This mechanism allows for greater sharing and reuse of
Figure 3. If this query is applied to the source 1 and 2 for the          semantic encoding. For example, the same concept may appear
property fts:price, two semantic conflicts would be detected as           many times in different properties (e.g., consider the concept
                                                                          "MonetaryValue"). Rather than writing a conversion function for
each property that redundantly describes how different reporting       Algorithm 3. SPARQL query rewriting
currencies are resolved, we can simply associate the conversion
                                                                       Input:  K : COIN components, K = < O, A, M, F > including B
function with the concept "MonetaryValue".
                                                                               WQ : a well-formed SPARQL query
In addition, when a property involves two or more conflicts, a                 r:   an IRI used to identify a receiver
composite function can be constructed to reconcile those conflicts             a : an IRI used to identify an application domain
by the algorithm 2. It is also scalable because it can compose all     Output: MQ : a mediated SPARQL query
necessary conversions using a small set of component conversions.
The composition can be obtained as a series of invocations on the      1:  MQ WQ
conversion function defined for each modifier pertaining to the        2:  pattern
                                                                       3:  parse the query WQ and extract its triple patterns into bgp
concept in the context ontology.
                                                                       4:  M and A are included into the dataset of MQ as named graphs
The function-finding method in the algorithm 2 (line 3 and 7) at       5:  for each var { v | v is a RESULT variable
first will try to find the specific conversion function defined for                              Tripleg(s, p, v) bgp } do
the modifier of the concept. If it cannot be found, the method will    6:      if SC Algorithm1(K, g, r, p, a) SC                 then
try to find the function defined for the same modifier under the       7:          pattern pattern PatternCreator(g, r, p, a, SC) / bgp
direct super-concept. The step will repeat until a function is         8:          CF Algorithm2(K, SC, v, TRUE)
retrieved or it reaches the top "Context" concept. For example, the    9:          adds the assignment LET ( nvar := CF ) into MQ and
function cvt_format_dateTime( ) is defined for the modifier                        replace var with nvar in the SELECT clause, where nvar
"hasFormat" under the concept "dateTime" that has no sub-                          is a new variable never used before.
concepts, while cvt_scale( ) function is defined under the concept                 /* if the variable v is introduced by a LET clause, the
"Number" and the function can be used by all its sub-concepts                      context conversion will be done in the LET clause */
                                                                       10: for each cont = { c | c is a constant Tripleg(s, p, c) bgp } do
such as "Weight", "Height" and "MonetaryValue" for scale-factor
                                                                       11:     if SC Algorithm1(K, g, r, p, a) SC                 then
adjustment.
                                                                       12:         pattern pattern PatternCreator(g, r, p, a, SC) / bgp
Note that the translation from one context to another is embedded      13:         CF Algorithm2(K, SC, cont, FALSE)
in conversion functions present in individual context theories, and    14:         adds the FILTER( nvar = CF ) into MQ and replace
they are not part of the context ontology. This means that there is                cont with nvar in Tripleg(s, p, c), where nvar is a new
greater scope for different users to introduce conversion functions                variable never used before.
which are most appropriate for their purposes without requiring        15: for each EXPRESSION ( expr1 op expr2 ) or
these differences to be accounted globally. For example, different                    FUNCTION( expr1, expr2 ) do
                                                                       16:     if expr1 is a constant expr2 is a variable
currency exchange systems may be used in different countries or
                                                                                   Tripleg(s, p, expr2) bgp
for different purposes.
                                                                                   SC Algorithm1(K, g, r, p, a) SC              then
The modifiers of the concept are called orthogonal if the value        17:         pattern pattern PatternCreator(g, r, p, a, SC) / bgp
derived from its composite conversion is not affected by the order     18:         CF Algorithm2(K, SC, expr1, FALSE)
in which the component conversions are invoked. For example,           19:         replace expr1 with CF in the expression or function
the currency and scale modifiers of the "MonetaryValue" concept        20:     if expr2 is a constant expr1 is a variable
are orthogonal. We will get the same value either by converting                    Tripleg(s, p, expr1) bgp
currency first followed by scale-factor adjustment or by adjusting                 SC Algorithm1(K, g, r, p, a) SC              then
                                                                       21:         pattern pattern PatternCreator(g, r, p, a, SC) / bgp
scale-factor first followed by currency conversion. For any pair of
                                                                       22:         CF Algorithm2(K, SC, expr2, FALSE)
modifiers that are not orthogonal, the component conversions are
                                                                       23:         replace expr2 with CF in the expression or function
required to be invoked in a particular order to return the right value. 23:         replace expr2 with CF in the expression or function
The order depends on how the component conversion is specified.                    Tripleg1(s1, p1, expr1) bgp
An in-depth discussion on this issue can be found in [6].                          Tripleg2(s2, p2, expr2) bgp
                                                                                   SC Algorithm1(K, g1, g2, p1, p2) SC              then
4.4 Query Rewriting                                                    25:         pattern pattern
The context mediator uses the algorithm 3 to undertake the role of                            PatternCreator(g1, g2, p1, p2, SC) / bgp
detecting and reconciling potential conflicts at the time a query is   26:         CF Algorithm2(K, SC, expr1, TRUE)
                                                                       27:         replace expr1 with CF in the expression or function
submitted. The following parts in a query might be rewritten to
                                                                       28: adds pattern into MQ
the corresponding forms in which all semantic conflicts, when detected,
                                                                       29: return MQ
are resolved:
• SELECT: the answers returned should be further transformed           Function PatternCreator(S1, S2, P1, P2, SC)
    so that they conform to the context of the receiver.               1: pattern
• WHERE: the constants should be transformed to comply with            2: pattern { Triple(S1, coin:hasContext, def1) ,
    assumptions in the source contexts.                                                Triple(S2, coin:hasContext, def2) ,
• EXPRESSION and FUNCTION: one of two arguments might                                  Tripledef1(P1, C1, ctx1) ,
    be transformed so that the two arguments conform to the same                       Tripledef2(P2, C2, ctx2)
    context.                                                           3: for each { S1, S2, P1, P2, C1, C2, modifier, mv1, mv2   SC
First, the context mappings M and context pool A will be added         4:      pattern      pattern ,
                                                                                           Triple(ctx1, modifier, mv1) ,
into the RDF dataset as named graphs (line 4). A mediated query
                                                                                            Triple(ctx2, modifier, mv2)
is executed against the RDF dataset that comprises one or more
                                                                       5: return pattern
data sources and the two graphs generated from M and A. In the
current SPARQL core standard, SELECT queries only project out           xiv:  ?ctxr coin:hasCurrency ?mvcr . }
variables bound in the query and there is no way to return the          xv: LET ( ?total := fn:cvt_currency(?mvc1, ?mvcr,
values of expressions over result bindings. The mediated queries                            fn:cvt_scale(?mvs1, ?mvsr, ?price1)) +
                                                                                            fn:cvt_currency(?mvc2, ?mvcr,
need the ability to project expressions rather than just variables                          fn:cvt_scale(?mvs2, ?mvsr, ?price2)) )
because the results need to be transformed to comply with
                                                                        The line i-iv was generated to retrieve the context definitions of
assumptions in the receiver context. An example is returning the        the two possible data sources (one is ?graph1 in which ?price1
total cost of two air tickets. LET assignments are used to enable       will be retrieved; the other is ?graph2 that will provide ?price2)
transformation on the results by using the conversion functions         and the receiver. Note that the duplicate patterns will be deleted
(line 9). For the same reason, we also use FILTER clauses in the        from the mediated query when the patterns are merged into the
line 14 to transform the constants in the queries to comply with        previously-generated ones. The context instances and their
assumptions in the source contexts.                                     modifier values will be obtained by the line v-xiv. To rewrite the
Considering the naïve query in the motivational example, the first      LET assignment (line xv), the algorithm 2 will be called twice,
triple pattern encountered that needs to be processed is the one:       each for ?price1 and ?price2, to make the sum of the two prices
    GRAPH ?graph1 { ?airline1 fts:depCity "Boston" . }                  comply with assumptions in the receiver context.
Lines 10-14 of the algorithm 3 will transform this triple into the
following patterns.                                                     5. IMPLEMENTATION
i: GRAPH ?graph1 { ?airline1 fts:depCity ?depCity . }                   We use Jena7 to implement COIN SPARQL framework to provide
ii: GRAPH <http://coin.mit.edu/bindings/flight>                         a demonstration of the feasibility of Context Interchange strategy.
iii: { ?graph1 coin:hasContext ?def1 .                                  As shown in Figure 4, queries submitted to the system will be
iv:     coin:receiver coin:hasContext ?recv . }                         intercepted by a Context Mediator, which rewrites the user query
v: GRAPH ?def1 { fts:depCity coin:city ?ctx1 . }                        into a mediated query and passes it to ARQ, the SPARQL query
vi: GRAPH ?recv { :flight coin:city ?ctx2 . }                           engine for Jena. COIN also can easily be plugged in to other
vii: GRAPH <http://coin.mit.edu/contexts>                               existing SPARQL query engines to take advantage of their state
viii: { ?ctx1 coin:hasEncoding ?mv1 .
                                                                        of the art query optimization and execution techniques. We tested
ix:     ?ctx2 coin:hasEncoding ?mv2 . }                                 the solution with the real data from CIA Factbook and DBpedia.
x: FILTER ( ?depCity = fn:cvt_encoding_city(?mv2, ?mv1,
                                                 "Boston") ) .

A new variable ?depCity was generated to replace "Boston" in
line i. The context statements of the data sources can be
accessed dynamically via the variable ?graph1, which will be bound
to the IRIs of the sources being matched (line iii). The pattern in
line iv is used to retrieve the receiver's context definition. The two
context instances of the concept "City" are obtained by lines v-vi:
one is associated with the property fts:depCity of the data sources;
the other is referenced in the receiver's context statements. The
modifier values of those two context instances are retrieved by
lines viii-ix and are taken as inputs to the conversion function
fn:cvt_encoding_city. In the FILTER clause of line x, the English
city name "Boston" may be transformed to comply with the contexts
of the data sources. If the variable ?graph1 is bound to the IRI of
the source 1 (USA airlines), "Boston" will be rewritten into "BOS".
If ?graph1 is bound to the IRI of the source 2 (Japan airlines),
"Boston" will stay unchanged.

[Figure 4 diagram. Labelled components: User; Context Mediator
(with Conversion Functions, Context Pool, Context Ontology and
Context Mappings); Query Optimizer; SPARQL Query Engine; Local
Databases; RDF Data Sources; Other Data Sources (Relational
Databases exposed as RDF Views).]
                                                                                    Figure 4. Architecture of a COIN SPARQL system
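The rewriting of the constant "Boston" walked through in Section 4.4 can be sketched in Python as follows. The encoding identifiers "english-name" and "airport-code" and the lookup table are hypothetical, since the modifier values used for coin:hasEncoding are not listed here; only the Boston/BOS pair comes from the text. The sketch follows the two-step scheme described in Section 5 for the "hasEncoding" modifier: the value is first mapped to an internal representation (assumed here to be the English name) and then to the target encoding.

```python
# Hypothetical lookup table; only the Boston/BOS pair comes from the text.
NAME_TO_CODE = {"Boston": "BOS"}
CODE_TO_NAME = {code: name for name, code in NAME_TO_CODE.items()}


def to_internal(encoding, value):
    """Map a city value to the internal form (assumed: English name)."""
    if encoding == "airport-code":
        return CODE_TO_NAME[value]
    return value


def from_internal(encoding, name):
    """Map the internal English name to the target encoding."""
    if encoding == "airport-code":
        return NAME_TO_CODE[name]
    return name


def cvt_encoding_city(mvs, mvt, vs):
    """Convert a city value from encoding mvs to encoding mvt."""
    if mvs is None or mvt is None or mvs == mvt:
        return vs
    return from_internal(mvt, to_internal(mvs, vs))


# The receiver is assumed to use English names, source 1 airport codes,
# and source 2 English names.
for_source1 = cvt_encoding_city("english-name", "airport-code", "Boston")
for_source2 = cvt_encoding_city("english-name", "english-name", "Boston")
```

The argument order mirrors the FILTER in line x, where the receiver's modifier value (?mv2) comes first because the constant is converted from the receiver context into the source context.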
For the result variables of the naïve query, only the variable ?total
needs to be processed because it is the sum of the prices of the        OWL is used to represent the context ontology, and the pool and
two tickets, ?price1 and ?price2 that occur as objects in the triple    mapping components are described by using RDF. All conversion
patterns. The following fragment of the mediated query will be          functions now are implemented as FILTER (extension) functions,
generated by the line 6-9 of the algorithm 3 for the variable ?total.   which use the advanced feature of ARQ that goes beyond the core
i: GRAPH <http://coin.mit.edu/bindings/flight>                          SPARQL specification. The conversion functions extend the
ii: { ?graph1 coin:hasContext ?def1 .                                   abstract class FunctionBase3 that takes three parameters. The
iii:    ?graph2 coin:hasContext ?def2 .                                 query engine can dynamically call the functions based on the
iv:     coin:receiver coin:hasContext ?recv . }                         function URIs. This is done by either registering it or using the
v: GRAPH ?def1 { fts:price coin:monetaryValue ?ctxp1 . }                fake java: URI scheme.
vi: GRAPH ?def2 { fts:price coin:monetaryValue ?ctxp2 . }               Conversion functions are defined as parametric functions that can
vii: GRAPH ?recv { :flight coin:monetaryValue ?ctxr }                   express the conversion between all possible modifier value pairs,
viii: GRAPH <http://coin.mit.edu/contexts>                              not only a specific one. For example, fn:cvt_currency can be used
ix: { ?ctxp1 coin:hasScale ?mvs1 .                                      to make currency conversion among any different currencies. For
x:      ?ctxp1 coin:hasCurrency ?mvc1 .                                 "hasFormat" and "hasEncoding" modifiers, the value at first is
xi:     ?ctxp2 coin:hasScale ?mvs2 .
xii:    ?ctxp2 coin:hasCurrency ?mvc2 .
                                                                        7
xiii:   ?ctxr coin:hasScale ?mvsr .                                         http://jena.sourceforge.net/
translated from the source context into an internal representation     The context interchange strategy is a mediator-based approach for
and then into the target context. A datetime value, for example, is    achieving semantic interoperability among heterogeneous sources
firstly transformed into xsd:dateTime by fn:cvt_format_dateTime        and receivers. As realizations of the strategy, COIN [4], [7] and a
before being transformed again in order to comply with the target      recent extension [9], [19], are working prototypes that implement
context. Notice that the number of conversion functions is not         the Context Interchange strategy. COIN uses FOL/Prolog as the
generally proportional to the number of sources and receivers in       representation and implementation language for the application
the system. The COIN framework facilitates the maximum reuse           ontology in the context mediation. Various sample applications
of existing conversion functions, and thus the number of newly-        have been implemented to illustrate its ability to solve semantic
introduced conversion functions would diminish rapidly with the        interoperability problems in areas such as financial services,
addition of each source or receiver. A more in-depth analysis of       weather information, and airfare aggregation and comparison. Our
scalability can be found in [18].                                      goal in this paper is to illustrate how to extend COIN strategy to
                                                                       solve context conflicts in the emerging linked data by SPARQL
With the COIN framework, the users are not burdened with the
                                                                       query rewriting.
diverse data semantics in sources, all of which are declared in the
context representation components and can be automatically taken       It is worth noting some interesting work which is complementary
into consideration by the mediator. A receiver in any context is       to our approach. A system, called Sesame, as well as its index
able to issue queries over any set of data sources in other contexts   structure was presented for query processing and optimization in
as if they were in the same context. For example, it is easy for       distributed RDF repositories in [11]. Glimm and Krötzsch extend
users to modify their context definition. Continue the motivational    the SPARQL query language by defining the semantics of queries
example in Section 2. Here the receiver wants to use CNY instead       under the entailment regimes of RDF, RDFS, and OWL [16]. An
of USD. After the modification is made, the query results will         approach is proposed to discover data that might be relevant for
immediately be reported in CNY as follows.                             answering a query during the query execution itself [5]. The
                                                                       discovery is driven by following RDF links between data sources
         airline1            airline2              total
                                                                       based on URIs in the query and in partial results.
          us339                jp241             8172.32
      Exchange rate: 100 USD = 651.56 CNY; 100 JPY = 7.93 CNY.         7. CONCLUSIONS
The sources are not required to make any change or commit to           The "Web of linked data" can be understood as a single, globally
any criteria under the COIN framework; they only need to record        distributed dataspace, and SPARQL queries can be executed over
data semantics declaratively. Adding or removing a data source is      this huge dataspace. But semantic heterogeneity widely exists
accomplished by adding or removing the corresponding context           among the RDF data sources originating from different contexts,
declarations, which does not require any changes to the mediator       and severely hampers their integration and interoperability. This
or query processor. Conversion functions are defined for each          paper describes how to use the Context Interchange strategy to
modifier between distinct modifier values, not between pair-wise       achieve semantic interoperability among heterogeneous RDF
sources. Thus only a small number of atomic or component               sources and receivers by rewriting the user query to a mediated
conversions need to be defined, which are used by the mediator to      query. The semantic conflicts can be automatically detected and
compose necessary composite conversions in a dynamic way to            reconciled by the context mediator using the context definitions
reconcile all semantic differences involved in a query. In many        associated with both the data sources and the data receivers. The
practical cases, an atomic function can be parameterized for the       mediated query, when executed, returns the answer collated and
modifier to convert from any given context to any other context.       presented in the receiver context. The core components of COIN
The COIN strategy requires significantly less conversions than         are represented by using the Semantic Web constructs such as
traditional global or interchange standardization approaches [18],     RDF and OWL, which make them possible to be exchanged and
and therefore is easy to be maintained.                                processed by other applications.
                                                                       The mediated query uses extension functions to transform values
6. RELATED WORK                                                        between different contexts, which might be likely to have limited
We can roughly distinguish between two types of data conflicts in      interoperability. It might require transformation between datatype
data integration: schematic conflicts and semantic conflicts. The      formats not supported by the core SPARQL specification, and the
first has been well-documented in the database and Semantic Web        orderings of some application datatype also need to be defined. A
literature. A language, called CQuery was described in [10] to         promising alternative is to use nest query (i.e. to nest the results of
represent the domain knowledge in the form of a vocabulary or          a query within another query) to provide conversion. The current
ontology as semantic metadata, and use the ontology to overcome        SPARQL core standard does not support nested query, but the
heterogeneity among different data sources. Gracia et al. proposed     standard is still evolving (see [8]). In the future, we plan to
a technique to perform the integration of ontology terms in order      consider temporal semantic heterogeneities which refer to the
to cluster the most similar ones in [12]. Networked Graphs was         situation where the semantics between data sources, even in the
proposed as a means for describing RDF graphs that are partially       same data source, change over time. For example, the profit of a
derived from other graphs using a declarative view mechanism.          company might be represented in DEM before 1998 and in EUR
The relationships between graphs are described in a declarative        since 1999. The approach partitions graph patterns of a given
way using SPARQL queries in [13]. Some surveys of schematic            "naive" query into multiple GRAPH clauses, each of which uses a
integration can be found in [14], [15], and [17]. Our approach is      unique set of query variables. This solution need to add FILTER
different because we have chosen to focus on the semantics of          clauses that ensure the equality of values bound to newly
data level as opposed to the conflicts at schematic level. To the      introduced variables. These filters increase the complexity of
best of our knowledge, very little existing work has addressed the     queries and have a negative effect on performance. The focus of
semantic conflicts at data level among RDF data sources.               the paper has been on the query mediation, but an implementation
                                                                       with query optimization is also planned.
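
The partitioning step described above (one GRAPH clause per source, a
unique set of variables per clause, and FILTER clauses that tie the
copies back together) can be sketched in Python as follows. This is only
an illustration under stated assumptions: the function name
partition_query, the "_1", "_2" suffix scheme, and the use of plain
equality in the filters are hypothetical and are not taken from the COIN
prototype, where the comparison would be applied to context-converted
values.

```python
def partition_query(patterns_by_graph):
    """Give every GRAPH clause its own variables and emit equality filters.

    patterns_by_graph maps a graph IRI to a list of (subject, predicate,
    object) strings; variables start with '?'. The naming scheme is an
    assumption made for this sketch.
    """
    copies = {}    # original variable -> renamed copies, in order of appearance
    clauses = []
    for index, (graph, patterns) in enumerate(patterns_by_graph.items(), start=1):
        renamed_patterns = []
        for triple in patterns:
            terms = []
            for term in triple:
                if term.startswith("?"):
                    fresh = f"{term}_{index}"          # unique per GRAPH clause
                    names = copies.setdefault(term, [])
                    if fresh not in names:
                        names.append(fresh)
                    term = fresh
                terms.append(term)
            renamed_patterns.append(" ".join(terms) + " .")
        body = " ".join(renamed_patterns)
        clauses.append(f"GRAPH <{graph}> {{ {body} }}")

    # A variable that occurred in several graphs now has several copies;
    # each extra copy must be equal to the first one.
    filters = []
    for names in copies.values():
        for other in names[1:]:
            filters.append(f"FILTER ({names[0]} = {other})")
    return clauses, filters


# A "naive" join: flight 1 arrives in the city from which flight 2 departs.
naive = {
    "http://usairline.com/flights": [("?f1", "fts:arrCity", "?city")],
    "http://japanairline.com/flights": [("?f2", "fts:depCity", "?city")],
}
clauses, filters = partition_query(naive)
for line in clauses + filters:
    print(line)
```

Each variable shared by two sources yields one extra FILTER clause, which
is the source of the added query complexity noted above.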
8. ACKNOWLEDGMENTS
The work was supported, in part, by a grant from the National Natural
Science Foundation (No. 60903078).

9. REFERENCES
[1] Franklin, M.J., Halevy, A.Y., Maier, D. From databases to
    dataspaces: A new abstraction for information management.
    SIGMOD Record, 34, 4, (2005), 27-33.
[2] Prud'hommeaux, E., Seaborne, A. SPARQL query language for RDF.
    W3C Recommendation, 2008. Retrieved April 25, 2011, from
    http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
[3] Sahoo, S.S., Halb, W., Hellmann, S., Idehen, K., et al. A survey of
    current approaches of mapping of relational database to RDF, 2009.
    Retrieved April 25, 2011, from
    http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SuveyReport.pdf
[4] Goh, G.H., Bressan, S., Madnick, S., Siegel, M. Context Interchange:
    New Features and Formalisms for the Intelligent Integration of
    Information. ACM Transactions on Information Systems, 17, 3, (1999),
    270-293.
[5] Hartig, O., Bizer, C., Freytag, J.-C. Executing SPARQL queries over
    the Web of linked data. In Proceedings of the International
    Conference on Semantic Web, (ISWC'09), 2009, 293-309.
[6] Zhu, H. Effective information integration and reutilization
    solutions: to technological deficiency and legal uncertainty.
    Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA,
    2005.
[7] Zhu, H., Madnick, S. Scalable interoperability through the use of
    COIN lightweight ontology. In Proceedings of the VLDB Workshop on
    Ontologies-Based Techniques for Databases and Information System,
    (ODBIS'06), 2006, 37-50.
[8] Kjernsmo, K., Passant, A. SPARQL New Features and Rationale. W3C
    Working Draft 2, 2009. Retrieved April 25, 2011, from
    http://www.w3.org/TR/sparql-features/
[9] Li, X., Madnick, S., Zhu, H., Fan, Y.S. An approach to composing Web
    services with context heterogeneity. In Proceedings of the
    International Conference on Information System, (ICIS'09), 2009,
    695-702.
[10] Sattler, K.U., Geist, I., Schallehn, E. Concept-based querying in
     mediator systems. The VLDB Journal, 14, (2005), 97-111.
[11] Stuckenschmidt, H., Vdovjak, R., Houben, G.J., Broekstra, J.,
     Amerfoort, A.B.V. Index structures and algorithms for querying
     distributed RDF repositories. In Proceedings of the International
     Conference on World Wide Web, (WWW'04), 2004, 631-639.
[12] Gracia, J., d'Aquin, M., Mena, E. Large scale integration of senses
     for the Semantic Web. In Proceedings of the International
     Conference on World Wide Web, (WWW'09), 2009, 611-620.
[13] Schenk, S., Staab, S. Networked graphs: a declarative mechanism for
     SPARQL rules, SPARQL views and RDF data integration on the Web. In
     Proceedings of the International Conference on World Wide Web,
     (WWW'08), 2008, 585-594.
[14] Noy, N.F. Semantic integration: a survey of ontology-based
     approaches. SIGMOD Record, 33, 4, (2004), 65-70.
[15] Shvaiko, P., Euzenat, J. A survey of schema-based matching
     approaches. Journal of Data Semantics IV, (2005), 146-171.
[16] Glimm, B., Krötzsch, M. SPARQL beyond subgraph matching. In
     Proceedings of the International Conference on Semantic Web,
     (ISWC'10), 2010.
[17] Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G.,
     Neumann, H., Hübner, S. Ontology-based integration of information –
     a survey of existing approaches. In Proceedings of the
     International Joint Conferences on Artificial Intelligence
     Workshop, (IJCAI'01), 2001, 108-117.
[18] Gannon, T., Madnick, S., Moulton, A., Siegel, M., Sabbouh, M.,
     Zhu, H. Framework for the analysis of the adaptability,
     extensibility, and scalability of semantic information integration
     and context mediation approach. In Proceedings of the Hawaii
     International Conference on System Sciences, (HICSS'09), 2009, 1-11.
[19] Mihai, L., Madnick, S. Using Semantic Web tools for context
     interchange. In Proceedings of the VLDB Workshop on
     Ontologies-Based Techniques for Databases and Information System,
     (ODBIS'07), 2007.

APPENDIX A. NAMESPACE
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX coin: <http://coin.mit.edu/ontology#>
PREFIX ctx: <http://coin.mit.edu/contexts#>
PREFIX fts: <http://www.example.org/flightschedule#>
PREFIX fn: <http://coin.mit.edu/functions#>

APPENDIX B. THE EXAMPLE SCENARIO
Context Pool:
# Default graph: http://coin.mit.edu/contexts
ctx:US_dollar coin:hasScale "1"^^xsd:integer .
ctx:US_dollar coin:hasCurrency "USD" .
ctx:JP_yen coin:hasScale "1000"^^xsd:integer .
ctx:JP_yen coin:hasCurrency "JPY" .
ctx:US_dateTime coin:hasFormat "US_dateTime" .
ctx:XSD_dateTime coin:hasFormat "XSD_dateTime" .
ctx:IATA_airportCode coin:hasEncoding "IATA_airportCode" .
ctx:EN_city coin:hasEncoding "EN_city" .

Context Mappings:
# Named graph: http://coin.mit.edu/sources/usairline (Source 1)
fts:depDateTime coin:dateTime ctx:US_dateTime .
fts:arrDateTime coin:dateTime ctx:US_dateTime .
fts:depCity coin:city ctx:IATA_airportCode .
fts:arrCity coin:city ctx:IATA_airportCode .
fts:price coin:monetaryValue ctx:US_dollar .

# Named graph: http://coin.mit.edu/sources/japanairline (Source 2)
fts:depDateTime coin:dateTime ctx:XSD_dateTime .
fts:arrDateTime coin:dateTime ctx:XSD_dateTime .
fts:depCity coin:city ctx:EN_city .
fts:arrCity coin:city ctx:EN_city .
fts:price coin:monetaryValue ctx:JP_yen .

# Named graph: http://coin.mit.edu/receivers/myContext (Receiver)
:flight coin:dateTime ctx:US_dateTime .
:flight coin:city ctx:EN_city .
:flight coin:monetaryValue ctx:US_dollar .

# Named graph: http://coin.mit.edu/bindings/flight (Context binding)
<http://usairline.com/flights> coin:hasContext
   <http://coin.mit.edu/sources/usairline> .
<http://japanairline.com/flights> coin:hasContext
   <http://coin.mit.edu/sources/japanairline> .
coin:receiver coin:hasContext <http://coin.mit.edu/receivers/myContext> .
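
To make the role of the monetary declarations above more concrete, the
following Python sketch shows one way a mediator could compare the
modifier values of a source context and a receiver context and then
compose the per-modifier conversions. It is a minimal sketch under
stated assumptions, not the COIN implementation: the dictionary layout,
the function names conflicts and convert, the context ctx:CN_yuan (a
receiver that reports in CNY, as in the Section 5 example), and the
sample price of 15 are all hypothetical. The exchange rates are the ones
quoted in Section 5.

```python
# Monetary contexts from the Context Pool, keyed by modifier name.
CONTEXT_POOL = {
    "ctx:US_dollar": {"hasScale": 1, "hasCurrency": "USD"},
    "ctx:JP_yen": {"hasScale": 1000, "hasCurrency": "JPY"},
    # Hypothetical receiver context; it does not appear in Appendix B.
    "ctx:CN_yuan": {"hasScale": 1, "hasCurrency": "CNY"},
}

# CNY per one unit of currency, from "100 USD = 651.56 CNY; 100 JPY = 7.93 CNY".
RATES_TO_CNY = {"USD": 6.5156, "JPY": 0.0793, "CNY": 1.0}


def conflicts(source_ctx, receiver_ctx):
    """Return the modifiers whose values differ between the two contexts."""
    src, dst = CONTEXT_POOL[source_ctx], CONTEXT_POOL[receiver_ctx]
    return [modifier for modifier in src if src[modifier] != dst[modifier]]


def convert(value, source_ctx, receiver_ctx):
    """Apply one atomic conversion per conflicting modifier."""
    src, dst = CONTEXT_POOL[source_ctx], CONTEXT_POOL[receiver_ctx]
    for modifier in conflicts(source_ctx, receiver_ctx):
        if modifier == "hasScale":
            # Rescale, e.g. thousands of a currency into single units.
            value = value * src["hasScale"] / dst["hasScale"]
        elif modifier == "hasCurrency":
            # Cross rate through CNY, the only rates given in the text.
            value = (value * RATES_TO_CNY[src["hasCurrency"]]
                     / RATES_TO_CNY[dst["hasCurrency"]])
    return value


print(conflicts("ctx:JP_yen", "ctx:US_dollar"))   # both modifiers differ
print(convert(15, "ctx:JP_yen", "ctx:CN_yuan"))   # 15 thousand JPY in CNY
```

Only two atomic conversions (scale and currency) are defined here, and
the same two serve every pair of monetary contexts in the pool.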