<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Supporting Identity Reasoning in SPARQL Using Bloom Filters</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gregory Todd Williams</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>Troy, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The SPARQL Query Language presents a standardized interface to accessing RDF graphs. Its use allows templated queries to retrieve information from remote sources about known entities. However, such templated queries present a key problem: Queries cannot be made directly about nodes whose identity depends on reasoning. We present an e cient approach to solving this problem using a SPARQL extension function and the SPARQL XML Results format. We then discuss the use of this technique in federated query execution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The SPARQL Query Language[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] presents a standardized interface to accessing
RDF graph data. It is particularly useful for querying remote endpoints to
retrieve data about known entities. However, there are limits to how SPARQL may
be used in this way: unlike entities identi ed by URI, queries cannot be made
about speci c blank nodes. Unfortunately blank nodes are often used in
situations where they are uniquely identi ed by their properties. Such properties can
be de ned with the Web Ontology Language[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (OWL) as being Functional or
Inverse Functional properties. Blank nodes using these properties are endowed
with identity, even if they themselves aren't directly identi able.
      </p>
      <p>
        Making queries about such blank nodes (with identities based on Functional
and Inverse Functional properties) is desirable, even if SPARQL fails to
support them directly. We present a system to allow e cient querying of identi
able blank nodes using Bloom lters[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a probabilistic approach to set
membership testing. A Bloom lter-based SPARQL extension function is used to
determine which nodes satisfy the identity constraints of the query, returning
a result set that closely approximates the desired data. Additional identity
information about each result is returned with the result set, allowing the query
results to be joined with local information to produce the desired information.
      </p>
      <p>The rest of the paper is organized as follows. In section 2 we introduce a
motivating example used throughout the paper, and discuss background
information on formulating SPARQL queries for this example and the Bloom lter.
In section 3 we introduce a Bloom lter SPARQL extension and show how it can
be used to express equivalent queries. In section 4 we look at the space savings
of using this approach in formulating queries (and by implication, the reduction
in query complexity). Section 5 discusses performance considerations relevant to
deployment and optimization of this technique. Finally, in sections 6 and 7 we
look at related work and conclude by proposing future work on optimization and
analysis of this technique as well as studying its application in federated query
environments.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>In this section we will examine the problem of querying a SPARQL endpoint for
information about nodes identi ed by their properties. To do this, we use the
following motivating example: we have a local address book database of known
people we'd like to discover more information about. We would like to make
a single query to a SPARQL endpoint, retrieving certain desired information
(their phone number, for example) for all the people in our address book. Each
person also has one or more identifying properties in the address book database
(e.g. email address or homepage URL). Below we discuss how SPARQL can be
queried for this information without the use of any extensions. We also look at
Bloom lters, whose use will aid in more e ciently expressing and executing
these queries.
2.1</p>
      <sec id="sec-2-1">
        <title>Graph Pattern Matching in SPARQL</title>
        <p>
          One possible solution to this problem is to load the remote data into the local
store (merging it with the local identity data) and use an OWL reasoner before
making the query. Unfortunately, this strategy can only work in situations where
the remote data is accessible as RDF and the size of RDF to be loaded is
reasonable for downloading, importing, reasoning and querying. In many common
cases these requirements cannot be guaranteed; the data may be too large for
downloading and local querying to be practical (such as DBpedia and DBLP
from the Linked Open Data project[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]) or a SPARQL interface may be the only
way to access otherwise private data. In these cases, using SPARQL is the only
practical way to access the desired information.
        </p>
        <p>In order to write a SPARQL query that will return the phone numbers of
all the known people, the query must encode the identifying properties of each
person. If we were only concerned with a single identifying property, such a query
might look like this:</p>
        <sec id="sec-2-1-1">
          <title>PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;</title>
          <p>SELECT ?email ?phone
WHERE {
?p foaf:mbox ?email ; foaf:phone ?phone .</p>
          <p>FILTER(
?email = &lt;mailto:alice@example.org&gt; ||
?email = &lt;mailto:eve@example.net&gt;
) .</p>
          <p>Here we match all people with email addresses and telephone numbers, and
use a lter to keep only those people who have an email address we know of.
It is important to note that we must retrieve ?email, the email address used
to identify the person, so that when we process the results we can match each
telephone number with the appropriate person. The situation becomes more
complex when we consider situations with more than one identifying property.
For multiple properties, we must construct a pattern for each property and the
associated values of interest, and take the union of all such patterns before going
on to match for a telephone number:</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;</title>
          <p>SELECT ?email ?homepage ?phone
WHERE {
{
?p foaf:mbox ?email .</p>
          <p>FILTER(
?email = &lt;mailto:alice@example.org&gt; ||
?email = &lt;mailto:eve@example.net&gt;
) .
} UNION {
?p foaf:homepage ?homepage .</p>
          <p>FILTER(</p>
          <p>?homepage = &lt;http://example.com/&gt;</p>
          <p>Since node identity isn't constrained to a single property-value pair, the
number of potential patterns in the union can grow with the number of possible
property chains (where the value of the rst property-value pair is itself identi ed by
another property-value pair).</p>
          <p>This approach will work, but the query size scales with O(nml), the number
of local nodes (n) multiplied by the number of expected identifying properties
per node (m) multiplied by the expected length of each property chain (l) with a
constant multiplier of the expected size of representing the properties and values
in each node's property chain in the SPARQL syntax (in this example, roughly
30 bytes are used for each equality test and single property-value pair).</p>
          <p>As these examples indicate, the query size can grow quite large with just
modest numbers of nodes and predicates. If possible, we'd like to express the
query in a more compact form that decreases total bandwidth (both query and
result size).
2.2</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Bloom Filters</title>
        <p>
          Bloom Filters[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] provide a probabilistic method for testing set membership. A
Bloom lter consists of a bit vector of length N , and uses a set of t hashings
of an input to compute t bit addresses in the vector. The bits of the vector
are initially set to all zero. For each member of the set, the bits of the vector
referenced by the t hashings of the input are all set to one. Testing an input i for
membership involves computing t bit addresses using the hashing functions on
i, and checking each of the t referenced bits of the vector. If any of the t bits is
set to zero, i is not part of the set. If all t bits are set to one, then i is accepted
as belonging to the set. Notice that testing any valid member of the set will
always involve t set bits yielding a positive result, while testing a non-member
may involve t bits coincidentally set to one, resulting in a false positive.
        </p>
        <p>
          The use of hashing techniques such as Bloom lters have been used in
relational database systems as an e cient way to compute semijoins[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This
usage informs our approach of using them in SPARQL, a usage that may be seen
as a semijoin over data whose equality may be tested based not just on node
value (as in relational systems) but also on a node's relationships (the value of
Functional and Inverse Functional properties).
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>
        Here we present an extension function to SPARQL for testing a node's
membership in a set based on a pre-computed Bloom lter. We have implemented this
function in RDF::Query[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a SPARQL implementation for Perl. The function,
&lt;http://kasei.us/code/rdf-query/functions/bloom&gt; (abbreviated k:bloom),
is invoked as k:bloom( VAR, FILTER ), taking a variable VAR and a serialized
Bloom lter FILTER as arguments. The function returns true if the value bound
to the variable is accepted as a member of the set, false otherwise. It may be
used in place of the union-and- lter pattern described above with similar results
(but with potential false positives).
      </p>
      <p>PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;
PREFIX k: &lt;http://kasei.us/code/rdf-query/functions/&gt;
SELECT ?p ?phone
WHERE {
?p foaf:phone ?phone .</p>
      <p>FILTER k:bloom(
?p,
"AAAACgAAACIAAAACAAAAAgAAAAUAAAAD2aCcopHv9zsoAAQgADAuMg==\n"
) .
}</p>
      <p>In Figure 1 is an example of a SPARQL query using the Bloom lter
extension function. Encoded in the Bloom lter are two nodes with the identifying
properties:
{ foaf:mbox &lt;mailto:willig4@rpi.edu&gt;
{ foaf:isPrimaryTopicOf
&lt;http://dblp.l3s.de/d2r/resource/authors/Gregory Williams&gt;
This query will return bindings for ?p and ?phone for all nodes ?p that are
identi ed with these speci c foaf:mbox and foaf:isPrimaryTopicOf values
(and might also, with xed low probability, return false positives).</p>
      <p>
        To encode all the information required for identity reasoning about each
node, we must add each identifying property chain to the Bloom lter. To add
each chain to the Bloom lter, we rst encode each chain as a string based on
the N3[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] syntax for paths. We refer to these identifying property chain strings
as \names" for a given node. The name of a node is computed recursively as
follows:
name(n) = \&lt;" n \&gt;" if n is an IRI
name(n) = \00" n \00" if n is a Literal
name(n) = \=" name(o) 8o s.t. f ?n owl:sameAs ?o g
name(n) = \!" p name(o) 8p; o s.t.
      </p>
      <p>f ?n ?p ?o . ?p a owl:InverseFunctionalProperty g
name(n) = \^" p name(o) 8p; o s.t.</p>
      <p>f ?n ?p ?o . ?p a owl:FunctionalProperty g</p>
      <p>Thus, the two identifying properties above are represented with the names:
{ \!foaf:mbox &lt;mailto:willig4@rpi.edu&gt;"
{ \^foaf:isPrimaryTopicOf
&lt;http://dblp.l3s.de/d2r/resource/authors/Gregory Williams&gt;".</p>
      <p>The error rate of a Bloom lter, or the probability that all t bits are set to
one, is Pe = (1 (1 N1 )tc)t with c being the number of items added to the lter.
It can be seen that the error rate may be made arbitrarily small by increasing
the size of the vector N . Using a xed hashing count of t = 7 and error rate of
Pe = 1010 , for example, we see that the lter size grows by approximately 10 bits
for each additional item expected in the lter. This space requirement is clearly
better than the requirements of the union-and- lter approach mentioned in the
previous section which depends on the length of the node identi ers and is on
the order of at least tens of bytes per item.</p>
      <p>Compared to the union-and- lter approach, the use of the k:bloom function
doesn't leave us with a query that will return enough information to know which
telephone number belongs to each person. This is because the query only returns
values for ?p and ?phone, not any of the identifying property values such as email
address. For this reason, the use of the k:bloom function triggers an extension
to the query result generation code, allowing the required information to be
returned along with the binding results.</p>
      <p>During query execution, the implementation of the k:bloom function
retrieves all the names for each candidate binding of VAR and tests them against
FILTER. If any name matches, the node is a member of the requested set with
high probability. Each such matching node name is added to a list of \identity
hints" that will be returned with the query results. Once results are returned to
the client, the hints will allow the client to determine which local nodes share
identity with each value of VAR.</p>
      <p>
        The SPARQL XML Results Format allows a link element \with an href
attribute containing a relative URI that provides a link to some additional
metadata about the query results"[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This link element is used to reference the list
of \identity hints" generated during query execution. Two approaches to the use
of this element are possible: either the identity information may be stored on
the server with a persistent URL that the client may retrieve prior to joining
the results with local data, or the identity information may be encoded using
the data: URI scheme, allowing it to be included directly in the XML result
data. Our implementation chooses to rely on data: URIs, but the next section
discusses tradeo s between the two approaches.
&lt;?xml version="1.0"?&gt;
&lt;sparql xmlns="http://www.w3.org/2005/sparql-results#"&gt;
&lt;head&gt;
&lt;variable name="p"/&gt;
&lt;variable name="phone"/&gt;
&lt;link href="data:text/xml, (identity hints) " /&gt;
&lt;/head&gt;
&lt;results&gt;
&lt;result&gt;
&lt;binding name="p"&gt;&lt;bnode&gt;r1&lt;/bnode&gt;&lt;/binding&gt;
&lt;binding name="phone"&gt;&lt;uri&gt;tel:+1-518-276-4431&lt;/uri&gt;&lt;/binding&gt;
&lt;/result&gt;
&lt;/results&gt;
&lt;/sparql&gt;
      </p>
      <p>Figure 2 shows an example of SPARQL XML results including a data: URL
link element. For brevity, the actual identity hints content has been left out of
the data: URL; the identity hints in this case might map:
(r1)
dictating that the blank node r1 referenced in the results could be joined with
any local node identi ed by the email address &lt;mailto:willig4@rpi.edu&gt;.</p>
      <p>As is the case with Bloom lter use in relational databases, it is possible that
due to the error rate of the lter, no local node will share identity with a value
of VAR. In this case, there will be extra results and identity mappings. However,
for each such extra result, the names of the node bound to VAR (provided by the
identity map) will not correspond to any local node names, and a join of the
result and local data will simply discarded the extra results.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Although we are currently working on a more thorough analysis of the use of
Bloom lters in SPARQL queries (discussed in section 7), we present a brief set
of results looking at the reduction in query size using Bloom lters in queries
compared to equivalent union-and- lter queries.</p>
      <p>We used the ESWC 2007 RDF data1 as a basis for constructing queries. From
this data, we consider three sets of resources which we use as the local data for
Bloom lter construction: the set of all resources, the set of all foaf:Persons,
and a small subset of people (chosen by matching an arbitrary substring against
foaf:firstName values). We refer to these sets as large, medium, and small,
respectively.</p>
      <p>For each set, we constructed a query to retrieve a speci c property value for
each of the local resources using both the full union-and- lter-style query and a
Bloom lter. Table 1 shows, for each set, the total number of local resources, the
total number of identi ers (node \names"), and the corresponding query sizes
(in bytes). We show the size for two di erent Bloom lter queries using di erent
xed error rates of 1% and 10%. In general, the choice of error rate (and thus
lter size) will a ected by the size of the SPARQL endpoint's database. If the
endpoint's database is small, a larger error rate is acceptable as few false positives
will be returned, while a large database will necessitate a smaller error rate to
keep false positives small.</p>
      <p>As Table 1 shows, using a Bloom lter with a xed error rate of 1% compared
to an equivalent union- lter query, the query sizes shrink by between 88% and
97% for the small and large datasets, respectively. Whereas a union-and- lter
query size of over 100 kilobytes for the large dataset is likely unreasonably large
for a query over HTTP (the typical protocol for communicating with SPARQL
endpoints), the equivalent Bloom lter query with 1% error (at 3146 bytes) is
1 http://data.semanticweb.org/dumps/conferences/eswc-2007-complete.rdf
much more reasonable, even within the limit of most webservers for encoding
the query in the request URL.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Performance Considerations</title>
      <p>There are several performance issues to be considered when implementing and
using a function such as the one described above. Here we discuss some of these
issues and how they were dealt with in our implementation.</p>
      <p>The use of Bloom lters as described here has the potential to reduce the
overall bandwidth requirements of making a query. However, there are cases
where the bandwidth requirements actually increase through the lter use. In
cases where the selectivity of the lter on the remote data is close to 1, it may
be more e cient to send a small query that selects all values, and simply remove
unwanted data with a local join.</p>
      <p>The use of the Bloom lter approach is predicated on the assumption that a
typical SPARQL endpoint contains much more information than a typical query
is interested in, and does not contain all relevant information about a node. In
these cases, and with the ability to place an upper bound on the irrelevant data
returned from a query due to Bloom lter error, we note that the expected total
bandwidth required by the use of the k:bloom function is less than the previously
described union-and- lter pattern resulting in an overall saving. Whereas the
union-and- lter approach must send all names of local nodes in the (verbose)
SPARQL syntax, by using k:bloom the same names can be sent in a single
Bloom lter, and only those names that match remote data will be included in
the result. Since the size of a lter given a xed error rate is on the order of tens
of bits per name (regardless of the size of the name), the space savings can be
signi cant.</p>
      <p>Beyond the space savings to be had through the use of Bloom lters, our
intuition based on work in the relational database literature is that this approach
may also yield performance gains based on query execution. We discuss this
brie y below as potential future work.</p>
      <p>Our implementation currently uses data: URIs for including the identity
information in the query result. In choosing between using data: URIs and
persistent URLs for storing the identity information, there is a tradeo between
code simplicity and query e ciency. Using data: URIs has the advantage of
requiring no persistent storage for identity information, simplifying the endpoint
implementation. The identity information is calculated during query execution
(as each result is matched), and so isn't complete until the query execution is
complete. Unfortunately, the link element of the result XML must appear in the
document header before the binding data, requiring the entire result set to be
held in memory until the link element is output. This might require a signi cant
amount of memory at a performance cost to the server. Using a persistent URL
for the identity information alleviates this memory requirement, allowing the
result data to be streamed to the client but also requiring code to manage the
data persistence and insure that persistent identity information isn't accessed
before all data is written (a possibility if the client requests the data concurrently
as soon as the link element has been parsed).</p>
      <p>Care must be taken in computing node names. Cycles in the property chains
must not cause the computation to enter an in nite loop. Such situations may
simply be ignored if no URI or literal value is included in the chain since such
chains could never be used to identify a node in SPARQL. Even in cases where
there are URI or literal values in a property chain, or simply in cases where a
chain is particularly long, it may be desirable to declare a threshold length
beyond which computation of names simply stops. This may result in queries that
return in only partial results, a situation whose acceptability must be weighed
against the computational demands and the speci cs of the particular
application. In RDF::Query, we do not de ne a cuto threshold, favoring query
completeness over e ciency.</p>
      <p>To alleviate the issue of determining a cuto threshold, notice that the names
of nodes can be pre-computed, leaving only the actual hashing of the names
(which depends on the query-speci c Bloom lter) for execution at runtime.
In cases where an endpoint's data is fairly static with infrequent updates,
precomputing the names of nodes could dramatically reduce the per-query cost of
the k:bloom function. The cost of pre-computing the names would be amortized
over all executed queries. Conversely, if the endpoint updates its data frequently,
the number of total node names exceeded the practical ability to compute or store
them, or the expected number of node names required by queries is much smaller
than the total number of node names, it may be more desirable to compute node
names at query execution time. Again, in RDF::Query we favor simplicity over
e ciency and so compute names during query execution.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Related Work</title>
      <p>
        Hashing has been used in relational databases to e ciently compute joins[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Mackert and Lohman[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] discuss the use of Bloom lters in distributed queries
and nd that the Bloomjoin outperforms many other distributed join methods
except in cases where the involved semijoin has a selectivity very close to 1.
Mullin[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] extends the Bloom lter approach by sending a sequence of Bloom
lters until sending back all remaining tuples is more e cient than continuing
to lter them.
      </p>
      <p>
        The ARQ SPARQL engine[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] includes a SPARQL extension to allow basic
federated queries[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The extension can be used to make similar queries to those
discussed in this paper, but is implemented using a nave nested loop join and
addresses neither the issue of identity reasoning nor of reducing the number of
queries sent to the remote endpoint.
      </p>
      <p>
        SPARQLfed[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is an extension to SPARQL allowing intermediate result sets
to be included alongside a query with a BINDINGS keyword, constraining the
possible values of speci c variables. The motivation of expressing variable binding
constraints is shared between SPARQLfed and our work. Use of BINDINGS could
be used to replace ltering in the union-and- lter pattern, but would still yield
very large queries and requires changes to the SPARQL grammar (a more
complex change to a SPARQL engine than our extension function).
      </p>
      <p>
        DARQ[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], an extension to ARQ, provides a more complete system for
federated queries, including methods to de ne service descriptions including remote
database size and the expected selectivity of triple patterns. Such values could
be used to determine when Bloom lter use is appropriate, and to optimize
the construction of Bloom lters. However, DARQ, like ARQ, fails to address
identity reasoning in the evaluation of federated queries.
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Future Work and Conclusion</title>
      <p>
        The Bloom lter function introduced here could be used in federated queries
similar to their use in relational database systems. RDF::Query implements the
SERVICE extension (introduced in ARQ), and we modi ed the query engine to
automatically use the Bloom lter function when making remote query calls. The
modi ed implementation works as expected, interacting with both standard and
k:bloom-enabled endpoints. Based on work with relational databases and Bloom
lters such as [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we believe their use may yield performance gains in federated
query evaluation compared to the union-and- lter approach.
      </p>
      <p>We are currently evaluating the performance of our system, and investigating
how di erent optimization techniques a ect query execution time. In particular,
we hope to look more closely at the impact of traditional indexes on the
unionand- lter approach compared to the impact of pre-computation on the Bloom
lter approach, and to develop techniques for determining which will perform
better for a speci c query and dataset.</p>
      <p>The federated query implementation in RDF::Query also has several
restrictions we would like to remove. Currently, the use of Bloom lters in SERVICE
calls rely on the join computing node names using local data. This requires all
information used to compute the names of nodes used in a join (and so
appearing as an argument to k:bloom) to be loaded in the local graph. This restriction
doesn't preclude the join of two SERVICE calls, but the identity of nodes shared
between such calls is always computed locally and so can't rely on remote
knowledge of identity. Further work is needed to investigate how a query engine might
choreograph the movement of result and identity data between remote endpoints
to allow all participating endpoints to contribute to node identity computation.</p>
      <p>The use of k:bloom works on the assumption that both the local client
and the remote endpoint share an understanding of what properties are de ned
as Functional and Inverse Functional. We currently assume both sides have
relevant ontologies loaded so that such properties can be used in computing
node names. Future work in this area might allow the query to specify which
ontologies, predicates, or even speci c property chains should be considered in
computing names. This information could be passed as additional arguments to
the k:bloom function, but this runs the risk that the remote endpoint might need
to load additional ontologies at runtime. Further study is needed to determine if
such a feature is needed in practice, and what level of granularity is appropriate
for restricting the predicates used.</p>
      <p>We have proposed the use of Bloom lters in SPARQL, allowing compact
queries to be made about known entities. We have shown how this method can
support joining query results with local data not just with direct node equality,
but also with simple identity reasoning. The methods proposed in this paper
have been implemented in RDF::Query and are available with that package from
http://search.cpan.org/dist/RDF-Query/. We found the method to be useful in
both simple and federated query evaluation, and are currently working on a more
thorough analysis of theoretical and practical impacts of the method on query
evaluation.</p>
      <sec id="sec-7-1">
        <title>Acknowledgements</title>
        <p>We wish to thank Sibel Adal for her helpful comments on this work.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>ARQ - Basic Federated</surname>
          </string-name>
          SPARQL Query. http://jena.sourceforge.net/ARQ/service.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. DARQ -
          <article-title>Federated Queries with SPARQL</article-title>
          . http://darq.sourceforge.net/,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>3. ARQ - A SPARQL Processor for Jena</article-title>
          . http://jena.sourceforge.net/ARQ/,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Dave</given-names>
            <surname>Beckett. SPARQL Variable</surname>
          </string-name>
          <article-title>Binding Results XML Format</article-title>
          . http://www.w3.org/TR/rdf-sparql-XMLres/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Notation 3</article-title>
          . http://www.w3.org/DesignIssues/Notation3,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Chris</surname>
            <given-names>Bizer</given-names>
          </string-name>
          , Tom Heath,
          <string-name>
            <given-names>Danny</given-names>
            <surname>Ayers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yves</given-names>
            <surname>Raimond</surname>
          </string-name>
          .
          <source>Interlinking Open Data on the Web. Demonstrations Track, 4th European Semantic Web Conference (ESWC2007)</source>
          , Innsbruck, Austria.,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Burton</surname>
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Bloom</surname>
          </string-name>
          .
          <article-title>Space/time trade-o s in hash coding with allowable errors</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>13</volume>
          (
          <issue>7</issue>
          ):
          <volume>422</volume>
          {
          <fpage>426</fpage>
          ,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Kjell</given-names>
            <surname>Bratbergsengen</surname>
          </string-name>
          .
          <article-title>Hashing Methods and Relational Algebra Operations</article-title>
          .
          <source>In VLDB '84: Proceedings of the 10th International Conference on Very Large Data Bases</source>
          , pages
          <volume>323</volume>
          {
          <fpage>333</fpage>
          , San Francisco, CA, USA,
          <year>1984</year>
          . Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Mike</given-names>
            <surname>Dean</surname>
          </string-name>
          and
          <string-name>
            <given-names>Guus</given-names>
            <surname>Schreiber</surname>
          </string-name>
          .
          <article-title>OWL Web Ontology Language Reference</article-title>
          . http://www.w3.org/TR/owl-ref/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lothar</surname>
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Mackert</surname>
          </string-name>
          and
          <string-name>
            <surname>Guy M. Lohman</surname>
          </string-name>
          . R*
          <article-title>Optimizer Validation and Performance Evaluation for Distributed Queries</article-title>
          .
          <source>In VLDB '86: Proceedings of the 12th International Conference on Very Large Data Bases</source>
          , pages
          <volume>149</volume>
          {
          <fpage>159</fpage>
          , San Francisco, CA, USA,
          <year>1986</year>
          . Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>James</surname>
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Mullin</surname>
          </string-name>
          .
          <article-title>Optimal Semijoins fo Distributed Database Systems</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          ,
          <volume>16</volume>
          (
          <issue>5</issue>
          ):
          <volume>558</volume>
          {
          <fpage>560</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Eric</surname>
          </string-name>
          <article-title>Prud'hommeaux. Federated SPARQL</article-title>
          . http://www.w3.org/
          <year>2007</year>
          /05/SPARQLfed/,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Eric</surname>
          </string-name>
          <article-title>Prud'hommeaux and Andy Seaborne. SPARQL Query Language for RDF</article-title>
          . http://www.w3.org/TR/rdf-sparql-query/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gregory Todd</surname>
          </string-name>
          <article-title>Williams</article-title>
          . RDF::Query. http://search.cpan.org/dist/RDF-Query/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>