<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Cost Model for Querying Distributed RDF-Repositories with SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philipp Obermeier</string-name>
          <email>obermeie@inf.fu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lyndon Nixon</string-name>
          <email>nixon@inf.fu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Freie Universität Berlin, Institut für Informatik, AG Netzbasierte Informationssysteme</institution>
          ,
          <addr-line>Königin-Luise-Straße 24-26, D-14195 Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the last years, the query language SPARQL has evolved into the widely accepted standard for querying RDF. Since many Semantic Web applications make use of data whose storage and management is distributed, distributed SPARQL query processing becomes necessary. In the relation and object-oriented database community the efficiency gain by cost-based, adaptive optimizers for distributed querying is proven, though such optimizers are not available for SPARQL. Thus we describe in this paper a cost model which is meant to act as a sub component of a query optimizer for distributed SPARQL query processing to serve as a cost indicator for other subcomponents of the optimizer, e.g. query decomposition, query rewriting and choosing join algorithms and their order. The cost model is tailored for a heterogeneous grid of SPARQL processors and represents query plans as SPARQL Query Graph Models (SQGM). Costs are assigned in an System-R-like fashion relying on recursive cost and cardinality functions. Therefore evaluation complexities of basic operations in SPARQL queries are derived from the complexities of best practice algorithms for the algebraically equivalent basic operations in relational query languages.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In January 2008 SPARQL was officially approved by a W3C Recommendation
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] – a status which no other RDF query language has achieved. Parallel to
this development a respectable number of SPARQL query engines have been
developed, e.g. Jena ARQ, the Sesame SPARQL plugin or OWLIM as part of
ORDI.
      </p>
      <p>
        Since many Semantic Web applications eo ipso manage and store data in
distributed places, distributed SPARQL query processing becomes necessary.
Development of distributed query engines is only in a premature state since
none of the widely accepted query engines, including the ones mentioned above,
are designed for distributed query processing at all. As we will point out in
section 2 cost-based, adaptive optimization techniques are well established for
querying distributed relational or object-oriented databases. Since SPARQL is
based on the relational algebra of Relational Database Management Systems
(RDBMS), we believe it is possible to map these techniques to SPARQL. For
this reason we present here a cost model for the evaluation of SPARQL queries
tailored to act as part of an optimizer for adaptive, distributed query processing
[
        <xref ref-type="bibr" rid="ref11 ref4">4,11</xref>
        ] supporting other subcomponents like query decomposition, query
rewriting and join order selection for cost-based decision-making. Our cost model is
similar to the ones for System-R suggested in [
        <xref ref-type="bibr" rid="ref12 ref17">17,12</xref>
        ] in that it estimates
physical costs for each individual operation of a query and then totals these costs.
Furthermore costs are distinguished by the type of used resources, i.e. number of
CPU instructions, number of I/O operations (page/read accesses on persistent
memory) and NET delay for retrieval of remote data. From this, a recursive cost
function is developed for query execution based on physical cost functions and
cardinality functions for basic operations in SPARQL Abstract Queries [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a
declarative representation of SPARQL queries (see section 3.1). Furthermore we
presume a SPARQL query plan is given as a SPARQL Query Graph Model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
which is a graph model for queries abstracting to the semantic of the
corresponding SPARQL Abstract Query (see section 3.2). For the choice of complexities
of basic SPARQL operations we argue that complexities of best-practice
algorithms for related operations in the relational algebra for RDBMS can be used
as orientation.
      </p>
      <p>In Section 2 we address the scientific background of this paper. In Section
3 we introduce the preliminaries necessary for the definition of the cost and
cardinality function in Section 4. There we also deliver the proof for the equality
of the complexities for relational algebra operators and their equivalent SPARQL
basic operations. In Section 5 we demonstrate the usage of the cost model as
part of a query optimizer. We conclude with a discussion of open problems and
future work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Some breakthroughs in the analysis of the expressiveness and classification of
SPARQL also with respect to other query languages has been achieved by
Perez et al.[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In their work they determined the complexity of certain types
of SPARQL expression and developed an algebra for SPARQL resp. SPARQL
graph pattern expressions which was a significant contribution to the theoretical
algebraic model SPARQL Abstract Queries of SPARQL queries in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
Furthermore except for minor exceptions the possibility of a straightforward
reduction of SPARQL to a SQL-resembling relational algebra has been shown in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
which is the theoretical core of several SPARQL-to-SQL transforming
SPARQLDBMS like D2RQ or SquirrelRDF. Additional it can be attributed to the
similarity between SPARQL and SQL algebras that a System-R-like graph model for
SPARQL query plans together with optimization techniques query plans could
be introduced [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Research in distributed SPARQL query processing started recently and
predominantly leans on optimization techniques for general database query
languages and RDF query languages. Heiner Stuckenschmidt et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] suggest
for querying distributed RDF-repositories a schema-path-based indexing and
storage organization as well as heuristics for join ordering. The techniques are
implemented as a prototype extension for Sesame. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], RDFPeers, a spanning
of RDF-Repositories over a Peer-to-Peer network is introduced. Execution of
disjunctive and conjunctive queries over the distributed storages are described
though RDFpeers doesn’t support SPARQL as query language. In contrast to
that Andreas Harth et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] propose YARS2, a toolkit for efficiently querying
distributed storages of graph-shaped data, especially RDF data and SPARQL
are supported. A distributed indexing mechanism for quadruples (representing
RDF triples) and distributed SPARQL query processing with join ordering are
fundamental contributions of this work. While lookups for indexes can be
dispatched concurrently in the distributed storage environment, other operations
embodied in a SPARQL query (i.e. J oin, U nion, etc.) are solely executed at one
host in the system. An adaptive cost estimation based on System-R-like
physical cost and cardinality functions is presented in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for conjunctive reasoning
for Deductive Ontology Basis (DOB ), a subset of OWL lite. A translation of
SPARQL queries into DOBs is promised by the authors for the future, but only
partially explained at present.
      </p>
      <p>
        In contrast to SPARQL the opportunities offered by distributed query
processing are vastly explored in the database community. Kossmann describes in
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] a series of query processing optimization concepts in distributed databases
in a textbook style. Pirahesh et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] present rule-based rewriting techniques
for queries which deliver optimization on a logical level independent from the
physical implementation. Query plan cost models which sum up physical costs
for single operators are suggested in [
        <xref ref-type="bibr" rid="ref12 ref17">17,12</xref>
        ]. Such a cost model is also needed as
a subcomponent for an optimizer for distributed SPARQL processing as
indicator for decision making of other components like query rewriting, decomposition
or join and selection ordering. A sophisticated survey of adaptivity for query
processing, especially intra-query adaptivity, was assembled by Deshpande et al.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Adaptivity means the query processing is interleaved with adjustments by
the query optimizer, which significantly improves the performance of the query
evaluation
      </p>
      <p>A System-R-like cost model and the cost-based, adaptive techniques
addressed in the last paragraph are not available yet for a distributed SPARQL
processing context. Thus our proposed cost model as part of a still to be
developed optimizer for adaptive, distributed SPARQL query processing is a first
step to change this circumstance.
3
3.1</p>
    </sec>
    <sec id="sec-3">
      <title>Preliminaries</title>
      <sec id="sec-3-1">
        <title>Formalization and Semantics of SPARQL Queries</title>
        <p>
          3.1.1 SPARQL Abstract Queries For the study of SPARQL semantics,
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] provides a declarative formalization for a SPARQL query, called SPARQL
Abstract Query. A SPARQL Abstract Query is a triple that consists of a Query
Form, a SPARQL Algebra Expression and a RDF Dataset.
        </p>
        <p>The set of Query Forms, denoted as QF , comprises four functions fSelect ;
Ask ; Construct ; Describeg which take a solution mapping multi-set (see
Definition 1) as input, Construct additional uses a set of tripe pattern templates.
Select returns a set of variable bindings, Ask a boolean, Construct and Describe
a RDF graph.</p>
        <p>The SPARQL Algebra, whose terms are called SPARQL Algebra
Expressions, is an algebra representing graph patterns and solution modifiers in queries.
It provides a set of unary or binary functions fJoin; Union; Diff ; LeftJoin; :::g
and a set BGP of constants, which are called Basic Graph Patterns (BGP). Both
function and constants return a solution mapping multi-set as result while taking
solution mapping multi-sets and logical boolean constraints for variable bindings
as input. In the following we refer to the set of these functions as SF , and we
call SAO = fBGPg[SF the set of SPARQL Algebra Operations (SAO). The
elements of the subset SM ½ SF , where SM = fDistinct ; OrderBy ; Limit ; OffSet g,
are called Solution Modifiers (SM). Overall we refer to QF [ SAO as the set of
SPARQL basic operations.</p>
        <p>
          A SPARQL Abstract Term is either a SPARQL Abstract Query or a SPARQL
Algebra Expressions. Intuitively a SPARQL Abstract Term describes a “sub
term” of a SPARQL Abstract Query.
3.1.2 Solution Mapping Multi-Sets From a technical viewpoint solution
mapping multi-sets are unordered lists of solution mappings which may contain
duplicates. For our following discussion we adopt the mathematical definition
from [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]:
Definition 1. A solution mapping multi-set is a tuple (S; card) where S is a
set of solution mappings and card is a function which annotates every x 2 S
with a positive integer.
        </p>
        <p>The cardinality of (S; card), denoted as j(S; card)j, is the summation of
card(x) for all x 2 S.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>SPARQL Query Graph Model</title>
        <p>
          The SPARQL Query Graph Model (SQGM) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a graph model for SPARQL
queries resp. SPARQL Abstract Queries adopted from the Query Graph Models
for SQL [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. A SQGM is a planar rooted directed graph consisting of operator
as nodes and dataflows as edges. Each operator represents an occurrence of an
SPARQL algebra operation in the corresponding SPARQL Abstract Query while
each data flow connects an operator A to an operator B, iff B uses A’s result
as input in the corresponding SPARQL Abstract Query. In this constellation A
is called providing operator and B consuming operator. Furthermore operators
are divided into types each of them having a one-to-one correspondence to a
SPARQL basic operation, except for BGPs in combination with Filters and also
for Solution Modifiers. For the first constellation SQGMs use an operator type,
the set of Graph Pattern Operators (GPO), each corresponding to a BGP with
an optional Filter operation. This set is referred to as GPO in the following. For
Solution Modifiers SQGMs provide a general operator type, consisting of the
Solution Modifier Operators which represent concatenations of arbitrary Solution
Modifiers. Additionally we give the formal definition of a SQGM.
Definition 2. A SPARQL Query Graph Model (SQGM) represents a SPARQL
query. It is a tuple (OP; DF; r; df lt; N G) where
– OP denotes the set of all operators necessary to model the query,
– DF denotes the set of all dataflows necessary to model the query,
– r 2 OP is an operator responsible for generating the result of the query (i.e.
        </p>
        <p>r represents a Query Form),
– df lt 2 OP is an operator providing the default RDF graph of the queried</p>
        <p>
          RDF dataset,
– N G ½ OP is the set of graph operators that provide named graphs.
For particular details on SQGMs we refer to [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          In the following we will base our cost estimation for SPARQL queries and
query plans on their corresponding SQGMs. While as shown above queries can
be translated directly into SQGMs query plans contain in general additional
physical information for each operator. Thus we consider SQGM as abstraction
of query plans in the sense that all physical information has been masked out.
3.2.1 Constraints for BGPs resp. Graph Pattern Operators Actually
there are two different SPARQL basic operations which can describe joins
between solution mapping multi-sets: The J oin operator and BGPs containing
more than one triple pattern. By limiting joins between solution mapping
multisets to the actual J oin operation of the SPARQL Algebra we unify these to
one form to ease the definition of cost and cardinality functions. Motivated by
this consideration we restrict BGPs (resp. GPOs) to the biggest subset of BGP
(resp. GPO), in which each BGP (resp. GPO) only contains exactly one single
triple pattern. We refer to this subset as BGPT P (resp. GPOT P ). As shown in
the extension to [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] it is semantically equivalent to simulate BGPs embodying
an arbitrary set of triple pattern by defining a BGP for each triple pattern and
after that to join these.
3.2.2 Constrains for Solution Modifier Operators As mentioned above
a Solution Modifier Operator can express a application of several Solution
Modifiers. In the following we only accept Solution Modifier Operators which
represent exactly one Solution Modifier. A sequence of Solution Modifier appliances
m1; : : : ; mn in a SPARQL query can also be represented in a SQGM as a path
of Solution Modifiers M1 ! : : : ! Mn where Mi solely represents mi. This
limitation is motivated by the need of a one-to-one correspondence for the sake
of exchangeability of cost and cardinality functions for corresponding SPARQL
basic operations and SQGMs operator types. The exchangeability of functions
significantly assists the comprehensibility of the cost computations based on
SQGMs.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Cost and Cardinality Functions</title>
      <p>In this section we will describe the structure of the cost model for SQGMs
and provide a justification to base our cost estimation on the complexities of
best-practice-algorithms for relational algebra operations related to the SPARQL
basic operations.</p>
      <p>Overall the cost model for SQGMs banks on a recursive cost function relying
in turn on a recursive cardinality function. Furthermore both cost and cardinality
function themselves are based on functions estimating physical cost and solution
cardinalities for each SPARQL basic operation.</p>
      <p>At first we introduce a method to assign cost and cardinality functions to each
SPARQL basic operation also depending on the used implementation algorithms.
After that we show how we can recursively compute costs for SQGMs based on
our earlier results. Finally we justify that we can reduce almost all SPARQL
basic operations to corresponding relational algebra operations in linear time to
the input cardinalities.
4.1</p>
      <sec id="sec-4-1">
        <title>Cost and Cardinality Functions for SPARQL Basic Operations</title>
        <p>For each SPARQL basic operation we assign a cardinality, CPU- and IO-cost
function. The cardinality function estimates the cardinality of the operation
result, the CPU-function the necessary number of instructions and the IO-cost
function the necessary number of hard disk (or page) accesses to process the
operation. Each of these functions depends on the cardinalities of the input solution
mapping multi-sets, the used algorithm and real number parameters expressing
physical characteristics of the machine executing the query. To compute these
functions for a triple pattern we use the pattern itself as indication for the
cardinality of its solution mapping multi-set. This is motivated by the fact, that
either we can retrieve the cardinality directly by sending the triple pattern to
the RDF-Repositories or estimate the cardinality by a selectivity method with
the triple pattern as input.</p>
        <p>In Definition 3 we define a summarization of the function assignments above,
denoted as configuration. Hereby T P stands for the set of all RDF Triple
Pattern, RDF G for the set of all RDF-Graphs, ALG for the set of all algorithms
implementing SPARQL basic operations. Moreover a physical parameter setup
is defined as mapping from a set of identifiers to a set of reals. The set of all
physical parameter setups is called PPS. The Cartesian product Rkop is the set
of all possible tuples consisting of the cardinalities of the input solution mapping
multi-sets for an operation op.</p>
        <p>Definition 3. A configuration C is a function assigning each SPARQL basic
operation op a tuple C(op) = (fCP U;op; fIO;op; fcard;op) where
else fÁ;op : (Rk+op £ ALG £ PPS) ! R+ with Á 2 fCP U; IO; cardg.
if op = BGPT P then fÁ;op : (T P £ RDF G £ ALG £ PPS) ! R+ with Á 2
fCP U; IO; cardg,
We call fIO;op resp. fCP U;op the IO resp. CPU cost function for op and fcard;op
the cardinality function for op.</p>
        <p>The IO- and CPU-costs of a SPARQL basic operation can be estimated
accurately by the usage of an elaborated time and space complexity function for the
implementation algorithm, given that the estimation errors for the cardinalities
of the input solution mapping multi-sets and the physical parameters are small.</p>
        <p>
          If no additional information is available, a naive way to compute the
cardinalities of the results for a SPARQL basic operation is the usage of the result
cardinalities provided by the operator definitions in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. For instance, we could
use the possible maximum of the result size. Hence such a procedure, which
solely relies on the input sizes of an operator, causes generally a large
estimation error. If available, statistical values for the result sizes of Triple Patterns,
BGPs or more complex SPARQL Algebra Expressions are a highly recommended
alternative.
4.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Cost and Cardinality Functions for SQGM</title>
        <p>In this section we recursively determine the execution costs and result
cardinalities of a query graph by beginning the recursion at its root moving to the
nodes with in-degree 0. Relevant physical cost factors for the processing of an
SQGM operator like costs for swapping, hashing, computation, etc. are given in
a function, physical parameter assignment, and thus can be respected in the cost
estimation. For the delay by transmission through the network we use a very
simplistic approach by assigning a real number weight for each operator x which
indicates the net distance between the host computing x and the host
computing x’s upstream operator. The actual net costs for x is the product &lt;size of
transmitted data&gt; ¤ &lt;distance weight&gt;. To determine the overall net costs for
a SQGM we add up these products for each operator.</p>
        <p>First of all we emphasize in Definition 4 the exchangeability of cost and
cardinality functions for SPARQL basic operations and SQGM operator types
presuming the restrictions stated in Section 3.2.1 and 3.2.2.</p>
        <p>Definition 4. Given a SQGM (OP; DF; r; df lt; N G), a configuration C and a
SQGM operator type t which represents a SPARQL basic operation op with
C(op) = (fCP U;op; fIO;op; fcard;op). We call fIO;op resp. fCP U;op the IO resp.
CPU cost function for t and fcard;op the cardinality function for t. Analogously
we use for t the references fIO;t = fIO;op, fCP U;t = fCP U;op and fcard;t =
fcard;op.</p>
        <p>Before we can present the cardinality (Definition 5) and cost functions
(Definition 6) for SQGMs, we need to determine three kinds of functions for a given
SQGM G = (OP; DF; r; df lt; N G):</p>
        <p>An algorithm assignment is a function which assigns each operator x 2 OP
an algorithm. This algorithm is a implementation of x according to the type x
belongs to. For instance, if x belongs to the Join Operator type, the algorithm
is a join algorithm (NestedLoop-Join, Hash-Join,. . . ).</p>
        <p>A physical parameter assignment is a function which assigns each operator
x 2 OP a physical parameter setup. This physical parameter setup contains
(approximation of) physical values relevant for the estimation of the processing
costs (e.g. block size, costs for swapping, hashing or value comparison) for the
machine executing x.</p>
        <p>A net-distance weight assignment is a function which assigns each operator
x 2 OP a positive real number. This number is a weight which expresses the
net latency between the machine executing x and the machine executing x’s
upstream operator. The weight is multiplied with the result cardinality of x to
take the net latency to the machine where the upstream operator is processed
into account.</p>
        <p>Finally we define the recursive cardinality and cost functions for SQGMs
(Definition 5 and 6).</p>
        <p>Definition 5. For a SQGM (OP; DF; r; df lt; N G), a configuration C and a node
x 2 OP , possessing the providing operators c1; :::; cn, the recursive function
card : OP ! R+, called cardinality function, is defined as follows:
if x 2 GPOT P then card(x) = fcard;GPOT P (tp; G) where tp is the triple pattern
embodied in x and G is the input graph of x,
else card(x) = fcard;x:type(card(c1); : : : ; card(cn)) where x:type denotes the
operator type of x.</p>
        <p>Definition 6. For a SQGM G = (OP; DF; r; df lt; N G), Á 2 fIO; CP U; N ET g,
a configuration C, an algorithm assignment A, a physical parameter assignment
P, a net-distance weight assignment D and a node x 2 OP the recursive function
costsÁ : OP ! R+, called Á-cost function, is defined as follows:
For Á 2 fIO; CP U g:
if x 2 GPOT P then costsÁ(x) = fÁ;GPOT P (tp; G; A(x); P(x)) where tp is
the triple pattern embodied in x and G is the input graph of x,
else costsÁ(x) = fÁ;x:type(card(c1); : : : ; card(cn); A(x); P(x)) +
+ ∑ costsÁ(ci)</p>
        <p>i2f1;:::;ng
where x:type denotes the operator type, c1; :::; cn the providing operators
of x.</p>
        <p>For Á = N ET :
costNET (x) = ∑</p>
        <p>(D(ci) ¢ card(ci) + costsNET (ci)).</p>
        <p>i2f1;:::;ng
If x is a Select-, Describe-, Construct-, or Ask-Result Operator we refer to
costsÁ(x) also as the Á-cost function for G.
4.3</p>
      </sec>
      <sec id="sec-4-3">
        <title>Cost Estimation Oriented on RDBMS Complexities</title>
        <p>
          Based on [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] we will now prove that the choice of cost functions for SPARQL
basic operations oriented on the complexities of best-practice-algorithms for
relational algebra operations is sound. This is motivated by the fact that there
are strong similarities between SPARQL basic operations and the operations
of the relational algebra for RDBMS. To support this claim we will prove that
almost each SPARQL basic operation is reducible to a single operation or a
more complex term of the relation algebra for RDBMS in linear time while the
cardinalities of the input values are preserved.
        </p>
        <p>Actually this reduction only yields an upper bound for SPARQL basic
operations, which might be too pessimistic for certain implementations of operations.
To formally neglect this we have to show that there also exists a reduction in
the other direction. More precisely, each relational algebra operation must be
reducible to a corresponding SPARQL Abstract Term (see Section 3.1), while
the most efficient computation of the reduction function for this operation has
to be in the smallest complexity class for which the evaluation of the SPARQL
Abstract Term is a member of. Alternatively we could empirically prove the
usability of the upper bound gained by the reduction introduced here. This proof,
by mathematical means or benchmark testing, is a future task.</p>
        <p>
          Before we present the reduction in Theorem 1, we want to clarify that we
base the following discussion on the named version of the relational algebra, as
described by [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in chapter 5.1, which contains the operations f¾; ¼; on; ±; [; ¡g.
In addition, we define a database relation with duplicates as follows.
Definition 7. A relation with duplicates is a tuple (r[U ]; cardRA) where r[U ]
is a relation for a set of attributes U and cardRA is a function which annotates
every t 2 r[U ] with a positive integer.
        </p>
        <p>The cardinality of (r[U ]; cardRA), denoted as j(r[U ]; cardRA)jRA, is the
summation of cardRA(t) for all t 2 r[U ].</p>
        <p>Consequently we extend the operations of the relational algebra to be capable
of handling relations with duplicates as operands. For instance, the extended
version of the relational algebra operation Join, called ondup, additional takes
care of the cardRA values of its result tuples such that they resemble the card
values of the SPARQL basic operation Join. We also apply this concept for the
pairings ([,Union),(¡,Diff ), (¼,Select ) and (¾,Filter )
Definition 8. Let A be the set of all relations with duplicates. The functions
ondup, [dup, ¡dup, ¼dup and ¾dup are defined as follows:
Function ondub: A £ A ! A is defined as [dub((r1; cardRA;1); (r2; cardRA;2)) =
(r; cardRA) with r = r1 on r2 and for each t 2 r, t1 2 r1, t2 2 r2, t = t1 [ t2
holds cardRA(t) = cardRA;1(t1) ¢ cardRA;2(t2).</p>
        <p>Function [dub : A £ A ! A is defined as [dub((r1; cardRA;1); (r2; cardRA;2)) =
(r; cardRA) with r = r1 [ r2 and for each t 2 r, t1 2 r1, t2 2 r2, t = t1 = t2
holds cardRA(t) = cardRA;1(t1) + cardRA;2(t2).</p>
        <p>Function ¡dub : A £ A ! A is defined as [dub((r1; cardRA;1); (r2; cardRA;2)) =
(r; cardRA) with r = r1 ¡ r2 and for each t 2 r, t1 2 r1, t2 2 r2, t = t1 = t2
holds cardRA(t) = jcardRA;1(t1) ¡ cardRA;2(t2)j.</p>
        <p>Function ¼dub : Attn£A ! A is defined as ¼dub(a; (r1; cardRA;1)) = (r; cardRA)
with r = ¼(a; r1). For each t 2 r the cardinality cardRA(t) is the summation
of cardRA;1(t1) for all t1 2 r1 with t = t1jt. Here Att is an arbitrary set of
attributes and Attn the n-ary Cartesian Product of Att, t1jt is the tuple t1
restricted to the attributes of t.</p>
        <p>Function ¾dub : Att £ Dom £ A ! A is defined as [dub(a; d; (r1; cardRA;1)) =
(r; cardRA) with r = ¾(a; d; r1) and for each t 2 r, t1 2 r1, t = t1 holds
cardRA(t) = cardRA;1(t1). Here Att is an arbitrary set of attributes and
domains a set of arbitrary domains for a relational database.</p>
        <p>For the reduction in Theorem 1 we define in advance a simple helper
algorithm, called trans, to transform a solution mapping multi-set S = (S; card) to
an equivalent DB-relation with duplicates R = (r[U ]; cardRA): The union of the
variables in the domains of the solutions mappings in S is the set of attributes
U for r. Thus the number of attributes in U is the overall number of
different variables occurring in the the mappings in S. Furthermore each mapping
x 2 S is represented by a tuple tx in r with cardRA(tx) := card(x) where tx
has only values for the attributes which coincide with the domain of x (formally:
attributes of tx who don’t represent a variable in x have a globally defined null
value, which belongs to no domain of the database schema).</p>
        <p>In the following we use the denotations: trans(S) = R, trans(x) := tx,
trans(S) := r, trans(card) := cardRA, trans((S; card)) := (r; cardRA) and
trans-att(S) = U .</p>
        <p>For this algorithm we imply (without explicit proof):
1. The number of attributes in U is the overall number of different variables
occurring in the solution mappings in S
2. x 2 S iff trans(x) 2 trans(S)
3. The algorithm’s time and space complexity lies in O(j(S; card)j).
Theorem 1. Given a function f from SPARQL Abstract Terms to relational
algebra terms with</p>
        <p>f (Join(S1; S2)) =ondup (trans(S1); trans(S2));
f (Union(S1; S2)) = [dub(trans(S1); trans(S2));</p>
        <p>f (Diff (S1; S2)) = ¡dub(trans(S1); trans(S2));
f (LeftJoin(S1; S2)) = [dup(ondup (trans(S1); trans(S2));</p>
        <p>¡dup (trans(S1); trans(S2)));
where S1 and S2 are two multi-sets of solution mappings. Then f is a
reduction of the SPARQL basic operations Join, Union, Diff, LeftJoin to relational
algebra terms, i.e., for Á 2 fJoin; Union; Diff ; LeftJoing the following statements
hold
– x 2 S iff trans(x) 2 r[U ],
– card(x) = cardRA(trans(x)) for each x 2 S,
– trans-att(S) = U ,
where Á(S1; S2) = (S; card), f (Á(trans(S1); trans(S2))) = (r[U ]; cardRA) and
both S1 and S2 are two multi-sets of solution mappings. Moreover, f is
computable linear in time with respect to the sum of cardinalities of S1 and S2, i.e.
card(S1) + card(S2).</p>
        <p>Proof. We show that for the operations Join,Union, Diff and LeftJoin the
concluded statements hold.</p>
        <p>
          A SPARQL Abstract Term Join(S1; S2) = (S; card) with solution
mappings multi-sets S = (S1; f1) and S = (S2; f2) is reduced by f to a
(natural) join with duplicates in the relational algebra ondup (trans(S1); trans(S2)) =
(r[U ]; cardRA). For the generation of r from trans(S1) and trans(S2) operation
ondup (Definition 8) solely relies on on . By the definition of on ([
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] p.58, top) and
transformation algorithm trans it follows that trans-att(S) = U and the fact,
that a solution mapping x is element of S iff there exists a tuple tx in r[U ] ,
which has the same values for its attributes as x for its respectively variables.
Moreover, algorithm trans uses for the cardinality cardRA(r) of a tuple tx 2 r[U ]
the same value as its corresponding solution mapping x (i.e. trans(x) = tx). In
addition, ondup (Definition 8) relies on the same computation formula for
cardinalities of tuples as the SPARQL basic operation Join for solution mappings.
Thus card(x) = cardRA(trans(x)) holds for each x 2 S.
        </p>
        <p>In an analogous way this can be shown for U nion and Diff. LeftJoin is defined
as a SPARQL algebra term which combines Join, Union and Diff. Thus the
reduction for Join, Union and Diff yields also a reduction for LeftJoin.</p>
        <p>Furthermore, the definition of algorithm trans and reduction f implies that
f is computable in linear time with respect to card(S1) + card(S2).</p>
        <p>
          For most of the SPARQL basic operations we can determine the complexity
in the same way as in the proof of Theorem 1. Few are slightly different, e.g.
F ilter allowing regular expression in SPARQL which occasionally can not be
directly translated into the selection operation ¾ as pointed out in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. For these
exceptions we will estimate their complexity based on known efficient
implementations, also with orientation on similar operations in the relational algebra, if
possible.
        </p>
        <p>After the consideration above we are enabled to define a configuration
oriented on the complexities for relational algebra operations for the example
presented in Section 5. We use the complexity of relational algebra operations for the
cost estimation of the related SPARQL algebra operations under the assumption
that a SPARQL query processor is approximately as efficient as a RDBMS for
the corresponding query (see Table 2). For SPARQL basic operations which are
not reducible to relational algebra operations in the way described by Theorem
1, we rely on the complexities of best-practice implementations generally used
by the database and Semantic Web community.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Example</title>
      <p>At this point we showcase the expected benefits of a query optimizer based
on our cost model. Therefore we require a system of networked, collaborating</p>
      <p>Listing 1.1: SPARQL query q
PREFIX ub:&lt; http : //www. l e h i g h . edu / . . . / univbench . owl#&gt;
FSPREROLEMEFCI&lt;XTh?rtntdpf :?:&lt;//chetxtpa m:/p/lwe w.wor. w/3U.noirvge/r1s9i9t9y/00.2o/w2l2&gt;¡ rdf¡syntax¡ns#&gt;
WHEROE??Pss{TuIrObdNf: An:aLtmyp{ee ??nusb u.b: : GtarkaedsuCaotueSrsteud?enct . . }
}
hosts, whereas each host contains a RDF repository embodying such a query
optimizer. We provide an exemplary SPARQL query q (Listing 1.1, Figure 1) and
demonstrate how cost-based query decomposition will established an optimized
distributed evaluation of the query in this system.</p>
      <p>
        For this example we make the following presumptions for the host machines:
– The RAM size equals one block of persistent memory
– One I/O access equals reading of one block in persistent memory
– The RDF triples in the RDF repositories are indexed by a B+ tree as
described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
– Statistics about cardinalities of solution mapping multi-sets for SPARQL
Algebra Expressions are available for each RDF repository (e.g. for BGPs a
schema-path-based index of the result cardinalities for n-way triple pattern
joins would be possible, analogous to the source index hierarchy proposed
by [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ])
In Table 2 we provide a list of IO and CPU cost functions necessary for this
example, i.e. for each occurring operator type. These functions were derived
from [
        <xref ref-type="bibr" rid="ref5 ref6">6,5</xref>
        ]. Table 1 explains the used symbols.
      </p>
      <p>Now we give a step by step walk-through of the distributed querying process:
1. Initially SPARQL query q (Listing 1.1) is transformed into a SQGM G
(Figure 1).
2. Then the decomposition component determines – by calling the cost
estimation component – the CPU-costs and IO-costs for each operator and
thereby for each sub graph in SQGM. We use the following randomly
chosen configuration of physical parameters for each operator in the SQGM G:
cgeneric = ccomp = 1, cswap = 10, chash = 2, nrepo = 106, bhost = 10
As implementation for the uppermost J oin operator in G we use the
MergeSortJoin algorithm (Note: A SQGM can be interpreted as rooted directed tree,
if the Default Graph operator is ignored.). For all other operators in G the
choice of the implementation is unambiguous (see Table 2).
3. After the decomposition component gets back the cost estimations (Figure
2) it decides the partitioning of SQGM G with one of the following strategies
(These strategies are intuitive choices, their sophisticated research is targeted
for the future.): If the query optimizer is regularly noticed about available
resources, the partitions can be chosen w.r.t. to their costs and these resources,
i.e. a matching between costs and resources is approximated. Alternatively
the decomposition component tries to partition SQGM G into chunks with
circa the same costs. Here we use the second strategy.</p>
      <p>The overall costs for the left and right sub-tree at the upper-most J oin
operator are 1012 + 1320 = 2332 and 6 + 310 = 316, if we weight CPU and
IO costs with factor 1. Assume two remote hosts A and B are available with
access to q’s default graph. A should have net-distance 3 and B net-distance
10 to the host which will process the uppermost J oin operator and the Select
operator. The remote processing costs for the left subtree on host A would
than be 2332 plus the net latency 3 £ 200 = 600, for the right subtree on host
B 316 plus the net latency 10 £ 250 = 2500. Since A and B work parallel
the overall cost is 2932 (if we have to wait until A is finished) for the remote
processing of the both sub trees compared to a sequential local processing
which is 2932 + 2816 = 6748. Thus we split G at the uppermost-most J oin
operator.
4. The resulting partitions for the sub trees of the uppermost J oin operator in
G lead to the sub queries q1 and q2 (Listings 1.2 and 1.3), which then are
forwarded to host A and B for processing. The retrieved sub query answers
routed back are combined to an answer for q according to its decomposition.
PREFIX ub:&lt; http : //www. l e h i g h . edu / . . . / univbench . owl#&gt;
FSPREROLEMEFCI&lt;XTh?rttdspf :?:&lt;//chetxtpa m:/p/lwe w.wor. w/3U.noirvge/r1s9i9t9y/00.2o/w2l2&gt;¡ rdf¡syntax¡ns#&gt;
WHERE?s { r d f : type ub : GraduateStudent .</p>
      <p>OPTIONAL { ? s ub : takesCourse ? c . }
}</p>
      <p>Listing 1.3: SPARQL query q2
PREFIX ub:&lt; http : //www. l e h i g h . edu / . . . / univbench . owl#&gt;
FSPREROLEMEFCI&lt;XTh?rttdspf :?:&lt;n//hetxtpa m:/p/lwe w.wor. w/3U.noirvge/r1s9i9t9y/00.2o/w2l2&gt;¡ rdf¡syntax¡ns#&gt;
WHERE {
} ? s ub : name ?n .
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>We have developed a cost model that provides System-R-like cost and cardinality
functions for SPARQL queries given as SQGMs and evaluated in an distributed
environment. In addition we proved by reduction that for the cost estimation
of most SPARQL basic operations complexities of relational algebra operations
can be used as upper bounds. Hence, it has yet to be proven – mathematically
or empirically – that these upper bounds are not to pessimistic.</p>
      <p>
        For the future we plan an implementation of the cost model as part of query
optimizer for distributed query processing. Since the optimize-then-execute
paradigm is not efficient for complex queries due to exponential growth of the
estimation error [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] we additionally expect that the query processor and its optimizer
operates in an intra-query adaptive fashion [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], i.e. the optimizer interleave query
processing for dynamic adjustments and cost estimations to provide better
response time or more efficient usage of resources.
      </p>
      <p>We will evaluate the cost model in form of representative benchmarks tests
for this implementation. In particular the accuracy of the upper bounds for
SPARQL basic operations provided by the reduction mentioned above will be
verified.
7</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work is partially supported by the TripCom (IST-4-027324-STP) project;
http://www.tripcom.org.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abiteboul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vianu</surname>
          </string-name>
          , V. Foundations of Databases. AddisonWesley, Reading, Massachusetts,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Rdfpeers: a scalable distributed rdf repository based on a structured peer-to-peer network</article-title>
          .
          <source>In WWW '04: Proceedings of the 13th international conference on World Wide Web</source>
          (New York, NY, USA,
          <year>2004</year>
          ), ACM, pp.
          <fpage>650</fpage>
          -
          <lpage>657</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>A relational algebra for SPARQL</article-title>
          .
          <source>Tech. Rep. HPL-2005-170</source>
          , Hewlett Packard Laboratories,
          <string-name>
            <surname>Sept. 28</surname>
          </string-name>
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Deshpande</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z. G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Raman</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Adaptive query processing</article-title>
          .
          <source>Foundations and Trends in Databases 1</source>
          ,
          <issue>1</issue>
          (
          <year>2007</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Elmasri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Navathe</surname>
          </string-name>
          , S. B., Eds.
          <source>Fundamentals of Database Systems</source>
          , second ed.
          <source>Benjamin/Cummings</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Graefe</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Query evaluation techniques for large databases</article-title>
          .
          <source>ACM Computing Surveys</source>
          <volume>25</volume>
          ,
          <issue>2</issue>
          (
          <year>June 1993</year>
          ),
          <fpage>73</fpage>
          -
          <lpage>170</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Optimized index structures for querying rdf from the web</article-title>
          .
          <source>In LA-WEB '05: Proceedings of the Third Latin American</source>
          Web Congress (Washington, DC, USA,
          <year>2005</year>
          ), IEEE Computer Society, p.
          <fpage>71</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Yars2: A federated repository for querying graph structured data from the web</article-title>
          .
          <source>In Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC/ASWC2007)</source>
          , Busan, South Korea (Berlin, Heidelberg,
          <year>November 2007</year>
          ),
          <string-name>
            <given-names>K.</given-names>
            <surname>Aberer</surname>
          </string-name>
          , K.-S. Choi,
          <string-name>
            <given-names>N.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. J. B.</given-names>
            <surname>Nixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Golbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          , G. Schreiber, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          , Eds., vol.
          <volume>4825</volume>
          of LNCS, Springer Verlag, pp.
          <fpage>211</fpage>
          -
          <lpage>224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Heese</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>The SPARQL query graph model for query optimization</article-title>
          .
          <source>In ESWC</source>
          (
          <year>2007</year>
          ), E. Franconi,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kifer</surname>
          </string-name>
          , and W. May, Eds., vol.
          <volume>4519</volume>
          of Lecture Notes in Computer Science, Springer, pp.
          <fpage>564</fpage>
          -
          <lpage>578</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ioannidis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Christodoulakis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>On the propagation of errors in the size of join results</article-title>
          .
          <source>In 19 ACM SIGMOD Conf. on the Management of Data</source>
          , Boulder (May
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kossmann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>The state of the art in distributed query processing</article-title>
          .
          <source>ACM Comput. Surv</source>
          <volume>32</volume>
          ,
          <issue>4</issue>
          (
          <year>2000</year>
          ),
          <fpage>422</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mackert</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lohman</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          <article-title>R* optimizer validation and performance evaluation for distributed queries</article-title>
          .
          <source>In Twelfth International Conference on Very Large Data Bases (Kyoto, Japan</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>28</lpage>
          Aug.
          <year>1986</year>
          ),
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Chu</surname>
          </string-name>
          , G. Gardarin,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ohsuga</surname>
          </string-name>
          , and Y. Kambayashi, Eds., Morgan Kaufmann, pp.
          <fpage>149</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>The semantics and complexity of SPARQL.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pirahesh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellerstein</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <article-title>Extensible/rule based query rewrite optimization in Starburst</article-title>
          .
          <source>In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data</source>
          , San Diego, California, June 2- 5,
          <year>1992</year>
          <article-title>(pub-ACM:adr,</article-title>
          <year>1992</year>
          ), M. Stonebraker, Ed., vol.
          <volume>21</volume>
          (
          <article-title>2) of SIGMOD Record (ACM Special Interest Group on Management of Data)</article-title>
          , ACM Press, pp.
          <fpage>39</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Prud'hommeaux</surname>
          </string-name>
          , E., and
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>SPARQL query language for RDF. W3C recommendation, W3C</article-title>
          , Jan.
          <year>2008</year>
          . http://www.w3.org/TR/2008/REC-rdfsparql-query-
          <volume>20080115</volume>
          /.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ruckhaus</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vidal</surname>
          </string-name>
          , M.-E.
          <article-title>Query evaluation and optimization in the semantic web</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Selinger</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Astrahan</surname>
            ,
            <given-names>M. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlin</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorie</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Price</surname>
            ,
            <given-names>T. G.</given-names>
          </string-name>
          <article-title>Access path election in a relational database management system</article-title>
          .
          <source>Proc. ACM-SIGMOD International Conference on Management of Data (May 30 - June 1</source>
          ,
          <year>1979</year>
          ),
          <fpage>23</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vdovjak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Houben</surname>
          </string-name>
          , G.-J., and
          <string-name>
            <surname>Broekstra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Index structures and algorithms for querying distributed RDF repositories</article-title>
          .
          <source>In WWW</source>
          (
          <year>2004</year>
          ),
          <string-name>
            <given-names>S. I.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Uretsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Najork</surname>
          </string-name>
          , and C. E. Wills, Eds., ACM, pp.
          <fpage>631</fpage>
          -
          <lpage>639</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>