Incremental SPARQL Evaluation for Query
            Answering on Linked Data

                                Florian Schmedding

                          Department of Computer Science
                   Albert Ludwig University of Freiburg, Germany
                      schmeddi@informatik.uni-freiburg.de


       Abstract. SPARQL is the standard query language for RDF data. How-
       ever, its application to Linked Data is challenging because the assump-
       tion that all necessary data is present at the beginning of the evaluation
       does not apply. Some relevant data sources may only be discovered by
       processing available data. Existing approaches provide implementations
       that compute results for basic graph patterns incrementally while re-
       trieving the data. We contribute to this area by a formal analysis of the
       SPARQL algebra to provide incremental adaptions of the operations.
       This enables us to evaluate the costs of the incremental evaluation for
       the design of optimizers that choose the presumably best computation
       depending on the number of insertions and deletions. In addition, we pro-
       pose a construction of the SPARQL dataset from Linked Data resources
       that enables the usage of the Graph-operator in query answering for
       Linked Data.


1     Introduction
On the Semantic Web, data publishing according to the Linked Data [4] principles
has gained significant importance. The numerous projects mentioned in the
Linking Open Data cloud diagram¹ and recent ones in librarianship [8,17] show
its broad adoption by private, public, and governmental initiatives. Apart from
its appealing simplicity in data publication, its inherent distribution can advance
the demanded decentralization of the Web [2] by allocating data at many sites rather
than in centralized stores. However, Linked Data raises new challenges for query
answering. In our research, we investigate these problems and contribute novel
approaches for SPARQL [19] evaluation over Linked Data.
    By interweaving name and address (cf. [5]), resources become dereferenceable
in Linked Data and return an RDF description [14] on request. To put it simply,
each resource can be perceived as a data source. This has three implications:
 1. The number of data sources is proportional to the number of resources.
 2. Creating and deleting resources changes the number of data sources.
 3. Data sources cannot be classified without retrieving their content because
    resource names are not related to the content.
¹ See http://richard.cyganiak.de/2007/10/lod/
 Accordingly, query answering on Linked Data is quite different from traditional
 distributed query answering. On the one hand, in general it is not possible to
 specify all relevant data sources for a query in advance. Even in the case that
 all relevant sources have been identified for some query, another query may
 require different sources, and any new resource in the data may be an additional
 relevant source. On the other hand, subqueries cannot be delegated to the remote
 sources because Linked Data does not require the presence of query processors.
 Thus concepts like the Service-operator from the federation extension² [6] of
 SPARQL 1.1 cannot be used to distribute parts of the query, for instance.
     We illustrate our scenario in a small example with the query ‘Select ?x, ?n
 Where {alice knows ?x. ?x name ?n}’ to find out the names of Alice’s friends.
 Applying the “follow your nose” approach from Hartig et al. [11] to discover
 relevant data sources, we start by dereferencing alice. We may receive the triples
 {(alice, knows, bob), (alice, knows, charlie), (bob, name, “Bobby”)}, and compute
 the result {{?x ↦ bob, ?n ↦ “Bobby”}}. Next, we request data about charlie,
 receiving {(charlie, name, “Charlie”)}, and extend the result with {?x ↦ charlie,
 ?n ↦ “Charlie”}. Searching for more results, we also request bob and get {(bob,
 name, “Bob”)}. Having traversed all links³, the final result is {{?x ↦ bob, ?n
 ↦ “Bobby”}, {?x ↦ charlie, ?n ↦ “Charlie”}, {?x ↦ bob, ?n ↦ “Bob”}}. In
 the present work, we formally investigate the suggested incremental computation
 of the results.
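
The walk-through above can be replayed in a few lines of Python. This is only an illustrative sketch: triples are plain string tuples, the retrieval order is hard-coded as in the text, and the helper name `friends_names` is hypothetical. Each step simply re-evaluates the query over the grown data and reports the additions; it does not yet use the incremental operators developed later.

```python
def friends_names(triples):
    """Evaluate {alice knows ?x . ?x name ?n} over a set of triples."""
    return {
        (x, n)
        for (s, p, x) in triples if s == "alice" and p == "knows"
        for (s2, p2, n) in triples if s2 == x and p2 == "name"
    }

# Descriptions returned when dereferencing each resource (taken from the example).
descriptions = {
    "alice": {("alice", "knows", "bob"), ("alice", "knows", "charlie"),
              ("bob", "name", "Bobby")},
    "charlie": {("charlie", "name", "Charlie")},
    "bob": {("bob", "name", "Bob")},
}

data, results, steps = set(), set(), []
for resource in ["alice", "charlie", "bob"]:   # retrieval order used in the text
    data |= descriptions[resource]
    new = friends_names(data) - results        # solutions added by this retrieval
    results |= new
    steps.append((resource, new))
```

After the loop, `steps` holds one entry per retrieval with the solutions that the retrieval contributed, and `results` holds the three solutions of the final result.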

1.1   Related Work
Recent work has presented different strategies and implementations to evaluate
queries over Linked Data. In the terminology of Ladwig et al. [15], Hartig et al. [11]
follow a bottom-up approach, i. e., a query is evaluated without any prior information.
By dereferencing the query constants and then following some links in the data,
the result is computed incrementally, similar to our example above. However,
their implementation with so-called non-blocking iterators may miss some results
depending on the evaluation order. Hartig alleviates this problem in [10] with
heuristic adjustments. In contrast, Harth et al. [9] employ a top-down (cf. [15])
strategy. Acquired knowledge about sources is organized in a special index before
queries are processed. The index is used to select the most promising sources for a
given query. For our example, their index would at best recommend alice, bob, and
charlie, so that the result is generated in a single run. In line with our
arguments, Harth et al. also mention that query completeness can increase
when data sources that are encountered during the evaluation but are not indexed
are considered as well (e. g., retrieve charlie if only alice and bob are
indexed). Ladwig et al. elaborate on this idea and propose a mixed, or
exploration-based, approach. They propose strategies for query-specific source
selection and use a symmetric hash join for the incremental computation that,
unlike the non-blocking iterators, generates all results. In [16] they refine their
join method with the intention to consider a local storage beside the remote sources.
However, the implementations are not based on an analysis of the SPARQL semantics.
They consider only basic graph patterns, a subset of SPARQL, and do not deal with
decrements potentially induced by optional graph patterns.

² Working Draft at http://www.w3.org/TR/2010/WD-sparql11-federated-query-20100601/
³ We do not consider the predicates here.

1.2   Contribution
We are interested in the incremental SPARQL evaluation over increasing datasets
generated by bottom-up Linked Data query answering. In accordance with [10]
and [15], our assumption is that the solutions should be generated after each
addition (we speak of immediate processing) because an exhaustive link
traversal prior to returning the first results seems infeasible. The opposite,
deferred processing, would delay the result computation until all data has been
loaded. However, we need to investigate whether an incremental computation,
i. e., modifying current results when the data changes, is superior to a direct
computation, i. e., deriving all results from scratch in this case. At first glance,
this seems true for monotonically increasing results, and questionable for optional
graph patterns that can introduce negation by failure. Therefore we study each
SPARQL operator in detail to provide adaptions that enable incremental
computation. By estimating the processing costs we obtain evidence for criteria
that render one computation method preferable over the other, and make a step
towards optimized immediate query processing for Linked Data.

Outline. Next, we introduce RDF, SPARQL, Linked Data, and some algebraic
equivalences that are necessary for our approach. In Sec. 3 we elaborate on
our approach and show the incremental adaptions of the SPARQL operators.
We compare our approach to the direct computation in Sec. 4. In Sec. 5, we
conclude our work and sketch a possible further development, the application of
the HTTP cache mechanism in query answering for Linked Data.


2     Preliminaries
We introduce the RDF data model and the SPARQL query language with
special attention to the construction of the dataset by link traversal and to the
relation between the Graph-operator and Linked Data resources.

2.1   RDF
The RDF data format [12] essentially describes graphs with directed labeled
edges. We consider RDF terms T comprising three pairwise disjoint sets I
(IRIs), B (blank nodes), and L (literals). An RDF triple (s, p, o) ∈ (I ∪ B) × I × T
connects node s (subject) through the directed labeled edge p (predicate) with
node o (object). A finite set of triples is called an RDF graph. The blank nodes in
an RDF graph G are denoted by blank(G). We distinguish blank nodes with the
prefix ‘_:’ and literals with double quotes (e. g., _:bn01, “Rain”) from IRIs.
     A named graph 𝒢 [7] is an entity ⟨u, G⟩ with a name u ∈ I and an RDF
graph G. It holds that name(𝒢) = u and gr(𝒢) = G. Distinct named graphs do not
share any blank nodes.
     An RDF dataset D (cf. [1]) is a set containing a possibly empty default
graph dg(D) = G_0 and zero or more named graphs, ngs(D) = ∅ or ngs(D) =
{𝒢_1, …, 𝒢_n} with 𝒢_i = ⟨u_i, G_i⟩, where for i ≠ j (i) name(𝒢_i) ≠ name(𝒢_j), and
(ii) blank(G_i) ∩ blank(G_j) = ∅. We write names(D) to denote {name(𝒢_i) | 𝒢_i ∈
ngs(D)}, and gr(u)_D = gr(𝒢_i) if name(𝒢_i) = u and 𝒢_i ∈ ngs(D); otherwise it is
the empty RDF graph.
     We use the operator ‘⊔’ to merge two RDF graphs and define the operator
‘+’ as follows. For a dataset D and a named graph 𝒢, D + 𝒢 = D ∪ 𝒢 with
G_0 = dg(D) ⊔ gr(𝒢) if name(𝒢) ∉ names(D), else D + 𝒢 = D.
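
The ‘+’ operator can be sketched as follows. The class `Dataset` and the function `add` are illustrative names, not part of any library. The sketch assumes that graphs are sets of triples and that blank nodes of distinct graphs are already disjoint, so that the merge can be approximated by plain set union.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    default: frozenset = frozenset()            # dg(D), a set of triples
    named: dict = field(default_factory=dict)   # name u -> graph gr(u)_D

    def names(self):
        return set(self.named)

    def gr(self, u):
        # Unknown names yield the empty graph, as in the definition of gr(u)_D.
        return self.named.get(u, frozenset())

def add(d, name, graph):
    """D + <name, graph>: has no effect if the name is already present."""
    if name in d.names():
        return d
    # Set union stands in for the merge; blank nodes are assumed to be disjoint.
    return Dataset(d.default | graph, {**d.named, name: frozenset(graph)})

d1 = add(Dataset(), "f_alice", frozenset({("alice", "knows", "bob")}))
```

Adding a graph under a name that is already present leaves the dataset unchanged, which mirrors the ‘else’ branch of the definition.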

2.2   SPARQL
SPARQL is a W3C-recommended [19] query language for RDF data. We follow
the compositional semantics in [18,20] and include the Graph-operator as in [1].
As in those works, our syntax differs from the W3C syntax used in the previous example.

Syntax. Let V be a set of variables, V ∩ T = ∅. We indicate variables with a
leading question mark (e. g., ?x). A triple pattern (s, p, o) ∈ IV × IV × ILV is a
SPARQL expression.⁴ The variables of a triple pattern t are denoted by vars(t).
If P1 and P2 are SPARQL expressions, so are P1 Filter R, P1 Union P2,
P1 And P2, P1 Opt P2, u Graph P1 (u ∈ I), and ?x Graph P1. R denotes a
filter condition: ?x = ?y, ?x = c (c ∈ L ∪ I), and bnd(?x) are filter conditions;
if R1 and R2 are filter conditions then ¬R1, R1 ∧ R2, and R1 ∨ R2 are, too.
Finally, a query has the form Select_{S,F1,F2}(P) where S is a finite subset of V,
P is a SPARQL expression, and the dataset specifications F1 and F2 are possibly
empty finite subsets of I (cf. From and From Named in Sec. 8 of [19]).

Semantics. A mapping µ is a partial function µ : V → T; the domain of µ
is denoted by dom(µ). There is an empty mapping µ_0 with dom(µ_0) = ∅. Two
mappings µ1, µ2 are compatible, written as µ1 ∼ µ2, if µ1(?x) = µ2(?x) holds
for all ?x ∈ dom(µ1) ∩ dom(µ2). A mapping µ1 is subsumed by a mapping µ2, denoted
µ1 ⊑ µ2, if µ1 ∼ µ2 and dom(µ1) ⊆ dom(µ2). Mappings can be applied to triple
patterns, written as µ(t), and replace all ?x ∈ dom(µ) ∩ vars(t) in t by µ(?x).
    A mapping µ satisfies the filter condition R, denoted µ ⊨ R, iff one of the
following six conditions holds: (i) R is bnd(?x) and ?x ∈ dom(µ), or (ii) R
is ?x = ?y and bnd(?x) ∧ bnd(?y) ∧ µ(?x) = µ(?y), or (iii) R is ?x = c and
bnd(?x) ∧ µ(?x) = c, or (iv) R is ¬R1 and ¬(µ ⊨ R1), or (v) R is R1 ∧ R2,
µ ⊨ R1, and µ ⊨ R2, or (vi) R is R1 ∨ R2, and µ ⊨ R1 or µ ⊨ R2.⁵

⁴ Blank nodes are not considered because they act as anonymous variables and can
  be replaced w. l. o. g. by unselected variables in a query.
⁵ The distinction between false and error in case of µ ⊭ R can be safely ignored in
  our context.
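    The solution of a SPARQL expression or query is a set of mappings. Let R
be a filter condition and S a finite set of variables. For mapping sets Ω1, Ω2, the
SPARQL set algebra defines the operations join (⋈), union (∪), minus (\), left
join (⟕), selection (σ), and projection (π):

\[
\begin{aligned}
\Omega_1 \bowtie \Omega_2 &:= \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1, \mu_2 \in \Omega_2 : \mu_1 \sim \mu_2\}\\
\Omega_1 \cup \Omega_2 &:= \{\mu \mid \mu \in \Omega_1 \vee \mu \in \Omega_2\}\\
\Omega_1 \setminus \Omega_2 &:= \{\mu_1 \in \Omega_1 \mid \forall \mu_2 \in \Omega_2 : \mu_1 \not\sim \mu_2\}\\
\Omega_1 \mathbin{⟕} \Omega_2 &:= (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2)\\
\sigma_R(\Omega_1) &:= \{\mu \in \Omega_1 \mid \mu \models R\}\\
\pi_S(\Omega_1) &:= \{\mu \mid \mathrm{dom}(\mu) \subseteq S \wedge \exists \mu' : \mathrm{dom}(\mu') \cap S = \emptyset \wedge \mu \cup \mu' \in \Omega_1\}
\end{aligned}
\]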
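
The six operations translate almost literally into set comprehensions. The following sketch assumes that a mapping is represented as a frozenset of (variable, term) pairs and that a filter condition is passed as an arbitrary Python predicate; all function names are illustrative.

```python
def mapping(**bindings):
    """A solution mapping as a hashable set of (variable, term) pairs."""
    return frozenset(bindings.items())

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def union(o1, o2):
    return o1 | o2

def minus(o1, o2):
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}

def left_join(o1, o2):
    return join(o1, o2) | minus(o1, o2)

def select(cond, o):
    # cond is any predicate on a mapping; it stands in for "mu satisfies R".
    return {m for m in o if cond(m)}

def project(s, o):
    return {frozenset((v, t) for v, t in m if v in s) for m in o}

friends = {mapping(x="bob"), mapping(x="charlie")}
names = {mapping(x="bob", n="Bobby")}
optional = left_join(friends, names)   # charlie is kept without a name
```

Here `optional` contains the joined mapping for bob and the unextended mapping for charlie, which is the behaviour of Opt that later makes deletions possible.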


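    The evaluation semantics is defined with the help of a function [[.]] that
translates a query Q into set algebra, written as [[Q]]. For expressions P, the same
function is used with two additional arguments to indicate the dataset D and
an active graph G for the evaluation, written $[\![P]\!]^{D}_{G}$. Let t be a triple pattern,
P1, P2 SPARQL expressions, u ∈ I, and R, S as before.

\[
\begin{aligned}
[\![t]\!]^{D}_{G} &:= \{\mu \mid \mathrm{dom}(\mu) = \mathrm{vars}(t) \wedge \mu(t) \in G\}\\
[\![P_1\ \mathrm{Filter}\ R]\!]^{D}_{G} &:= \sigma_R([\![P_1]\!]^{D}_{G})\\
[\![P_1\ \mathrm{Union}\ P_2]\!]^{D}_{G} &:= [\![P_1]\!]^{D}_{G} \cup [\![P_2]\!]^{D}_{G}\\
[\![P_1\ \mathrm{And}\ P_2]\!]^{D}_{G} &:= [\![P_1]\!]^{D}_{G} \bowtie [\![P_2]\!]^{D}_{G}\\
[\![P_1\ \mathrm{Opt}\ P_2]\!]^{D}_{G} &:= [\![P_1]\!]^{D}_{G} \mathbin{⟕} [\![P_2]\!]^{D}_{G}\\
[\![u\ \mathrm{Graph}\ P_1]\!]^{D}_{G} &:= [\![P_1]\!]^{D}_{\mathrm{gr}(u)_D}\\
[\![?x\ \mathrm{Graph}\ P_1]\!]^{D}_{G} &:= \bigcup_{u_i \in \mathrm{names}(D)} \Big( [\![u_i\ \mathrm{Graph}\ P_1]\!]^{D}_{G} \bowtie \{\{?x \mapsto u_i\}\} \Big)\\
[\![\mathrm{Select}_{S,F_1,F_2}(P_1)]\!] &:=
\begin{cases}
\pi_S([\![P_1]\!]^{D^*}_{G_0}) & \text{if } F_1 = F_2 = \emptyset\\
\pi_S([\![P_1]\!]^{D'}_{G_0}) & \text{else}
\end{cases}
\end{aligned}
\]

According to [19], queries without dataset specification are evaluated over a
default dataset D∗ available to the query processor. Otherwise the dataset is
constructed as

\[
D' := \bigsqcup_{u_i \in F_1} \mathrm{deref}(u_i) \;\cup \bigcup_{v_i \in F_2} \{\langle v_i, \mathrm{deref}(v_i)\rangle\}.
\]

The function deref maps an IRI to its corresponding graph.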


2.3   Linked Data

The term Linked Data [3] refers to several conventions that integrate data
publication into the Web’s HTTP stack as well as to the published data itself. We
focus on a key aspect, the identification of non-information resources (e. g., a
person, cf. [13]) with URLs although their essence is not a transmittable message.
A request for such a resource u ∈ I is thus redirected to an information resource
f(u) which serves a description (e. g., an RDF graph or HTML page) for u. So the
references to other resources in descriptions are traversable and carry over the
“follow your nose” principle from the Web of Documents to the Web of Data.
Graphs. Among other uses, the SPARQL Graph-operator is useful for restricting
mappings to authoritative information (the information provided by the URI
owner of a resource, cf. Sec. 2.2.2.1 in [13] and Sec. 5.1 in [3]). Unfortunately,
the rather intuitive adaption of our introductory example, Select_{{?x,?n},∅,∅}
((alice, knows, ?x) And (?x Graph (?x, name, ?n))), is inconsistent because
non-information resources (friends of Alice here) are not RDF graphs. Therefore,
  – we define the dataset
     $D' := \bigsqcup_{u_i \in F_1} \mathrm{deref}(u_i) \cup \bigcup_{v_i \in F_2} \{\langle f(v_i), \mathrm{deref}(v_i)\rangle\}$
     to avoid equating non-information resources with named graphs.
     Consequently, however, graph names in D′ are unpredictable before
     evaluating a query, so
  – we add (u, f, f(u)) to G_0 for each dereferenced resource u to make the relation
     between u and its description f(u) explicitly available.
    Of course, triples t with t.p = f obtained from external sources are not inserted
into D′ to prevent tampering. Hence the previous query can be expressed as
Select_{{?x,?n},∅,∅} (((alice, knows, ?x) And (?x, f, ?y)) And (?y Graph (?x,
name, ?n))) and does not return Alice’s nickname for Bob, Bobby.

Query Answering. We follow the bottom-up approach from [11] to illustrate
our approach. First, for all dereferenceable constants c ∈ I in a query we add
f(c) to F1 and F2 and compute the mapping sets for this dataset. Second, we
consider each dereferenceable c occurring in a mapping as a relevant source and
insert f(c) into F1 and F2 . The results over the extended dataset are computed
incrementally based on the present mapping sets. We repeat the second step
until F1 and F2 remain unchanged.
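
The two steps can be sketched as a fixpoint loop. This is a simplified illustration under several assumptions: the dictionary `web` is a hypothetical stand-in for HTTP dereferencing, all retrieved triples are merged into one set (named graphs and the f relation are ignored), and `evaluate` recomputes the mapping sets from scratch instead of updating them incrementally.

```python
def traverse(query_constants, deref, evaluate):
    """Dereference sources until the mappings mention no new source."""
    sources, data, results = set(), set(), []
    pending = {c for c in query_constants if deref(c) is not None}
    while pending:                       # stop once no new source is found
        for c in pending:
            data |= deref(c)
        sources |= pending
        mappings, results = evaluate(data)
        found = {term for m in mappings for term in m.values()}
        pending = {c for c in found
                   if c not in sources and deref(c) is not None}
    return sources, results

# Hypothetical descriptions, keyed by resource name.
web = {
    "alice": {("alice", "knows", "bob"), ("alice", "knows", "charlie"),
              ("bob", "name", "Bobby")},
    "bob": {("bob", "name", "Bob")},
    "charlie": {("charlie", "name", "Charlie")},
}

def evaluate(triples):
    """Return all mapping sets (for source discovery) and the final solutions."""
    knows = [{"x": o} for (s, p, o) in triples if (s, p) == ("alice", "knows")]
    named = [{"x": m["x"], "n": o} for m in knows
             for (s, p, o) in triples if (s, p) == (m["x"], "name")]
    return knows + named, named

sources, results = traverse({"alice"}, web.get, evaluate)
```

Note that the intermediate mappings for (alice, knows, ?x) are needed for source discovery: charlie appears in no final solution before his own description has been retrieved.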

2.4   Algebraic Equivalences
Our approach is based on transformations of SPARQL algebra expressions. We
define difference (−) and intersection (∩) for mapping sets as usual (cf. union
above) and introduce two new equivalence rules in addition to those from the
synoptical table in [20] shown in Tab. 1. With ‘P1 ≡ P2’ we denote the
equivalence between the algebra expressions P1 and P2. Note that minus is indeed
distinct from difference: Consider Ω1 = {{?x ↦ a, ?y ↦ b}, {?x ↦ a}} and
Ω2 = {{?x ↦ a, ?y ↦ b}}, then Ω1 \ Ω2 = ∅ as against Ω1 − Ω2 = {{?x ↦ a}}.
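
The difference between the two operators in this example can be checked directly. The sketch represents mappings as Python dictionaries and mapping sets as lists; the helper `compatible` is an illustrative name.

```python
def compatible(m1, m2):
    # Two mappings are compatible if they agree on every shared variable.
    return all(m2.get(v, t) == t for v, t in m1.items())

omega1 = [{"x": "a", "y": "b"}, {"x": "a"}]
omega2 = [{"x": "a", "y": "b"}]

# Minus keeps mappings that are incompatible with every mapping in omega2.
minus = [m1 for m1 in omega1 if not any(compatible(m1, m2) for m2 in omega2)]
# Difference keeps mappings that do not occur in omega2.
difference = [m1 for m1 in omega1 if m1 not in omega2]
```

Both mappings of Ω1 are compatible with the single mapping of Ω2, so minus removes them both, whereas difference removes only the identical one.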
Lemma 1 (FDPush). Let Ω1, Ω2 be mapping sets and R a filter condition,
then σR(Ω1 − Ω2) ≡ σR(Ω1) − σR(Ω2).
Proof. We fix a mapping µ and show that it is contained in the left hand side iff it
is contained in the right hand side. “⇒”: Suppose µ ∈ σR(Ω1 − Ω2). Then µ ⊨ R
and, because selection does not add mappings, µ ∈ Ω1 − Ω2, thus µ ∈ Ω1 and
µ ∉ Ω2. It follows immediately that µ ∈ σR(Ω1) but µ ∉ σR(Ω2). “⇐”: Suppose
µ ∈ σR(Ω1) − σR(Ω2). Then it holds that µ ∈ Ω1 and µ ⊨ R. We distinguish two
cases. Case (1): We assume µ ∉ Ω2 and are done. Case (2): We assume µ ∈ Ω2.
Then µ ∈ σR(Ω2) because µ ⊨ R and so µ ∉ σR(Ω1) − σR(Ω2). This contradicts
the first presumption. □

Table 1. Algebraic Equivalences, where A, B, C denote mapping sets.

(A ∪ B) ∪ C  ≡  A ∪ (B ∪ C)          (UAss)
(A ⋈ B) ⋈ C  ≡  A ⋈ (B ⋈ C)          (JAss)
A ∪ B        ≡  B ∪ A                (UComm)
A ⋈ B        ≡  B ⋈ A                (JComm)
(A ∪ B) ⋈ C  ≡  (A ⋈ C) ∪ (B ⋈ C)    (JUDistR)
A ⋈ (B ∪ C)  ≡  (A ⋈ B) ∪ (A ⋈ C)    (JUDistL)
(A ∪ B) \ C  ≡  (A \ C) ∪ (B \ C)    (MUDistR)
(A \ B) \ C  ≡  A \ (B ∪ C)          (MMUCorr)


Lemma 2 (MDReord). Let Ω1, Ω2, Ω3 be mapping sets, then (Ω1 − Ω2) \ Ω3 ≡
(Ω1 \ Ω3) − Ω2.
Proof. We proceed as above. “⇒”: Suppose µ ∈ (Ω1 − Ω2) \ Ω3. Then µ ≁ µ′ holds
for all µ′ ∈ Ω3, and µ ∉ Ω2 but µ ∈ Ω1. It follows that µ ∈ Ω1 \ Ω3, and
thus also µ ∈ (Ω1 \ Ω3) − Ω2. “⇐”: Suppose µ ∈ (Ω1 \ Ω3) − Ω2. Then µ ∉ Ω2, and
µ ≁ µ′ holds for all µ′ ∈ Ω3. So µ ∈ Ω1 − Ω2 and finally µ ∈ (Ω1 − Ω2) \ Ω3. □
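
As a sanity check (not a substitute for the proofs), both lemmas can be tested exhaustively on a small universe of mappings. The universe and the filter condition bnd(?y) below are arbitrary choices made for this sketch.

```python
from itertools import combinations

def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def minus(o1, o2):
    return {m for m in o1 if not any(compatible(m, n) for n in o2)}

def select(cond, o):
    return {m for m in o if cond(m)}

# Four mappings over ?x and ?y, including the empty mapping.
universe = [frozenset(), frozenset({("x", "a")}), frozenset({("x", "b")}),
            frozenset({("x", "a"), ("y", "b")})]
subsets = [set(c) for r in range(len(universe) + 1)
           for c in combinations(universe, r)]

def bound_y(m):
    return "y" in dict(m)               # the filter condition bnd(?y)

# Lemma 1: selection distributes over difference.
fdpush_ok = all(
    select(bound_y, o1 - o2) == select(bound_y, o1) - select(bound_y, o2)
    for o1 in subsets for o2 in subsets)

# Lemma 2: difference and minus can be reordered.
mdreord_ok = all(
    minus(o1 - o2, o3) == minus(o1, o3) - o2
    for o1 in subsets for o2 in subsets for o3 in subsets)
```

Both flags evaluate to true for every combination of the sixteen subsets of this universe.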


3     Incremental SPARQL Evaluation
We want to evaluate SPARQL over an increasing dataset while we are inter-
ested in the result of a query after each addition to the data as outlined in the
introduction. This can certainly be achieved by a complete evaluation over the
whole data. However, we think that an approach that takes previously computed
results into account might perform better. Therefore we provide an incremental
adaption for each algebra operation to compute the result based on the changes
of the operands between the dataset D and the increased dataset D + ∆D . We
consider also the mechanism to select the active graph (Graph) and the evalu-
ation of triple patterns.
Definition 1 (Insertions and Deletions). For a SPARQL expression P, an
RDF dataset D, and a named graph $\Delta_D$, let $A = [\![P]\!]^{D}_{G}$ and $A' = [\![P]\!]^{D+\Delta_D}_{G}$.
We define insertions $\Delta^+_A := A' - A$ and deletions $\Delta^-_A := A - A'$.
It follows that (i) $\Delta^+_A \cap \Delta^-_A = \emptyset$, (ii) $\Delta^+_A \cap A = \emptyset$, (iii) $\Delta^-_A \cap A' = \emptyset$, (iv) $\Delta^-_A \subseteq A$,
and (v) $A' = (A - \Delta^-_A) \cup \Delta^+_A$. This must hold in the following transformations.
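
The definition and its five consequences can be illustrated with ordinary set operations. In this sketch, mappings are abbreviated as (x, n) tuples and the two mapping sets are made-up examples.

```python
def deltas(old, new):
    """Return (insertions, deletions) between two mapping sets."""
    return new - old, old - new

a = {("bob", "Bobby"), ("dave", "Dave")}             # A, evaluated over D
a_new = {("bob", "Bobby"), ("charlie", "Charlie")}   # A', evaluated over D + delta
ins, dels = deltas(a, a_new)

properties = {
    "i": ins & dels == set(),
    "ii": ins & a == set(),
    "iii": dels & a_new == set(),
    "iv": dels <= a,
    "v": a_new == (a - dels) | ins,
}
```

Every entry of `properties` is true, and property (v) is the identity on which all transformations in this section rely.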

3.1    Algebra operations
For the transformations of union, join, and minus assume that $A = [\![P_1]\!]^{D}_{G}$,
$B = [\![P_2]\!]^{D}_{G}$, $A' = [\![P_1]\!]^{D+\Delta_D}_{G}$, and $B' = [\![P_2]\!]^{D+\Delta_D}_{G}$ have already been computed.

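Projection. Given $C_\pi = [\![\mathrm{Select}_{S,F_1,F_2}(P)]\!]$, $A = [\![P]\!]^{D}_{G_0}$, $A' = [\![P]\!]^{D+\Delta_D}_{G_0}$,
and $\Delta_D = \langle f(u), G\rangle$, we are interested in $C'_\pi = [\![\mathrm{Select}_{S,F_1\cup\{u\},F_2\cup\{u\}}(P)]\!]$.
$\Delta^-_\pi$, $\Delta^+_\pi$ can be used when projections are pushed down in query optimizations.

\[
\begin{aligned}
[\![\mathrm{Select}_{S,F_1\cup\{u\},F_2\cup\{u\}}(P)]\!] &= \pi_S([\![P]\!]^{D+\Delta_D}_{G_0}) = \pi_S(A')\\
&= \{\mu \mid \mathrm{dom}(\mu) \subseteq S \wedge \exists\mu' : \mathrm{dom}(\mu') \cap S = \emptyset \wedge \mu \cup \mu' \in A'\}\\
&= \{\mu \mid \mathrm{dom}(\mu) \subseteq S \wedge \exists\mu' : \mathrm{dom}(\mu') \cap S = \emptyset \wedge \mu \cup \mu' \in (A - \Delta^-_A)\}\\
&\quad \cup \{\mu \mid \mathrm{dom}(\mu) \subseteq S \wedge \exists\mu' : \mathrm{dom}(\mu') \cap S = \emptyset \wedge \mu \cup \mu' \in \Delta^+_A\}\\
&= (\pi_S(A) - \{\mu \in \pi_S(\Delta^-_A) \mid \neg\exists\mu' \in A' : \mu \sqsubseteq \mu'\}) \cup \pi_S(\Delta^+_A)\\
\Delta^-_\pi &= \{\mu \in \pi_S(\Delta^-_A) \mid \neg\exists\mu' \in A' : \mu \sqsubseteq \mu'\}\\
\Delta^+_\pi &= \{\mu \in \pi_S(\Delta^+_A) \mid \mu \notin \pi_S(A)\}
\end{aligned}
\]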
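
A minimal sketch of the incremental projection follows. It assumes mappings as frozensets of (variable, term) pairs. The check whether a deleted projection is still supported by A′ is implemented by comparing projections, which coincides with the subsumption test when all mappings bind the same variables; this simplification is an assumption of the sketch, and all names are illustrative.

```python
def restrict(m, s):
    """mu[S]: keep only the bindings of variables in S."""
    return frozenset((v, t) for v, t in m if v in s)

def project(s, o):
    return {restrict(m, s) for m in o}

def project_incremental(s, c, a_new, del_a, ins_a):
    """Update c = project(s, A) to project(s, A') from the changes of A."""
    def still_supported(mu):
        # Some remaining mapping in A' still projects onto mu.
        return any(restrict(m, s) == mu for m in a_new)
    del_pi = {mu for mu in project(s, del_a) if not still_supported(mu)}
    ins_pi = {mu for mu in project(s, ins_a) if mu not in c}
    return (c - del_pi) | ins_pi, del_pi, ins_pi

def m(**b):
    return frozenset(b.items())

a = {m(x="bob", n="Bobby"), m(x="bob", n="Bob"), m(x="charlie", n="Charlie")}
del_a = {m(x="bob", n="Bobby"), m(x="charlie", n="Charlie")}
ins_a = {m(x="dave", n="Dave")}
a_new = (a - del_a) | ins_a
c_new, del_pi, ins_pi = project_incremental({"x"}, project({"x"}, a), a_new, del_a, ins_a)
```

In the usage example, the projection ?x ↦ bob survives the deletion of one of its two supporting mappings, while ?x ↦ charlie is removed.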

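Selection. Given $C_\sigma = [\![P\ \mathrm{Filter}\ R]\!]^{D}_{G}$, we are interested in $C'_\sigma = [\![P\ \mathrm{Filter}\ R]\!]^{D+\Delta_D}_{G}$.
Assume $A = [\![P]\!]^{D}_{G}$ and $A' = [\![P]\!]^{D+\Delta_D}_{G}$ have already been computed.

\[
\begin{aligned}
[\![P\ \mathrm{Filter}\ R]\!]^{D+\Delta_D}_{G} &= \sigma_R((A - \Delta^-_A) \cup \Delta^+_A)\\
&= \sigma_R(A - \Delta^-_A) \cup \sigma_R(\Delta^+_A) && \text{(FUPush)}\\
&= (\sigma_R(A) - \sigma_R(\Delta^-_A)) \cup \sigma_R(\Delta^+_A) && \text{(FDPush)}\\
\Delta^-_\sigma &= \sigma_R(\Delta^-_A)\\
\Delta^+_\sigma &= \sigma_R(\Delta^+_A)
\end{aligned}
\]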
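
The selection case is the simplest one, since only the changes need to be filtered. The sketch below assumes mappings as frozensets of (variable, term) pairs and a filter condition given as a Python predicate; the names are illustrative.

```python
def select(cond, o):
    return {m for m in o if cond(m)}

def select_incremental(cond, c, del_a, ins_a):
    """Update c = select(cond, A) using only the changes of A."""
    del_sigma = select(cond, del_a)
    ins_sigma = select(cond, ins_a)
    return (c - del_sigma) | ins_sigma, del_sigma, ins_sigma

def m(**b):
    return frozenset(b.items())

def x_is_bob(mu):
    return dict(mu).get("x") == "bob"    # the filter condition ?x = bob

a = {m(x="bob", n="Bobby"), m(x="charlie", n="Charlie")}
del_a = {m(x="bob", n="Bobby")}
ins_a = {m(x="bob", n="Bob"), m(x="dave", n="Dave")}
a_new = (a - del_a) | ins_a
c_new, del_sigma, ins_sigma = select_incremental(x_is_bob, select(x_is_bob, a), del_a, ins_a)
```

The old operand A is never touched during the update, in contrast to the direct computation `select(x_is_bob, a_new)`.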


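Union. Given $C_\cup = [\![P_1\ \mathrm{Union}\ P_2]\!]^{D}_{G}$, find $C'_\cup = [\![P_1\ \mathrm{Union}\ P_2]\!]^{D+\Delta_D}_{G}$.

\[
\begin{aligned}
[\![P_1\ \mathrm{Union}\ P_2]\!]^{D+\Delta_D}_{G} &= A' \cup B'\\
&= ((A - \Delta^-_A) \cup \Delta^+_A) \cup ((B - \Delta^-_B) \cup \Delta^+_B)\\
&= ((A - \Delta^-_A) \cup (B - \Delta^-_B)) \cup (\Delta^+_A \cup \Delta^+_B) && \text{(UAss, UComm)}\\
&= \{\mu \in A \cup B \mid (\mu \in A \wedge \mu \notin \Delta^-_A) \vee (\mu \in B \wedge \mu \notin \Delta^-_B)\} \cup (\Delta^+_A \cup \Delta^+_B)\\
\Delta^-_\cup &= \{\mu \in A \cup B \mid (\mu \notin A \cup \Delta^+_A \vee \mu \in \Delta^-_A) \wedge (\mu \notin B \cup \Delta^+_B \vee \mu \in \Delta^-_B)\}\\
\Delta^+_\cup &= \{\mu \in \Delta^+_A \cup \Delta^+_B \mid \mu \notin A \cup B\}
\end{aligned}
\]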
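
The union update can be sketched as follows. Since the operation only tests membership, the mappings are abbreviated by string labels here. The sketch restricts the deletion test to the deleted operand mappings, which yields the same set as the formula because a mapping that was deleted from neither operand remains in the union.

```python
def union_incremental(c, a, b, del_a, ins_a, del_b, ins_b):
    """Update c = a | b to A' | B' from the operand changes."""
    def gone_from_a(mu):                 # mu is not in A'
        return (mu not in a and mu not in ins_a) or mu in del_a
    def gone_from_b(mu):                 # mu is not in B'
        return (mu not in b and mu not in ins_b) or mu in del_b
    candidates = del_a | del_b           # only these can leave the union
    del_u = {mu for mu in candidates if gone_from_a(mu) and gone_from_b(mu)}
    ins_u = {mu for mu in ins_a | ins_b if mu not in c}
    return (c - del_u) | ins_u, del_u, ins_u

a, b = {"m1", "m2"}, {"m2", "m3"}
del_a, ins_a = {"m2"}, {"m4"}
del_b, ins_b = {"m3"}, set()
a_new, b_new = (a - del_a) | ins_a, (b - del_b) | ins_b
c_new, del_u, ins_u = union_incremental(a | b, a, b, del_a, ins_a, del_b, ins_b)
```

In the usage example, m2 is deleted from A but stays in the union because B still contains it, whereas m3 disappears.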

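Join. Given $C_{\bowtie} = [\![P_1\ \mathrm{And}\ P_2]\!]^{D}_{G}$, find $C'_{\bowtie} = [\![P_1\ \mathrm{And}\ P_2]\!]^{D+\Delta_D}_{G}$.

\[
\begin{aligned}
[\![P_1\ \mathrm{And}\ P_2]\!]^{D+\Delta_D}_{G} &= A' \bowtie B'\\
&= ((A - \Delta^-_A) \cup \Delta^+_A) \bowtie ((B - \Delta^-_B) \cup \Delta^+_B)\\
&= ((A - \Delta^-_A) \bowtie (B - \Delta^-_B))\\
&\quad \cup ((A - \Delta^-_A) \bowtie \Delta^+_B) \cup (\Delta^+_A \bowtie B') && \text{(JUDistR, JUDistL)}\\
&= \{\mu \in A \bowtie B \mid \exists\mu_1 \in A, \mu_2 \in B : \mu = \mu_1 \cup \mu_2 \wedge \mu_1 \notin \Delta^-_A \wedge \mu_2 \notin \Delta^-_B\}\\
&\quad \cup ((A - \Delta^-_A) \bowtie \Delta^+_B) \cup (\Delta^+_A \bowtie B')\\
\Delta^-_{\bowtie} &= \{\mu \in A \bowtie B \mid \forall\mu_1 \in A \cup \Delta^+_A, \forall\mu_2 \in B \cup \Delta^+_B :\\
&\qquad (\mu_1 \cup \mu_2 = \mu) \rightarrow (\mu_1 \in \Delta^-_A \vee \mu_2 \in \Delta^-_B)\}\\
\Delta^+_{\bowtie} &= \{\mu \in ((A - \Delta^-_A) \bowtie \Delta^+_B) \cup (\Delta^+_A \bowtie B') \mid \mu \notin A \bowtie B\}
\end{aligned}
\]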
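
The join update can be sketched as follows, assuming mappings as frozensets of (variable, term) pairs. The deletion test is restricted to results that used a deleted operand mapping and is implemented by a naive search for a remaining derivation; both choices are simplifications of this sketch, and the names are illustrative.

```python
def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def join_incremental(c, a, b, del_a, ins_a, del_b, ins_b):
    """Update c = join(a, b) to join(A', B') from the operand changes."""
    a_kept, b_new = a - del_a, (b - del_b) | ins_b
    a_new = a_kept | ins_a
    ins_j = (join(a_kept, ins_b) | join(ins_a, b_new)) - c
    # Only results that used a deleted operand mapping can disappear ...
    candidates = join(del_a, b) | join(a, del_b)
    # ... and they do so only if no derivation from A' and B' remains.
    def derivable(mu):
        return any(compatible(m1, m2) and m1 | m2 == mu
                   for m1 in a_new for m2 in b_new)
    del_j = {mu for mu in candidates if not derivable(mu)}
    return (c - del_j) | ins_j, del_j, ins_j

def m(**b):
    return frozenset(b.items())

a, b = {m(x="bob"), m(x="charlie")}, {m(x="bob", n="Bobby")}
del_a, ins_a = set(), {m(x="dave")}
del_b, ins_b = {m(x="bob", n="Bobby")}, {m(x="bob", n="Bob"), m(x="charlie", n="Charlie")}
a_new, b_new = (a - del_a) | ins_a, (b - del_b) | ins_b
c_new, del_j, ins_j = join_incremental(join(a, b), a, b, del_a, ins_a, del_b, ins_b)
```

The joins in the update involve at least one delta operand, so their cost depends on the size of the changes rather than on |A′| · |B′|; only the naive `derivable` check scans both new operands.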
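Minus. Given $C_{\setminus} = A \setminus B$, we are interested in $C'_{\setminus} = A' \setminus B'$.

\[
\begin{aligned}
C'_{\setminus} &= ((A - \Delta^-_A) \cup \Delta^+_A) \setminus B'\\
&= ((A - \Delta^-_A) \setminus ((B - \Delta^-_B) \cup \Delta^+_B)) \cup (\Delta^+_A \setminus B') && \text{(MUDistR)}\\
&= ((A - \Delta^-_A) \setminus ((B \cup \Delta^+_B) - \Delta^-_B)) \cup (\Delta^+_A \setminus B') && (\Delta^-_B \cap \Delta^+_B = \emptyset)\\
&= ((A - \Delta^-_A) \setminus (B \cup \Delta^+_B))\\
&\quad \cup \{\mu \in A - \Delta^-_A \mid \forall\mu' \in (B \cup \Delta^+_B) : \mu \sim \mu' \rightarrow \mu' \in \Delta^-_B\} \cup (\Delta^+_A \setminus B')\\
&= (((A \setminus B) - \Delta^-_A) \setminus \Delta^+_B) && \text{(MMUCorr, MDReord)}\\
&\quad \cup \{\mu \in A - \Delta^-_A \mid \forall\mu' \in (B \cup \Delta^+_B) : \mu \sim \mu' \rightarrow \mu' \in \Delta^-_B\} \cup (\Delta^+_A \setminus B')\\
\Delta^-_{\setminus} &= \{\mu \in A \setminus B \mid \mu \in \Delta^-_A \vee \exists\mu' \in \Delta^+_B : \mu \sim \mu'\}\\
\Delta^+_{\setminus} &= \{\mu \in A - \Delta^-_A \mid \forall\mu' \in (B \cup \Delta^+_B) : \mu \sim \mu' \rightarrow \mu' \in \Delta^-_B\} \cup (\Delta^+_A \setminus B')
\end{aligned}
\]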
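
The minus update can be sketched in the same style, again with mappings as frozensets of (variable, term) pairs and illustrative names. One deviation from the formula is made explicit in the code: mappings that are already part of the old result are removed from the insertion set, so that it contains only true additions.

```python
def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def minus(o1, o2):
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}

def minus_incremental(c, a, b, del_a, ins_a, del_b, ins_b):
    """Update c = minus(a, b) to minus(A', B') from the operand changes."""
    b_new = (b - del_b) | ins_b
    # A result disappears if it was deleted from A or is blocked by a new B mapping.
    del_m = {mu for mu in c
             if mu in del_a or any(compatible(mu, m2) for m2 in ins_b)}
    # A kept A mapping qualifies if every compatible B mapping has been deleted.
    freed = {mu for mu in a - del_a
             if all(m2 in del_b for m2 in b | ins_b if compatible(mu, m2))}
    # Deviation from the formula: drop what the old result already contains.
    ins_m = (freed | minus(ins_a, b_new)) - c
    return (c - del_m) | ins_m, del_m, ins_m

def m(**b):
    return frozenset(b.items())

a, b = {m(x="bob"), m(x="charlie")}, {m(x="bob", n="Bobby")}
del_a, ins_a = set(), {m(x="dave")}
del_b, ins_b = {m(x="bob", n="Bobby")}, {m(x="charlie", n="Charlie")}
a_new, b_new = (a - del_a) | ins_a, (b - del_b) | ins_b
c_new, del_m, ins_m = minus_incremental(minus(a, b), a, b, del_a, ins_a, del_b, ins_b)
```

The example shows both directions of non-monotonic behaviour: an insertion into B removes charlie from the result, and a deletion from B adds bob.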
                                                                  +


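Left Join. $C'_{⟕} = [\![P_1\ \mathrm{Opt}\ P_2]\!]^{D+\Delta_D}_{G} = A' \mathbin{⟕} B'$ can be expressed by join and
minus, thus $C'_{⟕} = C'_{\bowtie} \cup C'_{\setminus}$, $\Delta^-_{⟕} = \Delta^-_{\bowtie} \cup \Delta^-_{\setminus}$, and $\Delta^+_{⟕} = \Delta^+_{\bowtie} \cup \Delta^+_{\setminus}$.
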
3.2   Active Graph Selection and Triple Patterns
The selection of the active graph propagates to the subexpressions and finally
takes effect in the evaluation of triple patterns. The active graph can also be
changed inside the scope of a Graph-operator, yet it is not possible to reactivate
the default graph. For example, $[\![u_1\ \mathrm{Graph}\ (P_1\ \mathrm{And}\ (u_2\ \mathrm{Graph}\ P_2))]\!]^{D}_{G}$ is
equivalent to $[\![P_1]\!]^{D}_{\mathrm{gr}(u_1)_D} \bowtie [\![P_2]\!]^{D}_{\mathrm{gr}(u_2)_D}$, $u_i \in I$.

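Default Graph. Let t be a triple pattern and $\Delta_D = \langle u, G\rangle$. Given $C_{DG} = [\![t]\!]^{D}_{\mathrm{dg}(D)}$,
we are interested in $C'_{DG} = [\![t]\!]^{D+\Delta_D}_{\mathrm{dg}(D+\Delta_D)}$.

\[
\begin{aligned}
[\![t]\!]^{D+\Delta_D}_{\mathrm{dg}(D+\Delta_D)} &= [\![t]\!]^{D+\langle u,G\rangle}_{\mathrm{dg}(D+\langle u,G\rangle)} = [\![t]\!]^{D+\langle u,G\rangle}_{\mathrm{dg}(D) \sqcup G}\\
&= \{\mu \mid \mathrm{dom}(\mu) = \mathrm{vars}(t) \wedge \mu(t) \in \mathrm{dg}(D) \sqcup G\}\\
&= \{\mu \mid \mathrm{dom}(\mu) = \mathrm{vars}(t) \wedge \mu(t) \in \mathrm{dg}(D)\}\\
&\quad \cup \{\mu \mid \mathrm{dom}(\mu) = \mathrm{vars}(t) \wedge \mu(t) \in G\}\\
&= [\![t]\!]^{D}_{\mathrm{dg}(D)} \cup [\![t]\!]^{\langle u,G\rangle}_{G}\\
\Delta^+_{DG} &= \{\mu \in [\![t]\!]^{\langle u,G\rangle}_{G} \mid \mu \notin [\![t]\!]^{D}_{\mathrm{dg}(D)}\}
\end{aligned}
\]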
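
For triple patterns over the default graph, only the newly retrieved graph has to be matched. The sketch below assumes that triples and patterns are string tuples, that variables are strings with a leading question mark, and that no blank nodes occur, so that the merge is a plain set union; the function names are illustrative.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triple):
    """Return the mapping that turns pattern into triple, or None."""
    mu = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if mu.setdefault(p, t) != t:    # same variable, different terms
                return None
        elif p != t:
            return None
    return frozenset(mu.items())

def eval_pattern(pattern, graph):
    return {mu for mu in (match(pattern, t) for t in graph) if mu is not None}

def eval_pattern_incremental(pattern, c, new_graph):
    """Update c = eval_pattern(pattern, dg(D)) after merging new_graph."""
    ins_dg = eval_pattern(pattern, new_graph) - c
    return c | ins_dg, ins_dg

dg = {("alice", "knows", "bob"), ("alice", "knows", "charlie"), ("bob", "name", "Bobby")}
g_bob = {("bob", "name", "Bob")}
pattern = ("?x", "name", "?n")
c_new, ins_dg = eval_pattern_incremental(pattern, eval_pattern(pattern, dg), g_bob)
```

No deletions can occur in this case because the default graph only grows.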


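Fixed Graph. The expression u Graph P with fixed graph name u is evaluated
as $[\![P]\!]^{D}_{\mathrm{gr}(u)_D}$. Let t be a triple pattern and $C_{FG} = [\![t]\!]^{D}_{\mathrm{gr}(u)_D}$. We are interested in
$C'_{FG} = [\![t]\!]^{D+\Delta_D}_{\mathrm{gr}(u)_{D+\Delta_D}}$ with $\Delta_D = \langle u', G\rangle$. The expression is rewritten as above.

\[
\begin{aligned}
[\![t]\!]^{D+\Delta_D}_{\mathrm{gr}(u)_{D+\Delta_D}} &=
\begin{cases}
[\![t]\!]^{D}_{\mathrm{gr}(u)_D} & \text{if } u \in \mathrm{names}(D)\\
[\![t]\!]^{\{\langle u',G\rangle\}}_{\mathrm{gr}(u)_{\Delta_D}} & \text{if } u = u'
\end{cases}\\
\Delta^+_{FG} &=
\begin{cases}
[\![t]\!]^{\{\langle u',G\rangle\}}_{\mathrm{gr}(u)_{\Delta_D}} & \text{if } u = u'\\
\emptyset & \text{else}
\end{cases}
\end{aligned}
\]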

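Variable Graph. With variable graph name, ?x Graph P is evaluated as
$\bigcup_{u_i \in \mathrm{names}(D)} \big([\![u_i\ \mathrm{Graph}\ P]\!]^{D}_{G} \bowtie \{\{?x \mapsto u_i\}\}\big)$. Unlike before, it cannot be
completely pushed down to the subexpressions. Let $\Delta_D = \langle u, G\rangle$ as above.
Given $C_{VG} = \big([\![P]\!]^{D}_{\mathrm{gr}(u_1)_D} \bowtie \{\{?x \mapsto u_1\}\}\big) \cup \ldots \cup \big([\![P]\!]^{D}_{\mathrm{gr}(u_n)_D} \bowtie \{\{?x \mapsto u_n\}\}\big)$
and $A_i = [\![P]\!]^{D}_{\mathrm{gr}(u_i)_D}$, $A'_i = [\![P]\!]^{D+\Delta_D}_{\mathrm{gr}(u_i)_D}$ for $u_i \in \mathrm{names}(D)$, we are interested in
$C'_{VG} = \bigcup_{u_i \in \mathrm{names}(D+\Delta_D)} \big([\![u_i\ \mathrm{Graph}\ P]\!]^{D+\Delta_D}_{G} \bowtie \{\{?x \mapsto u_i\}\}\big)$.

\[
\begin{aligned}
&\bigcup_{u_i \in \mathrm{names}(D+\Delta_D)} \Big([\![P]\!]^{D+\Delta_D}_{\mathrm{gr}(u_i)_{D+\Delta_D}} \bowtie \{\{?x \mapsto u_i\}\}\Big)\\
&\quad= \bigcup_{u_i \in \mathrm{names}(D)} \underbrace{\Big(A'_i \bowtie \{\{?x \mapsto u_i\}\}\Big)}_{=B'_i} \cup \Big([\![P]\!]^{D+\Delta_D}_{G} \bowtie \{\{?x \mapsto u\}\}\Big)\\
\Delta^-_{VG} &= \bigcup_i \Delta^-_{B_i}\\
\Delta^+_{VG} &= \bigcup_i \Delta^+_{B_i} \cup \Big([\![P]\!]^{D+\Delta_D}_{G} \bowtie \{\{?x \mapsto u\}\}\Big)
\end{aligned}
\]

$\Delta^-_{B_i}$ and $\Delta^+_{B_i}$ are composed as in Sec. 3.1 and thus disjoint. It holds that
$\Delta^+_{VG} \cap \Delta^-_{VG} = \emptyset$ because $\mu_1(?x) \neq \mu_2(?x)$ for $\mu_1 \in \Delta^+_{B_i}$, $\mu_2 \in \Delta^-_{B_j}$ where $i \neq j$.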

4     Comparing Incremental and Direct Computation
We evaluate our approach by comparing the incremental computation to the
direct computation. We exemplify these considerations for projection and leave
out the other operations due to space limitations. Union behaves similarly, join
and minus are slightly more complicated, and selection is easier.
Definition 2 (Projection of mappings). Let $\mu$ be a mapping and $S$ a finite
set of variables. The mapping $\mu[S]$ is the projection of $\mu$ onto $S$ where (i)
$dom(\mu[S]) := dom(\mu) \cap S$ and (ii) $\mu[S](?x) := \mu(?x)$ for all $?x \in dom(\mu[S])$.
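
As a minimal illustration of Definition 2, the following Python sketch represents a
mapping as a dictionary from variable names to RDF terms. The dictionary representation
and the function name `project` are assumptions made for this example only; they are
not part of the formal framework.

```python
from typing import Dict, Set


def project(mu: Dict[str, str], S: Set[str]) -> Dict[str, str]:
    """Return mu[S]: the bindings of mu restricted to the variables in S.

    (i)  dom(mu[S]) = dom(mu) ∩ S
    (ii) mu[S](?x)  = mu(?x) for every ?x in dom(mu[S])
    """
    return {var: term for var, term in mu.items() if var in S}


mu = {"?x": "ex:alice", "?y": "ex:bob"}
print(project(mu, {"?x", "?z"}))  # only ?x is bound in mu and contained in S
```

Variables in S that are unbound in the mapping (here `?z`) simply do not appear in the
projection.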

Costs of Projection. We assume sets that provide the operations insert, delete, and
fsub, where fsub checks whether the set contains a mapping that subsumes a given
mapping. We do not assume a specific order for the sets. The costs of evaluating an
operation op are denoted by $\|op\|$.
   The direct computation of the projection simply performs ‘for $\mu \in A'$ do
$\mu' \leftarrow \mu[S]$; $C'_\pi$ insert $\mu'$ end’, so its costs can be estimated at
$|A'| \cdot (\|\mu[S]\| + \|C'_\pi \text{ insert } \mu'\|)$. A feasible procedure for the
incremental case is shown in Alg. 1. Its approximate costs are given by the following sum,
where $\alpha$ is the number of successful subsumption checks and $\beta$ the number of
changes to $C_\pi$.
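\begin{align*}
 & |\Delta^-_A| \cdot \bigl( \|\mu[S]\| + \|A' \text{ fsub } \mu'\| \bigr)
   + (|\Delta^-_A| - \alpha) \bigl( \|C'_\pi \text{ delete } \mu'\| + \|\Delta^-_\pi \text{ insert } \mu'\| \bigr) \\
 & \quad + |\Delta^+_A| \cdot \bigl( \|\mu[S]\| + \|C'_\pi \text{ insert } \mu'\| \bigr)
   + \beta \, \|\Delta^+_\pi \text{ insert } \mu'\|
\end{align*}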

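Overestimating the insertions into $\Delta^+_\pi$ (that is, taking $\beta = |\Delta^+_A|$
and assuming comparable insertion costs for $C'_\pi$ and $\Delta^+_\pi$), we can conclude
that the incremental adaption is cheaper if $\Delta^-_A = \emptyset$ and
$|A'| > 2 \cdot |\Delta^+_A|$. Otherwise, the comparison depends on the size of
$\Delta^-_A$, which leaves room for optimizations.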
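
The comparison above can be made concrete with a small cost model. The sketch below is
only an illustration: it assumes a constant cost per operation and identical insertion
costs for all sets. The function names and the parameters `c_proj`, `c_fsub`, `c_delete`,
and `c_insert` are hypothetical; real costs depend on the chosen set implementation.

```python
def direct_cost(n_a_new: int, c_proj: float = 1.0, c_insert: float = 1.0) -> float:
    """|A'| * (||mu[S]|| + ||C'_pi insert mu'||)."""
    return n_a_new * (c_proj + c_insert)


def incremental_cost(n_del: int, n_ins: int, alpha: int, beta: int,
                     c_proj: float = 1.0, c_fsub: float = 1.0,
                     c_delete: float = 1.0, c_insert: float = 1.0) -> float:
    """Cost sum for the incremental case.

    n_del = |Delta^-_A|, n_ins = |Delta^+_A|,
    alpha = number of successful subsumption checks,
    beta  = number of changes to C_pi caused by insertions.
    """
    deletions = (n_del * (c_proj + c_fsub)
                 + (n_del - alpha) * (c_delete + c_insert))
    insertions = n_ins * (c_proj + c_insert) + beta * c_insert
    return deletions + insertions


# No deletions, |A'| = 10, |Delta^+_A| = 4, beta overestimated as 4.
print(direct_cost(10))                                          # 20.0
print(incremental_cost(n_del=0, n_ins=4, alpha=0, beta=4))      # 12.0
```

Under these unit-cost assumptions the incremental variant is cheaper in the example,
in line with the condition $|A'| > 2 \cdot |\Delta^+_A|$.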
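    Algorithm 1: Compute $C'_\pi = \pi_S(A')$ incrementally
     Data: Mapping sets $C_\pi = \pi_S(A)$, $A'$, $\Delta^-_A$, $\Delta^+_A$, variable set $S$
     Result: $C'_\pi = \pi_S(A')$
     begin
        $\Delta^-_\pi \leftarrow \emptyset$; $\Delta^+_\pi \leftarrow \emptyset$
        for $\mu$ in $\Delta^-_A$ do
            $\mu' \leftarrow \mu[S]$
            if not $A'$ fsub $\mu'$ then $C_\pi$ delete $\mu'$; $\Delta^-_\pi$ insert $\mu'$
        for $\mu$ in $\Delta^+_A$ do
            $\mu' \leftarrow \mu[S]$
            if $C_\pi$ insert $\mu'$ changes $C_\pi$ then $\Delta^+_\pi$ insert $\mu'$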
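
Alg. 1 could be sketched in Python as follows. This is an illustration under stated
assumptions, not a reference implementation: mappings are modelled as frozensets of
(variable, term) pairs, the helper names are hypothetical, and the fsub check is realized
as a linear scan for a mapping in $A'$ whose projection equals $\mu'$. If all mappings
bind every variable in $S$, this check coincides with a subsumption test.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A mapping is modelled as a hashable set of (variable, term) pairs.
Mapping = FrozenSet[Tuple[str, str]]


def make_mapping(**bindings: str) -> Mapping:
    """Hypothetical helper: make_mapping(x="a") stands for {?x -> a}."""
    return frozenset(bindings.items())


def project(mu: Mapping, S: Set[str]) -> Mapping:
    """mu[S]: keep only the bindings for variables in S."""
    return frozenset((var, term) for var, term in mu if var in S)


def has_justification(A_new: Iterable[Mapping], mu_p: Mapping, S: Set[str]) -> bool:
    """Stand-in for 'A' fsub mu'': is mu' still produced by some mapping in A'?"""
    return any(project(nu, S) == mu_p for nu in A_new)


def update_projection(C_pi: Set[Mapping], A_new: Set[Mapping],
                      delta_minus_A: Iterable[Mapping],
                      delta_plus_A: Iterable[Mapping], S: Set[str]):
    """Update C_pi in place to pi_S(A') and return it with its deltas."""
    delta_minus_pi: Set[Mapping] = set()
    delta_plus_pi: Set[Mapping] = set()
    for mu in delta_minus_A:
        mu_p = project(mu, S)
        if not has_justification(A_new, mu_p, S):
            C_pi.discard(mu_p)
            delta_minus_pi.add(mu_p)
    for mu in delta_plus_A:
        mu_p = project(mu, S)
        if mu_p not in C_pi:          # the insertion changes C_pi
            C_pi.add(mu_p)
            delta_plus_pi.add(mu_p)
    return C_pi, delta_minus_pi, delta_plus_pi
```

The linear scan in `has_justification` is the expensive part for deletions; an index on
the projected mappings would be one way to reduce it.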


Towards an optimizer. A useful cost estimation depends on the data and especially
on the chosen implementation. If the direct evaluation is presumably cheaper, an
optimizer may choose to compute $C'_\pi$ only from $A'$. However, to support the
incremental computation for operations that use $C'_\pi$ as input, the costs to compute
$\Delta^-_\pi := C_\pi - C'_\pi$ and $\Delta^+_\pi := C'_\pi - C_\pi$ must be considered, too.
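
The extra work after a direct evaluation can be sketched as two set differences. The
following Python fragment is a minimal illustration; the frozenset representation of
mappings and the function names are assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple

Mapping = FrozenSet[Tuple[str, str]]  # hashable set of (variable, term) pairs


def project(mu: Mapping, S: Set[str]) -> Mapping:
    """mu[S]: keep only the bindings for variables in S."""
    return frozenset((var, term) for var, term in mu if var in S)


def recompute_with_deltas(C_pi: Set[Mapping], A_new: Set[Mapping], S: Set[str]):
    """Compute C'_pi directly from A', then derive the deltas by set difference."""
    C_new = {project(mu, S) for mu in A_new}
    delta_minus_pi = C_pi - C_new   # Delta^-_pi := C_pi  - C'_pi
    delta_plus_pi = C_new - C_pi    # Delta^+_pi := C'_pi - C_pi
    return C_new, delta_minus_pi, delta_plus_pi
```

Both differences touch every element of the old and the new result, which is the cost an
optimizer would have to weigh against the incremental variant.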
    A different improvement can be achieved by using mapping multi-sets (cf. [20])
that combine a mapping $\mu$ with a multiplicity $m(\mu)$. In the computation of
$\Delta^-_\pi$, it must be checked for each mapping $\mu \in \Delta^-_A$ whether there are
still justifications for $\mu[S]$ in $A'$. By contrast, the multiplicities in a mapping
multi-set record the number of justifications. By defining $-A$ as $A$ with each
multiplicity multiplied by $-1$, we can compute
$C'_\pi = \{\mu \in \pi_S(A) \cup (-\pi_S(\Delta^-_A) \cup \pi_S(\Delta^+_A)) \mid m(\mu) > 0\}$
(assuming that the operations cover multiplicities). $\Delta^-_\pi$ and $\Delta^+_\pi$ can
be assigned during the computation of the leftmost union for deletions ($\mu$ with
$m(\mu) < 0$) and insertions ($\mu$ with $m(\mu) > 0$).
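
The multi-set variant might look as follows in Python, using `collections.Counter` to
store multiplicities. This is a sketch under assumptions: the representation of mappings
as frozensets and the function names are hypothetical, and the sketch reports a mapping
in $\Delta^-_\pi$ or $\Delta^+_\pi$ only if its total multiplicity actually crosses zero,
which is one possible reading of the description above.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set, Tuple

Mapping = FrozenSet[Tuple[str, str]]  # hashable set of (variable, term) pairs


def project(mu: Mapping, S: Set[str]) -> Mapping:
    """mu[S]: keep only the bindings for variables in S."""
    return frozenset((var, term) for var, term in mu if var in S)


def update_projection_multiset(mult: Counter,
                               delta_minus_A: Iterable[Mapping],
                               delta_plus_A: Iterable[Mapping],
                               S: Set[str]):
    """mult holds m(mu') for pi_S(A); it is updated in place to pi_S(A')."""
    # Change multi-set: -pi_S(Delta^-_A) united with pi_S(Delta^+_A).
    change: Counter = Counter()
    for mu in delta_plus_A:
        change[project(mu, S)] += 1
    for mu in delta_minus_A:
        change[project(mu, S)] -= 1

    delta_minus_pi: Set[Mapping] = set()
    delta_plus_pi: Set[Mapping] = set()
    for mu_p, diff in change.items():
        before = mult[mu_p]            # a Counter yields 0 for unknown keys
        after = before + diff
        if before > 0 and after <= 0:
            delta_minus_pi.add(mu_p)   # last justification removed
        elif before <= 0 and after > 0:
            delta_plus_pi.add(mu_p)    # first justification added
        if after > 0:
            mult[mu_p] = after
        else:
            mult.pop(mu_p, None)
    return mult, delta_minus_pi, delta_plus_pi
```

No scan over $A'$ is needed here; the price is the storage and maintenance of one
counter per projected mapping.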

5     Conclusion and Future Work
We have presented a novel analysis of incremental SPARQL evaluation. Our
results show means to design an optimizer that is able to choose the presumably
best computation in the described Linked Data query answering scenario where
the immediate processing of new data is desired. We have also proposed an
integration of Linked Data resources and the Graph-operator based on a specific
construction of the SPARQL dataset. We think that our findings provide a sound
formal basis for further research in this area.

Future Work. Next, we want to transfer the presented analysis to the SPARQL
bag semantics, implement our approach with the described possibilities for
optimization, and generalize $\Delta_D$ so that it can contain more than one named
graph. Our approach may also be useful in combination with local caches. As
Linked Data is built on top of HTTP, the cache-control mechanism (RFC 2616, Draft
Standard, http://tools.ietf.org/html/rfc2616) could be used to detect and update
outdated data on the fly in order to integrate the latest information during
query processing.
References
 1. Angles, R., Gutierrez, C.: The Expressive Power of SPARQL. In: Proceedings of
    the 7th Int’l Semantic Web Conference (ISWC). Karlsruhe, Germany (2008)
 2. Berners-Lee, T.: Long Live the Web: A Call for Continued Open Standards and
    Neutrality. Scientific American Magazine 12 (2010)
 3. Bizer, C., Cyganiak, R., Heath, T.: How to Publish Linked Data on the Web. Tech.
    Rep., Freie Universität Berlin, The Open University (2007),
    http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/, Accessed Aug 15, 2011
 4. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data – The Story So Far. International
    Journal on Semantic Web and Information Systems (IJSWIS) 5(3) (2009)
 5. Booth, D.: Four Uses of a URL: Name, Concept, Web Location and Document
    Instance (Jan 2003),
    http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm, Accessed Aug 15, 2011
 6. Buil-Aranda, C., Arenas, M., Corcho, O.: Semantics and Optimization of the
    SPARQL 1.1 Federation Extension. In: Proceedings of the 8th Extended Semantic
    Web Conference (ESWC). Heraklion, Greece (2011)
 7. Carroll, J.J., Bizer, C., Hayes, P., Stickler, P.: Named Graphs. Web Semantics:
    Science, Services and Agents on the World Wide Web 3(4) (2005)
 8. Hannemann, J., Kett, J.: Linked Data for Libraries. In: World Library and Infor-
    mation Congress: 76th IFLA Gen. Conf. and Assy. Gothenburg, Sweden (2010)
 9. Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.U., Umbrich, J.: Data
    Summaries for On-Demand Queries over Linked Data. In: Proceedings of the 19th
    Int’l Conference on World Wide Web (WWW). Raleigh, NC, USA (2010)
10. Hartig, O.: Zero-Knowledge Query Planning for an Iterator Implementation of Link
    Traversal Based Query Execution. In: Proceedings of the 8th Extended Semantic
    Web Conference (ESWC). Heraklion, Greece (2011)
11. Hartig, O., Bizer, C., Freytag, J.C.: Executing SPARQL Queries over the Web of
    Linked Data. In: Proc. of the 8th Int’l Sem. Web Conf. Chantilly, VA, USA (2009)
12. Hayes, P., McBride, B.: RDF Semantics. W3C Recommendation (Feb 2004),
    http://www.w3.org/TR/2004/REC-rdf-mt-20040210/, Accessed Aug 15, 2011
13. Jacobs, I., Walsh, N.: Architecture of the World Wide Web. W3C Rec. (Dec 2004),
    http://www.w3.org/TR/2004/REC-webarch-20041215/, Accessed Aug 15, 2011
14. Klyne, G., Carroll, J.J., McBride, B.: Resource Description Framework (RDF):
    Concepts and Abstract Syntax. W3C Recommendation (Feb 2004),
    http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/, Accessed Aug 15, 2011
15. Ladwig, G., Tran, T.: Linked Data Query Processing Strategies. In: Proceedings
    of the 9th Int’l Semantic Web Conference (ISWC). Shanghai, China (2010)
16. Ladwig, G., Tran, T.: SIHJoin: Querying Remote and Local Linked Data. In: Proc.
    of the 8th Extended Semantic Web Conference (ESWC). Heraklion, Greece (2011)
17. Nandzik, J., Heß, A., Hannemann, J., Flores-Herr, N., Bossert, K.: Contentus
    – Towards Semantic Multi-Media Libraries. In: World Library and Information
    Congress: 76th IFLA General Conf. and Assembly. Gothenburg, Sweden (2010)
18. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. ACM
    Transactions on Database Systems (TODS) 34(3) (2009)
19. Prud’hommeaux, E., Seaborne, A.: SPARQL Query Language for RDF. W3C
    Recommendation (Jan 2008),
    http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/, Accessed Aug 15, 2011
20. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL Query Optimization.
    In: Proceedings of the 13th International Conference on Database Theory (ICDT).
    Lausanne, Switzerland (2010)