<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Caching and Prefetching Strategies for SPARQL Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johannes Lorey</string-name>
          <email>johannes.lorey@hpi.uni-potsdam.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Naumann</string-name>
          <email>felix.naumann@hpi.uni-potsdam.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hasso-Plattner-Institut</institution>
          ,
          <addr-line>Potsdam</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linked Data repositories offer a wealth of structured facts, useful for a wide array of application scenarios. However, retrieving this data using Sparql queries yields a number of challenges, such as limited endpoint capabilities and availability, or high latency for connecting to it. To cope with these challenges, we argue that it is advantageous to cache data that is relevant for future information needs. However, instead of only retaining results of previously issued queries, we aim at retrieving data that is potentially interesting for subsequent requests in advance. To this end, we present different methods to modify the structure of a query so that the altered query can be used to retrieve additional related information. We evaluate these approaches by applying them to requests found in real-world Sparql query logs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Linked Data sources offer a wealth of information about
a multitude of topics, including geo-location facts1,
government data2, or cross-domain information3. The SPARQL
Protocol and RDF Query Language (Sparql) has become
the de facto query language to retrieve this information from
publicly available endpoints.</p>
      <p>However, whereas in principle Sparql facilitates various
information needs by defining complex query constructs,
typical public Sparql endpoint characteristics such as high
latency, limited hardware resources, or unavailability restrict
on-demand utilization of the data. In this work, we propose
a novel approach for discovering and aggregating potentially
relevant data for subsequent user requests based on previous
query patterns by rewriting these preceding queries. To this
end, we introduce some fundamental concepts of Sparql
in Sec. 1.1 and outline the contribution and organization of
this paper in Sec. 1.2.</p>
    </sec>
    <sec id="sec-2">
      <title>SPARQL Preliminaries</title>
      <p>
        One central concept of a Sparql query is that of a triple
pattern T = (s, p, o) ∈ (V ∪ U ) × (V ∪ U ) × (V ∪ U ∪ L)
with V being a set of variables, U being a set of URIs, and
L being a set of literals. A Sparql query contains a number
of graph patterns P1, P2, . . . , which are defined recursively:
(i) A valid triple pattern T is a graph pattern. (ii) If P1 and
P2 are graph patterns, then P1 AND P2, P1 UNION P2, and P1
OPTIONAL P2 are graph patterns [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>1http://linkedgeodata.org 2http://data-gov.tw.rpi.edu/wiki 3http://dbpedia.org</title>
        <p>
          In terms of relational operations, AND represents an inner
join of two graph patterns, UNION denotes their union, and
OPTIONAL indicates a left outer join between P1 and P2. In
addition, AND takes precedence over UNION, and OPTIONAL is
always left-associative [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. While UNION and OPTIONAL are
reserved keywords in actual Sparql queries to indicate the
corresponding connection between two graph patterns, the
AND keyword is omitted.
        </p>
        <p>
          We call P the parent graph pattern of graph patterns
P1, . . . , Pn, when it has the form P = P1 ⊕ . . . ⊕ Pn, where ⊕
is any one of the introduced keywords. Whereas ⊕ is
symmetric for AND and UNION, it is not for OPTIONAL [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. In any
Sparql query Q, the graph pattern P with no non-trivial
parent graph pattern is referred to as the query pattern PQ.
        </p>
        <p>Typically, curly braces are used to delimit graph patterns:
{P }. Here, if such a delimited graph pattern P represents
a conjunction of multiple triple patterns Ti, i.e., its form is
P = T1 AND T2 . . . AND Tn, we call P a basic graph pattern
(BGP). While there is the notion of empty graph patterns
in Sparql, we only consider non-empty graph patterns.</p>
        <p>In our work, we focus on SELECT queries. In general,
Sparql allows to limit the projection to certain variables
listed after the SELECT statement. Moreover, graph patterns
may contain so-called filter conditions indicated by the
keyword FILTER. In a filter condition, specific restrictions, such
as regular expressions or inequality relations, can be placed
on resources to modify the scope of the selection of a query.
1.2</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Contribution and Paper Organization</title>
      <p>As with traditional Web search, Sparql is suitable for
different information needs, such as data exploration or
navigational querying. However, in contrast to simple
keywordbased queries, the well-defined structure and constructs of
Sparql queries allow for more fine-grained expressions
representing the user intent. We build on this notion and
present methods to modify queries in order to retrieve
information relevant for subsequent related requests. Further, we
discuss features of queries and query sequences influencing
the suitability and effects of these modification methods.</p>
      <p>By materializing results potentially relevant for
subsequent queries locally, overall query execution time can be
tremendously decreased for a client. Moreover, if the Sparql
endpoint becomes unavailable for a period of time, thus
disallowing access to the remote information, a replicated
subset of the data may be utilized instead. On the other hand,
a Sparql endpoint may also utilize knowledge about what
data may be relevant for future queries, e.g., by storing it in
main memory. The main trade-offs for this caching approach
are increased storage requirements and potential data
staleness which may be negligible given the concrete scenario.</p>
      <p>The remainder of this paper is structured as follows: In
Sec. 2, we discuss related work for this paper and point out
differences in our approach. In Sec. 3, we introduce a means
to group Sparql queries by matching similarly structured
requests. We then present our different query rewriting
approaches in Sec. 4 and present an evaluation using real-world
query logs in Sec. 5. Finally, we conclude this paper in Sec. 6
and highlight some future work in this context.</p>
    </sec>
    <sec id="sec-4">
      <title>RELATED WORK</title>
      <p>Related work for this paper draws mainly from two fields:
(i) semantic caching and prefetching, e.g., techniques to
retain previously fetched data or retrieve data relevant for
subsequent queries, and (ii) query relaxation, e.g., parsing
and modifying a user query to discover relevant resources.
2.1</p>
    </sec>
    <sec id="sec-5">
      <title>Semantic Caching and Prefetching</title>
      <p>
        Semantic caching builds on the idea of maintaining a
local copy of retrieved data that can be used for subsequent
queries. As with traditional caching, one of the major
motivations for semantic caching is to reduce the transmission
overhead when retrieving data over a network link.
Conventional approaches, such as tuple or page caching, usually
retain fetched data based on either temporal or frequency
aspects, e.g., by prioritizing least-recently or most-frequently
used items. Such techniques also exist for Sparql query
result caching [
        <xref ref-type="bibr" rid="ref12 ref15">12, 15</xref>
        ]. Compared to these works, semantic
caching employs more fine-grained information to
characterize data, e.g., in order to establish variable-sized semantic
regions containing related tuples [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or detect data items
with similar geolocation information [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Closely related to semantic caching and our work is
prefetching. Instead of simply retaining tuples retrieved
previously, prefetching allows to gather data that is potentially
useful for subsequent queries based on semantic
information derived from past queries or the overall system state.
In computer architecture design, prefetching is usually
employed to request instructions that are anticipated to be
executed in the future and place them in the CPU cache. For
information retrieval, query prefetching typically assumes
a probabilistic model, e.g., considering temporal or spatial
features [
        <xref ref-type="bibr" rid="ref6 ref9">6, 9</xref>
        ]. However, there have been no attempts to
prefetch RDF data based on the structure of sequential
related Sparql queries within and across query sessions.
2.2
      </p>
    </sec>
    <sec id="sec-6">
      <title>Query Relaxation</title>
      <p>
        Query relaxation aims at discovering interesting related
information based on a user request. For keyword queries,
this process is sometimes referred to as query expansion and
has been a major research topic in the field of information
retrieval (cf. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). In contrast to query refinement which aims
at increasing precision by restricting the scope of a query,
the goal of query relaxation or query expansion is to
improve the recall in retrieval effectiveness. To this end, often
precomputed metadata such as language models is utilized.
      </p>
      <p>
        There exist a number of projects for implementing query
relaxation for retrieving Linked Data. The authors of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
suggest logical relaxations based on ontological metadata.
In contrast, the approach in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] relies on precomputed
similarity tables for attribute values whereas in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] the authors
utilize a language model derived from the knowledge base.
      </p>
      <p>In comparison, our rewriting strategies are not targeted at
increasing recall when executing a single query, but instead
aim to retrieve additional data related to future queries.
Moreover, we do not assume any knowledge of the data
source itself or of metadata describing it. Thus, while most
previous approaches require at least some precomputation,
our approach can be used ad-hoc solely by analyzing and
modifying queries issued during run-time.
3.</p>
    </sec>
    <sec id="sec-7">
      <title>QUERY MATCHING</title>
      <p>For different aspects of our work, we need to identify and
cluster similarly-structured queries. To this end, we
introduce our bottom-up query pattern matching approach based
on matching similar triple patterns they contain.
3.1</p>
    </sec>
    <sec id="sec-8">
      <title>Triple Pattern Distance</title>
      <p>To match triple patterns, we determine their distance by
accumulating the distance scores between their parts, i.e.,
between the two subjects, predicates, and objects,
respectively. Here, if two triple pattern parts are both variables,
their distance is 0. In case they are both URIs or both
literals, their distance is the normalized Levenshtein distance of
the respective strings. Otherwise, e.g., when one subject is
a variable and the other a URI, the distance is 1. More
formally, if x1, x2 are either the subjects, predicates, or objects
of two triple patterns T1, T2, respectively, we define their
symmetric distance score Δ(x1, x2) as
Δ(x1, x2) :=

1,
0, if x1 ∈ V ∧ x2 ∈ V

 levenshtein(x1, x2) , if (x1 ∈ U ∧ x2 ∈ U )
max(|x1| , |x2|) ∨ (x1 ∈ L ∧ x2 ∈ L)
else.</p>
      <p>We determine the overall distance Δ(T1, T2) = Δ(T2, T1)
of two triple patterns T1, T2 by aggregating the
individual triple pattern parts distance scores Δ(s1, s2), Δ(p1, p2),
Δ(o1, o2) as follows: In case Δ(s1, s2) = Δ(p1, p2) =
Δ(o1, o2) = 0, we define Δ(T1, T2) := 0. Otherwise, there
exists a minimum triple pattern part distance score minΔ :=
min(Δ(s1, s2), Δ(p1, p2), Δ(o1, o2))) with minΔ &gt; 0. In this
case, the triple pattern distance score is defined as
Δ(T1, T2) := dΔ(s1, s2)e+dΔ(p1, p2)e+dΔ(o1, o2)e−(1−minΔ)</p>
      <p>This way, a distance Δ(T1, T2) ≤ 1 always indicates a
dissimilarity in at most one triple pattern part, whereas for
two non-equal triple pattern parts 1 &lt; Δ(T1, T2) ≤ 2, and
a distance score Δ(T1, T2) &gt; 2 hints at differences between
the two subjects, predicates, and objects.</p>
      <p>Consider the two basic graph patterns BGP1 and BGP2 in
Listing 1 and Listing 2, respectively, where the line numbers
serve as identifiers for the included triple patterns. Here,
the most similar triple pattern for T1 in BGP2 can be
determined by computing min(Δ(T1, T4), Δ(T1, T5), Δ(T1, T6)),
which results in Δ(T1, T5) = (d0e + d0e + d 1126 e − 146 ) = 0.75.
For T2, the minimum value is Δ(T2, T6) = (5d1e + d50e + d0e −
0) = 1, and for T3 it is Δ(T3, T4) = (d0e + d 14 e + d 9 e − 194 ) ≈
1.36. Thus, the most similar triple patterns for T1, T2, T3
in BGP2 are T5, T6, and T4, respectively.
3.2</p>
    </sec>
    <sec id="sec-9">
      <title>Basic Graph Pattern Matching</title>
      <p>Using the triple pattern distance scores, we can now
determine matchings between basic graph patterns. In our</p>
      <sec id="sec-9-1">
        <title>1 ?city1 rdfs:label "Paris"@fr . 2 ?person ?relationWith ?city1 . 3 :Auguste_Comte foaf:givenName "Auguste" .</title>
        <p>Listing 1: Basic Graph Pattern Example BGP1.</p>
      </sec>
      <sec id="sec-9-2">
        <title>4 :Auguste_Comte foaf:surname "Comte" . 5 ?city2 rdfs:label "Montpellier"@en . 6 :Auguste_Comte ?association ?city2 .</title>
        <p>Listing 2: Basic Graph Pattern Example BGP2.
work, finding this matching is equivalent to deriving a
perfect (complete) bipartite cover with minimum cost between
the triple patterns of the two individual basic graph patterns
where the cost is determined by the triple pattern distance
Δ(Ti, Tj). Obviously, a perfect matching of triple patterns
is only possible for a complete bipartite graph, i.e., for two
BGPs with the same number of triple patterns. If this is not
the case, in order to generate a biclique of triple patterns we
pad the basic graph pattern containing less elements using
dummy triple patterns Tε so that for any triple pattern T
the score Δ(T, Tε) = Δ(Tε, T ) = ∞.</p>
        <p>
          As optimal solutions for generating maximal matchings
can be determined in polynomial time and the input size
(i.e., the number of triple patterns in the two BGPs)
is reasonably small, we choose the well-known Hungarian
Method [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to create an assignment with minimum cost.
Furthermore, we assign a maximum cost threshold to all
derived matchings of triple patterns. Here, we only
consider triple pattern matchings with cost Δ(Ti, Tj) ≤ 1, i.e.,
all matched triple patterns are either identical or differ in
at most one of non-variable subject, predicate, or object,
respectively. The cost for triple pattern matchings with higher
cost, i.e., matchings (Ti, Tj) with Δ(Ti, Tj) &gt; 1 is set to ∞.
        </p>
        <p>The score Δ(BGP1, BGP2) of derived complete matchings
MT ⊂ {(Ti, Tj)|(Ti, Tj) ∈ BGP1 × BGP2} is defined as:
Δ(BGP1, BGP2) :=</p>
        <p>P
(Ti,Tj)∈MT Δ(Ti, Tj)
|MT |</p>
        <p>If the result of the Hungarian method for MT contains
any individual triple pattern matching with infinite cost,
Δ(BGP1, BGP2) is also infinite. In this case, no complete
matching with only finite triple pattern distance scores can
be determined between BGP1 and BGP2. Assuming such
a complete matching containing only valid triple pattern
matchings exists, this BGP matching would have been the
result of the algorithm as its cost would be &lt; ∞.</p>
        <p>Conversely, if the algorithm determines an optimal
matching with infinite cost, any other matching with cost &lt; ∞
cannot be complete as the algorithm does not terminate
before discovering a maximal matching. Given our setting of a
complete bipartite graph (or biclique), any maximal
matching is always bound to be a complete matching.</p>
        <p>Applying this approach on the basic graph patterns
illustrated in Listing 1 and Listing 2 to determine a complete
matching with minimum cost yields the same triple
pattern matchings listed earlier. Thus, the optimal matching
{(T1, T5), (T2, T6), (T3, T4)} has cost 0.75+1+∞ , i.e., BGP1
3
and BGP2 cannot be matched to another.
3.3</p>
        <p>Real-world Sparql query patterns can be more complex
than simple basic graph patterns, i.e., they may contain
multiple recursively layered BGPs connected using the UNION or
OPTIONAL keyword as introduced in Sec. 1.1. Along the lines
of the previous subsection, where we defined BGP matching
using the contained triple patterns, our matching approach
for general query patterns is based on recursively matching
the (basic) graph patterns they consist of.</p>
        <p>Due to the recursive structure of Sparql queries Q1, Q2,
we can only try to match two basic graph patterns
BGPi, BGPj to one another if these BGPs reside at the
same recursion level of the query patterns PQ1 , PQ2 ,
respectively. Additionally, the two basic graph patterns and any
of their parent graph patterns also need to be connected by
the same keyword to other (basic) graph patterns at their
respective recursion level. If these two conditions are met,
we say that BGPi, BGPj can be aligned to each other. More
generally, if all (basic) graph patterns contained in a parent
graph pattern P1 can be aligned to at least one (basic) graph
pattern of another parent graph pattern P2 and vice versa,
P1, P2 can also be aligned to each other.</p>
        <p>Consider the three Sparql parent graph patterns
P1 := BGP1 OPTIONAL (BGP2 UNION BGP3)
P2 := BGP4 OPTIONAL (BGP5 UNION BGP6),</p>
        <p>P3 := BGP7 OPTIONAL BGP8,
where BGP1, . . . , BGP8 are basic graph patterns. Here,
BGP1 can be aligned to BGP4 or BGP7. Additionally
BGP2 or BGP3 can be aligned to either BGP5 and BGP6,
respectively. However, BGP8 cannot be aligned to any other
basic graph pattern in P1 or P2, because its recursion level
is different from those of BGP2, BGP3, BGP5, BGP6 and
OPTIONAL is not symmetric, i.e., BGP7 OPTIONAL BGP8 6=
BGP8 OPTIONAL BGP7. Thus, while P1 and P2 can be
aligned to each other, P3 cannot be aligned to either of them.</p>
        <p>We use a bottom-up approach to try to match two query
patterns PQ1 , PQ2 to one another. Similarly to the matching
method introduced in the previous subsection, we try to
derive a complete matching with minimal (finite) cost between
all basic graph patterns of PQ1 and PQ2 that can be aligned
to each other. Here, the cost between two basic graph
patterns is determined by their distance score Δ(BGP1, BGP2).</p>
        <p>If for any BGP in PQ1 or PQ2 no alignment is possible
or its matching has infinite cost, PQ1 or PQ2 cannot be
matched. If we derive a complete matching for all basic
graph patterns, we check whether the parent graph patterns
of all pairs of matched BGPs can be aligned to each other.
If this is the case, we continue checking the alignment of the
parent graph patterns until we reach the query pattern.</p>
        <p>In case no alignment is possible, the two query patterns
PQ1 and PQ2 cannot be matched. Conversely, two query
patterns PQ1 and PQ2 can be matched if they can be aligned
to each other and a complete matching with finite cost can
be established between all basic graph patterns BGPi and
BGPj in PQ1 and PQ2 , respectively. Canonically, any graph
pattern can always be matched to itself.</p>
        <p>To group structurally similar queries, we introduce the
notion of query clusters. All queries in a query cluster can
be matched to all other queries within the cluster, i.e., there
exists a pairwise complete matching with finite cost between
all BGPs of any two queries in the same cluster. Note that
query clusters may be overlapping, i.e., a query can be
element of multiple query clusters.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>QUERY AUGMENTATION</title>
      <p>The main motivation of our work lies in retrieving
information for Sparql queries that is also relevant for
subsequent related requests. Beyond basic caching, we argue for
prefetching results: Here, we attempt to modify a query to
retrieve additional data potentially relevant for future
information needs. In this section, we motivate and illustrate
different approaches for modifying queries accordingly.
4.1</p>
    </sec>
    <sec id="sec-11">
      <title>Augmentation Concepts</title>
      <p>We call the process of modifying the query contents query
augmentation to emphasize that the results retrieved by
issuing the original query are included in the result set for the
modified query. In other words, the matches for the
unmodified query form a subgraph of the matches for the augmented
query. In this work, we are typically interested in retrieving
additional information related to a central concept, namely
the subject (i.e., either a variable or URI) occurring most
often in the query pattern. Moreover, we require a central
concept to be either part of the projection if it is a variable
or to influence the selection if it is a URI. We assume that a
URI influences the selection if at least one triple pattern, in
which the URI is the subject, contains a projection variable.</p>
      <p>Our intuition of query augmentation builds on concepts
from information retrieval. For example, in traditional
keyword-based search engines, a user might be unaware of
the most suitable string pattern to enter to retrieve all
relevant results at once. However, in several iterations the user
may choose to refine the initial query based on retrieved
results. In Linked Data terms, a user might query for more
detailed information about a certain resource or for similar
information of related resources after analyzing preliminary
results, thus incrementally modifying the initial query.</p>
      <p>In the remainder of this section, we introduce the
different augmentation strategies we have implemented while
emphasizing the particular requirements for their
application. We illustrate the effect of applying these strategies
on Query 1. Put simply, for the central concept French
philosopher :Auguste_Comte this query retrieves the birth
place(s) located in France alongside all influences on him
that died in Paris. Table 1 lists all bindings retrieved by
issuing Query 1 against the public DBpedia Sparql
endpoint4 currently containing DBpedia 3.8 data.</p>
      <p>PREFIX : &lt;http://dbpedia.org/resource/&gt;
PREFIX dbo: &lt;http://dbpedia.org/ontology/&gt;
SELECT ?birthPlace ?influence WHERE {
:Auguste_Comte dbo:birthPlace ?birthPlace .
?birthPlace dbo:country :France .
:Auguste_Comte dbo:influencedBy ?influence .
?influence dbo:deathPlace :Paris .
}</p>
      <p>Query 1: Example of a Sparql query
4.2</p>
    </sec>
    <sec id="sec-12">
      <title>Exploratory Augmentation</title>
      <p>In exploratory augmentation, we query for additional facts
that are available for the central concept. The idea of
exploratory augmentation is that based on previous results a</p>
      <sec id="sec-12-1">
        <title>4http://dbpedia.org/sparql</title>
        <p>birthPlace influence
:Montpellier :Jean-Baptiste_Say
:Montpellier :Émile_Durkheim</p>
        <p>Table 1: All results for Query 1 in DBpedia 3.8
user might be interested in more information for a specific
resource. However, this might also be the case if the initial
result set is empty, e.g., because of misspelled or ambiguous
vocabulary terms (such as foaf:img and foaf:Image).</p>
        <p>
          Potentially, there may also exist certain divergences
between ontological information assumed by the user and the
vocabulary used in actual data. For example, we discovered
that although a number of properties are used frequently for
instances of certain types in DBpedia, they are not defined
in the ontology (e.g., dbo:anthem for dbo:Country). On the
other hand, there are also a number of defined properties
that are rarely used in instance data for the corresponding
classes (e.g., dbo:depth for dbo:Place) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>This augmentation is applied by adding a triple pattern
to the query, where the subject is the central concept and
predicate as well as object are unique variables. Moreover,
these two variables are added to the projection, thus
evaluating all facts in which the central concept is subject. We
highlight the corresponding changes in the resulting Query 2
by underlining modified or added sections.</p>
        <p>SELECT ?p ?o ?birthPlace ?influence WHERE {
:Auguste_Comte dbo:birthPlace ?birthPlace .
?birthPlace dbo:country :France .
:Auguste_Comte dbo:influencedBy ?influence .
?influence dbo:deathPlace :Paris .</p>
        <p>:Auguste_Comte ?p ?o .
}</p>
        <p>Query 2: Exploratory augmentation of Query 1
An excerpt of the extended result set for Query 2 is listed
in Tab. 2. Please note that we exclude the bindings listed in
Tab. 1 which are also contained in the result set of Query 2.
p o
rdf:type dbo:Person
dbo:birthDate 1798-01-19
dbo:notableIdea :Positivism
dbo:influenced :Karl_Marx</p>
        <p>...</p>
        <p>Table 2: Some results for Query 2 in DBpedia 3.8
4.3</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Template Augmentation</title>
      <p>For the template augmentation approach, we first need to
identify the cluster the current query belongs to as discussed
in Sec. 3.3 by matching the current query pattern to those
of preceding requests. If we determine a matching with
finite cost, we require this matching to be non-trivial, i.e., to
include at least one match between two non-identical basic
graph patterns. This is true if the overall distance score of
the query pattern matching is greater than 0, i.e., if any
two matched basic graph patterns contain a pair of triple
patterns (Ti, Tj) for which either the subjects, predicates,
or objects differ.</p>
      <p>A matching between two query patterns P1 and P2
meeting these requirements allows us to construct a query
template: Initially, the template query string is identical to the
current query string. We then introduce unique variables
to replace all differing triple pattern parts included in the
matching, i.e., either the non-matching subject, predicate, or
object of triple patterns (Ti, Tj) for which 0 &lt; Δ(Ti, Tj) ≤ 1.
Here, if the same resource in a BGP is replaced more than
once, we use the same unique variable to ensure consistency.</p>
      <p>Instead of issuing many similarly structured queries with
only little variance, e.g., by using a crawler, a query template
instead retrieves all relevant information using only a single
query. A possible query template for Query 1 is illustrated
in Query 3. Here, one specific resource has been replaced by
a variable, which has also been added to the projection of
the query in the SELECT statement. As indicated in Tab. 3,
the result set contains the previous bindings as well as
information about other persons with similar properties, e.g.,
about poet Paul Fort or scientist Pierre de Fermat.</p>
      <sec id="sec-13-1">
        <title>SELECT ?s ?birthPlace ?influence WHERE {</title>
        <p>?s dbo:birthPlace ?birthPlace .
?birthPlace dbo:country :France .
?s dbo:influencedBy ?influence .
?influence dbo:deathPlace :Paris .</p>
        <p>Query 3: Template augmentation of Query 1
}
s
birthPlace
:Montpellier
:Auguste_Comte
:Auguste_Comte
:Paul_Fort
:Pierre_de_Fermat
type augmentation lies in determining a suitable class for
which instance data is retrieved, especially without
assuming any a priori knowledge of the underlying ontology.</p>
        <p>A number of techniques can be employed to gather this
data, e.g., using multiple preliminary queries to construct a
simple type hierarchy or utilizing aggregate functions such
as COUNT to generate heuristics about the distribution of
different types. In our approach, we introduce a FILTER
NOT EXISTS to exclude all those (generic) types that have
(more specific) subclasses. By doing so, we assume that the
endpoint supports Sparql 1.1 expressions and all resources
are instances of at least one leaf node in the type hierarchy.
If this is not the case, we exclude the filter condition.</p>
        <p>Query 3 illustrates the result of applying type
augmentation on the reference query, i.e., by introducing the new
triple patterns regarding rdf:type information, exchanging
the central concept in all other triple patterns, and applying
the filter condition. Whereas the results for template
augmentation also include bindings for ?s that are instances of
rather generic classes such as dbo:Person, the results for
Query 3 listed in Tab. 4 only include resources that are
instances of specific subclasses, e.g., dbo:Philosopher.
SELECT ?s ?birthPlace ?influence WHERE {
:Auguste_Comte rdf:type ?type .
?s rdf:type ?type .
?s dbo:birthPlace ?birthPlace .
?birthPlace dbo:country :France .
?s dbo:influencedBy ?influence .
?influence dbo:deathPlace :Paris .</p>
        <p>FILTER NOT EXISTS {?t1 rdfs:subClassOf ?type}</p>
        <p>Query 4: Type augmentation of Query 1
}
s
:Auguste_Comte</p>
        <p>:Montpellier
:Auguste_Comte
:Jean-Paul_
Sartre
:René_
Descartes
birthPlace
:Montpellier
:Paris
4.4</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Type Augmentation</title>
      <p>If class membership information in the knowledge base is
available for the central concept, exploiting this
ontological data can help in discovering information for related
resources. In type augmentation we identify the rdf:type of
the central concept and retrieve data for the instances
belonging to the same classes by replacing the central concept
with a unique variable throughout the query.</p>
      <p>The goal of type augmentation is similar to that of
template augmentation, i.e., querying information about
different related resources. Whereas in template augmentation
this relatedness is solely determined by the context of the
replaced triple pattern part, in type augmentation it is derived
by exploiting both structural and ontological information.
Assuming the central concept is identical to the triple
pattern part replaced for matching BGPs, all bindings retrieved
using type augmentation are also retrieved using template
augmentation. However, for type augmentation the
connection between different resources may be stronger than for
template augmentation given their type information.</p>
      <p>According to the RDF Schema5, a resource may be
instance of multiple classes, where these classes may either be
unrelated or reside at different levels of the same type
hierarchy. Therefore, RDF resources may be instances of very
generic types, such as owl:Thing. Hence, one challenge for</p>
      <sec id="sec-14-1">
        <title>5http://www.w3.org/TR/rdf-schema</title>
        <p>influence
:Jean-Baptiste_
Say
:Émile_Durkheim
:Maurice_
Merleau-Ponty
:Marine_</p>
        <p>Mersenne
:Descartes,_
Indre-et-Loire</p>
        <p>...</p>
        <p>Table 4: Some results for Query 4 in DBpedia 3.8
4.5</p>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>Holistic Augmentation</title>
      <p>The intuition of holistic augmentation is that the scope of
Sparql queries can be broadened by removing certain triple
patterns they contain. However, to ensure that the result set
of this modified query still contains all results of the original
one, the removed parts must not contain variables essential
to the projection or selection of a query. In other words, the
variables in the SELECT statement still need to be present in
the modified query so that they may be bound to an RDF
term in a graph matching.</p>
      <p>Typically, if we select a triple pattern to remove from the
basic graph pattern BGPi that contains it, we also remove
the same triple pattern from any other basic graph pattern
BGPj that can be matched to BGPi as described in Sec. 3.3.
We call such a removal valid, if applying it on a valid Sparql
query results in a valid Sparql query, i.e., if all projection
variables are referenced in at least one remaining triple
pattern of the query pattern. Note that there exist queries for
which no valid triple pattern removal is possible, e.g., queries
containing only one triple pattern.</p>
      <p>
        To identify the most suitable triple pattern to remove from
a query, we utilize the variable counting heuristic introduced
in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Essentially, this heuristic is based on the
assumption that unbound subjects are more selective than unbound
objects which in turn are more selective than unbound
predicates. Hence, generally a triple pattern with an unbound
predicate, but bound subject and object matches less RDF
statements in a knowledge base than a triple pattern with
either only an unbound subject or object, i.e., is more
selective. Also, a triple pattern with two or three unbound parts
is less selective than a triple pattern with only one or two
unbound parts, respectively. Thus, the least selective triple
pattern is the one containing only variables.
      </p>
      <p>In any query pattern there can be more than one triple
pattern with maximum selectivity. In this case, we select an
arbitrary one for removal. If this removal is not valid, we
check whether a valid removal can be achieved for a different
triple pattern with same or lesser selectivity. We continue
until we have either exhaustively checked all triple patterns
or discovered a validly removable triple pattern. In the latter
case, we modify the query by deleting the triple pattern from
the parent basic graph pattern and any other basic graph
pattern this BGP can be matched to.</p>
      <p>
        Removing a highly-selective query triple pattern not
essential for the projection mostly assists in situations where
(i) the query is too restrictive, (ii) the query contains invalid
statements, or (iii) the data or ontology in the knowledge
base is inconsistent, e.g., as described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. One of the two
most selective triple pattern in Query 1 is crossed out in
Query 5, thereby indicating its removal in this augmented
query. Table 5 lists some results for Query 5.
}
      </p>
      <sec id="sec-15-1">
        <title>SELECT ?birthPlace ?influence WHERE {</title>
        <p>:Auguste_Comte dbo:birthPlace ?birthPlace .
?birthPlace dbo:country :France .
:Auguste_Comte dbo:influencedBy ?influence .
?influence dbo:deathPlace :Paris .</p>
        <p>Query 5: Holistic Augmentation of Query 1
birthPlace influence
:Montpellier :Adam_Smith
:Montpellier :David_Hume
:Montpellier :Jean-Baptiste_Say
:Montpellier :Émile_Durkheim</p>
        <p>...</p>
        <p>Table 5: Some results for Query 5 in DBpedia 3.8.</p>
      </sec>
    </sec>
    <sec id="sec-16">
      <title>EVALUATION</title>
      <p>
        To evaluate the applicability of our prefetching
strategies for Sparql queries, we analyzed parts of the
DBpedia 3.6 and LinkedGeoData (LGD) query logs provided for
the USEWOD 2013 data challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The log files
contain a number of requests received by the respective public
Sparql endpoints and were collected for different dates in
2011. While the specific metadata provided in the Apache
Common Log Format differs slightly for the individual
services, all requests contain the sender’s anonymized IP
address and a timestamp in addition to the actual query.
5.1
      </p>
    </sec>
    <sec id="sec-17">
      <title>Methodology</title>
      <p>We use the timestamp information to provide a
meaningful segmentation for successive queries by introducing query
sessions. A query session is a chronologically ordered
sequence of at least two queries issued sequentially by the
same user (represented by a unique IP address) over a
period of time. We call query sessions homogeneous, if they
contain queries belonging to the same cluster, i.e., any query
in a session can be matched to any other query in the same
session. Otherwise, we call this query session heterogeneous.</p>
      <p>We exploit statistical features and metadata extracted
from the query logs format for delimiting a query session.
Specifically, we restrict the duration of a session by
comparing the timestamps of issued queries with that of the initial
query of a session. Once this difference exceeds a certain
threshold, we assume a new query session has started.
Depending on the analyzed dataset, we put an upper limit on
the number of queries a single session may contain, thereby
splitting a single query sequence into separate sessions if the
number of successive queries is greater than this limit.</p>
      <p>Matchings for triple patterns can easily be transformed
into triples by applying the corresponding variable
bindings. By materializing these triples for the original,
unmodified query and for all queries generated by applying the
augmentation strategies introduced in Sec. 4, we maintain
individual result set caches. We then evaluate how many
triples generated for bindings of subsequent queries are
already contained in the individual caches. If we determine
that a newly generated triple is contained in a cache, we
consider this a cache hit. Note that the set of cached triples
for each augmentation strategy includes at least the cached
triples for the original query.</p>
      <p>We use OpenLink Virtuoso Open-Source Edition version
6.1.3 as Sparql endpoint containing the English
DBpedia 3.6 dataset released on January 17, 2011, and the
LinkedGeoData dump released on April 26, 2011 in separate named
graphs. For each augmentation strategy, we restrict the
number of retrieved results for performance reasons to a
maximum of 100,000 using the LIMIT keyword for Sparql.
5.2</p>
    </sec>
    <sec id="sec-18">
      <title>DBpedia 3.6 Query Logs</title>
      <p>The requests included in the DBpedia 3.6 query logs
exhibit timestamps in hourly resolution, e.g., [24/Jan/2011
01:00:00 +0100]. In our analysis, only successive queries
from the same user with identical timestamps may belong to
the same session. However, as this heuristic introduces some
vagueness regarding the contents of a session, e.g., by
ignoring session timeouts, we also limit the maximum number of
queries belonging to any single session to 25. Notice that by
applying this conservative restriction, we potentially lower
our number of cache hits which will most likely increase for
longer sessions. In summary, once a user query with a
different timestamp is detected or we discover more than 25
successive queries, we assume a new session has started.</p>
      <p>Consequently, we base our evaluation on 288 query
sessions for which we were able to retrieve results for at least
one contained query. Of these query sessions, 176 (61%)
were homogeneous. On average, the query sessions
contained around 21 queries. In around 34% of all query
sessions, we could not identify any cache hits. We assume that
this is because the total result set size for these sessions is
relatively small: The sessions with no cache hits only result
in about 100 generated triples compared to around 3,300
yg105
e
t
a
tr
s
n
itao104
t
n
e
m
g
tsau103
e
b
r
o
f
ish102
t
e
h
c
a
c
f
reo101
b
m
u
n
l
triples for sessions with cache hits. There are two possible
reasons for this: (i) Our local Sparql endpoint did not
contain the entire DBpedia 3.6 corpus and (ii) some queries did
not yield any results even when executed against the public
DBpedia endpoint (e.g., because of syntax errors).</p>
      <p>For all sessions with at least one cache hit, we illustrate
the total number of cache hits in relation to the total
number of generated (unique) triples for all unmodified queries
in a session in Fig. 1. Each marker represents the best
augmentation strategy resulting in the most cache hits for this
session. If for none of the augmentation strategies the
number of hits was greater than the cache hits of the original
query, the marker “no augmentation” is used.</p>
      <p>We also evaluated how caching the result of every
(augmented) query influenced the number of cache hits for
subsequent (unmodified) queries in a session. While the number
of query sessions with no cache hits dropped to around 24%,
for those query sessions with cache hits we observed
comparable results to the ones illustrated. Hence, we assume that
by analyzing the first query, a suitable caching strategy can
be determined for all subsequent queries of the same session.
For example, for homogeneous query sessions applying
template or type augmentation on the first query most likely
results in the same augmented query as applying it on any
of the subsequent session requests.</p>
      <p>In general, due to the large number of homogeneous query
sessions in the DBpedia query logs, type and template
augmentation appear to be the most successful among the
augmentation strategies. Given the large number of resources
in DBpedia and our restriction on the maximum number of
results, we are impairing the success of these strategies to
some extent, as many potentially relevant facts are simply
not retrieved. Without this restriction, these prefetching
strategies should yield even better results.</p>
      <p>On the other hand, the amount of query sessions
benefiting most from holistic or exploratory augmentation is
limited. This might stem from the apparently small number of
queries issued by human users, towards which these
strategies are targeted. Moreover, in more than half the query
sessions (55%) holistic augmentation could not be applied
as no valid triple pattern removal was possible. Naturally,
in these cases the number of cache hits for holistic
augmentation equals the one of not applying any augmentation.
5.3</p>
    </sec>
    <sec id="sec-19">
      <title>LinkedGeoData Query Logs</title>
      <p>As the timestamps provided for the queries in the LGD
logs are precise to the second, we delimit query sessions in
these logs more accurately by introducing a session timeout
and maximum session duration. If for a query from a specific
user we cannot discover another query from the same user
within 10 minutes time, we assume the identified query is
the last one in a session. Overall, we delimit a query session
by restricting its duration to a maximum of 60 minutes, its
session timeout to 10 minutes, and its maximum number of
queries to 50 (whichever comes first).</p>
      <p>As with the DBpedia query logs, we analyzed only those
query sessions in which we determined at least one query
for which we were able to generate a result as described
above, i.e., we based our evaluation on 440 query sessions.
This time, for only 9% of these query sessions no cache hits
could be discovered at all. The analysis of which
augmentation strategy resulted in the most cache hits for the
remaining 424 query sessions is illustrated in Fig. 2.</p>
      <p>For the LGD logs, caching the results of unmodified
queries resulted in the most hits (48% of all query sessions),
followed by exploratory augmentation (26%), template
augmentation (15%), type and holistic augmentation (5% each).
For heterogeneous query sessions (55% of all query sessions),
exploratory augmentation is the best augmentation strategy
(30%), followed by template augmentation (19%), holistic
augmentation (7%), and type augmentation (6%).</p>
      <p>We also discovered that the number of mean cache hits
was much higher than the ones of the DBpedia log files: For
those sessions that benefited from template augmentation
the average number of cache hits was highest with 72,917
and lowest for type augmentation with 48,320. On the other
101 102 103 104
Total number of triples generated for all unmodified queries in a session
105</p>
      <p>Overall, our findings indicate that caching the results of
the first, unaugmented query of a session yields the most
amount of cache hits in about 38% of all analyzed query
sessions. The results for type and template augmentation
have the most cache hits in 28% and 23% of the sessions,
respectively, whereas applying exploratory and holistic
augmentation on the first session query only results in the most
cache hits for 7% and 4% of all sessions.</p>
      <p>When considering homogeneous query sessions only, type
augmentation yields the most cache hits in 39% of all
sessions, followed by no augmentation (30%), template
augmentation (28%), exploratory and holistic augmentation
(1% each). Notice that we discovered a number of
homogeneous query sessions which contain the exact same query
multiple times. For example, this is the case if only one
triple is generated for all queries in a session and identified
as cache hit repeatedly as represented by markers located on
the y-axis. Obviously, if the exact same query is issued over
again within the course of a session, no additional cache hits
can be generated when applying an augmentation strategy.</p>
      <p>The mean number of cache hits is highest for exploratory
augmentation (4,541 cache hits) and lowest for type
augmentation (33 cache hits) when considering only those
sessions where these two augmentation strategies yield the most
cache hits. On average, in all sessions with markers above
the diagonal (indicated by the dashed line in Fig. 1), each
generated triple represents a cache hit at least once.
toaT100100</p>
      <p>101 102 103 104 105
Total number of triples generated for all unmodified queries in a session
106
hand, the average session length was comparatively small
with only 12 queries per session. This could be because the
LGD log queries were actually issued by human users (as
opposed to crawlers or other software agents). Intuitively,
this would also explain why the majority of query sessions
are heterogeneous: Whereas software agents use somewhat
hard-coded HTTP requests to retrieve Linked Data, human
users are more flexible when issuing queries, e.g., by using
the LGD Sparql web interface6.</p>
      <p>Again, caching the result of every query in a session had
only little impact on the choice of augmentation strategy,
which is to be expected considering the small mean number
of queries in a session. However, the percentage of query
sessions not benefiting from any caching decreased slightly
to 6%. In general, the cached results for the queries can
almost always be used for subsequent queries in a session
for the LGD logs. Again, our local Sparql endpoint might
have not contained all data available in the public Sparql
endpoint. However, the impact on our results is negligible as
we were able to generate query results for the vast majority
of sessions (75%) and cache hits in almost all of these (91%).</p>
    </sec>
    <sec id="sec-20">
      <title>6. CONCLUSION AND OUTLOOK</title>
      <p>In this paper, we have presented a number of approaches
to modify Sparql queries with the goal to retrieve
additional results that are potentially relevant for subsequent
requests of the same query session. We evaluated how
often and how many of these additional results are actually
relevant by identifying query sessions that can exploit this
prefetched data in subsequent requests compared to only
caching the results of the unmodified query.</p>
      <p>While the majority of all analyzed query sessions
benefited from caching the results of the unmodified or
augmented first query (around 66% for DBpedia and 91% for
LinkedGeoData), the most suitable augmentation strategy
(if any) differs from session to session. For example,
largescale homogeneous query sessions can exploit a cache
containing similar data of related resources, e.g., as generated
by applying template and type augmentation.</p>
      <p>On the other hand, human users might retrieve more
“diverse” information, e.g., specific facts about a resource as
generated by applying exploratory augmentation. Overall,
whereas not all augmentation strategies can be applied on
certain queries (e.g., holistic augmentation for queries
containing only one triple pattern), combining different
strategies during the course of a query session might still prove
advantageous for such diversified scenarios.</p>
      <p>In future work, we aim at capturing the intent in a query
session more accurately, e.g., by using more fine-grained
approaches than that of a central concept. Based on this, more
customized augmentation strategies are conceivable, e.g., by
analyzing individual resources contained in a query. Finally,
we want to be able to discover on the most suitable
augmentation strategy based on the contents of a query session and
thus adjust other query sessions accordingly.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Abedjan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lorey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Naumann</surname>
          </string-name>
          .
          <article-title>Reconciling ontologies and the web of data</article-title>
          .
          <source>In Proceedings of the International Conference on Information and Knowledge Management (CIKM)</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1536</lpage>
          , Maui,
          <string-name>
            <surname>HI</surname>
          </string-name>
          , USA,
          <year>October 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Berendt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Luczak-Rösch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Möller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vallet</surname>
          </string-name>
          .
          <article-title>USEWOD2013 - 3rd international workshop on usage analysis and the web of data</article-title>
          .
          <source>In Proceedings of the Extended Semantic Web Conference (ESWC)</source>
          , Montpellier, France,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          .
          <article-title>A survey of automatic query expansion in information retrieval</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>44</volume>
          (
          <issue>1</issue>
          ):1:
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          :
          <fpage>50</fpage>
          ,
          <string-name>
            <surname>Jan</surname>
          </string-name>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Jónsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Semantic data caching and replacement</article-title>
          .
          <source>In Proceedings of the International Conference on Very Large Databases (VLDB)</source>
          , pages
          <fpage>330</fpage>
          -
          <lpage>341</lpage>
          , Bombay, India,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Elbassuoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ramanath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Query relaxation for entity-relationship search</article-title>
          .
          <source>In Proceedings of the Extended Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>62</fpage>
          -
          <lpage>76</lpage>
          , Heraklion, Greece,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fagni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Orlando</surname>
          </string-name>
          .
          <article-title>Boosting the performance of web search engines: Caching and prefetching query results by exploiting historical usage data</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>24</volume>
          (
          <issue>1</issue>
          ):
          <fpage>51</fpage>
          -
          <lpage>78</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mellotte</surname>
          </string-name>
          , G. Powell, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Stampouli</surname>
          </string-name>
          .
          <article-title>Towards fuzzy query-relaxation for RDF</article-title>
          .
          <source>In Proceedings of the Extended Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>687</fpage>
          -
          <lpage>702</lpage>
          , Heraklion, Greece,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Poulovassilis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Wood</surname>
          </string-name>
          .
          <article-title>Journal on data semantics X. chapter Query relaxation in RDF</article-title>
          , pages
          <fpage>31</fpage>
          -
          <lpage>61</lpage>
          . Springer-Verlag, Berlin, Heidelberg,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jonassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Cambazoglu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          .
          <article-title>Prefetching query results and its impact on search engines</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Information Retrieval (SIGIR)</source>
          , pages
          <fpage>631</fpage>
          -
          <lpage>640</lpage>
          , Portland,
          <string-name>
            <surname>OR</surname>
          </string-name>
          , USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Jorge</surname>
          </string-name>
          <string-name>
            <surname>Pérez</surname>
          </string-name>
          ,
          <source>Marcelo Arenas. Semantics and complexity of SPARQL. ACM Transactions on Database Systems (TODS)</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <volume>16</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          :
          <fpage>45</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          .
          <article-title>The hungarian method for the assignment problem</article-title>
          .
          <source>Naval Research Logist. Quarterly</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          -2):
          <fpage>83</fpage>
          -
          <lpage>97</lpage>
          ,
          <year>1955</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Unbehauen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Improving the performance of semantic web applications with SPARQL query caching</article-title>
          .
          <source>In Proceedings of the Extended Semantic Web Conference (ESWC)</source>
          , pages
          <fpage>304</fpage>
          -
          <lpage>318</lpage>
          , Heraklion, Greece,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ren</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Dunham</surname>
          </string-name>
          .
          <article-title>Using semantic caching to manage location dependent data in mobile computing</article-title>
          .
          <source>In Proceedings of the International Conference on Mobile Computing and Networking</source>
          , pages
          <fpage>210</fpage>
          -
          <lpage>221</lpage>
          , Boston, MA, United States,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiefer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          .
          <article-title>SPARQL basic graph pattern optimization using selectivity estimation</article-title>
          .
          <source>In Proceedings of the International World Wide Web Conference (WWW)</source>
          , pages
          <fpage>595</fpage>
          -
          <lpage>604</lpage>
          , New York, NY, USA, Apr.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <surname>G. Wu.</surname>
          </string-name>
          <article-title>Caching intermediate result of SPARQL queries</article-title>
          .
          <source>In Proceedings of the International World Wide Web Conference (WWW)</source>
          , pages
          <fpage>159</fpage>
          -
          <lpage>160</lpage>
          , Hyderabad, India,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>