<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Role of the GRAPH Clause in the Performance of Federated SPARQL Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Chaves-Fraga</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department &amp; CIWS, Universidad de Chile</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontology Engineering Group, Universidad Politenica de Madrid</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Federated SPARQL queries give uni ed answers from multiple and distributed SPARQL endpoints. A good example may be the collection of stops from di erent transport companies in the same city to create a route planning application. The performance of the evaluation of these types of queries is usually poor, a fact that makes di cult their use in real-life applications that need good performance requirements. In this paper we present a preliminary analysis on the improvement that can be achieved by using the GRAPH clause in federated SPARQL queries. The main goal is to reduce the search space of such queries by setting the NAMED GRAPH to the graph pattern where the corresponding patterns should be evaluated. We perform a preliminary comparison between a federated query and a rewriting that uses systematically the GRAPH clause. These experiments show that the inclusion of the GRAPH clause may only improve performance of query evaluation between 5% and 10%. These early ndings suggest that, although the GRAPH clause may indeed play a role in speeding up federated SPARQL queries, hurdles are yet to be overcome when using the GRAPH clause as named graphs are semantically ambiguous.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL</kwd>
        <kwd>federated queries</kwd>
        <kwd>graph</kwd>
        <kwd>performance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        In recent years several public SPARQL endpoints have been made available on
the Web containing billions of RDF triples that can be queried [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A way to query
multiple SPARQL endpoints with a single query is by building a federated query.
The de nition of federated query is speci ed in the SPARQL 1.1 recommendation
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and its syntax, semantics and evaluation are described on [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>While these queries are useful for this purpose, their main disadvantage is
their low performance, hence their use on real applications is limited. Other
approaches are being currently applied, like getting the data from APIs, doing a
manual data merge or joining data at the client side. Even so, federated queries
are still an interesting approach to query RDF data sources exposed via SPARQL
endpoints. Finding ways to improve their performance is one of the current
research topics in the eld.</p>
      <p>A SPARQL endpoint is conceptualized as shown in Figure 1: one
defaultgraph and from 0 to N NAMED GRAPHS. Each of these NAMED GRAPHS
contains a set of triples and has a unique identi er (URI). In a SPARQL query
it is possible to indicate the graph where a particular part of a query (e.g. a
pattern) will be evaluated completely or partially, by using the GRAPH clause
with the graph URI. We hypothesize that using this feature in a systematic form
may reduce the search space when evaluating federated SPARQL queries. The
goal of this paper is to add evidence to this claim.</p>
      <p>The paper is organized as follows: Section 2 presents some of the related
work on the conceptualization of SPARQL endpoints and federated queries.
Section 3 describes our proposal about systematically using the GRAPH clause in
federated queries. Section 4 shows a preliminary evaluation and presents some
conclusions and future areas of work to be explored.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>In this section we describe related work on two topics. On the one hand, the
SPARQL endpoint conceptualizations; on the other hand, several optimizations
that have been proposed in the area of federated SPARQL queries.</p>
      <p>
        The SPARQL endpoint conceptualization is not an easy task. Hernandez
and Gutierrez [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] deal with current problems of the notion of the dataset in
SPARQL that is de ned as a set fG0; (u1; G1); ::; (un; Gn)g where G0 is called
default graph and each Gi is a RDF graph with an associated URI (ui). They
try to identify the possibilities and the constraints of a dataset in a (federated)
SPARQL query. There are further analysis on this topic in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in the speci cation
of SPARQL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and in the documentation of federated SPARQL [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        There are several works in the literature on federated SPARQL queries that
try to optimize the performance of query evaluation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], so as to make them
applicable to real cases. The current approach in federated SPARQL queries
deals essentially with the problem of source selection [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ][
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. A di erent
approach is detailed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], where the authors include the GRAPH clause in the
queries to support graph-level access during the source selection step. By using
the SERVICE clause in the queries the problem of source selection is
automatically solved, but other interesting problems show up, like the availability
of SPARQL endpoints in federated queries [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or how to query heterogeneous
resources following W3C standards like R2RML [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Using GRAPH clauses in federated SPARQL queries
In this section we propose an approach to improve federated query evaluation by
systematically using the GRAPH clause to rewrite a query into a semantically
equivalent one.</p>
      <p>
        When the GRAPH clause is used in a SPARQL query, only triples saved
into that GRAPH clause will occur in the resulting SPARQL bindings. We want
to compare the performance between a federated SPARQL and a semantically
equivalent query that makes use of the GRAPH clause to direct the evaluation.
The de nition of what is a graph pattern and its semantics and syntax can be
found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Clearly the main challenge here is to identify the graph patterns
that will only retrieve data from a unique GRAPH in the SPARQL endpoint.
In this preliminary report we will assume this hypothesis, because we would
like to show that it is actually possible to rewrite queries using this idea and
this method add e ciency in the evaluation. We are aware that the possibility to
identify this pattern is linked to information about sources. Once all the patterns
are identi ed, one can rewrite the query in order to direct precisely the pattern
evaluations to the relevant sources. The rationale behind this approach is that
our experience shows that in a query usually there are many graph patterns that
bind results from only one of the graphs in the SPARQL endpoint.
      </p>
      <p>In Figure 2 we show graphically what does this approach mean for
examples using the DBpedia SPARQL endpoint3. When evaluating a query with a
graph pattern (?s &lt;dbo:person&gt; ?o), the data returned will belong to the graph
&lt;http://dbpedia.org&gt;. Similarly, a pattern of the form (?s &lt;wikidata:P646c&gt;
?o) retrieves data only from graph &lt;http://wikidata.org&gt; in the same SPARQL
endpoint. Thus would be interesting to be able to explicit this information in the
query to be issued, so that the server can have more information and optimize
the retrieval.</p>
      <p>An example of how this transformation from a standard query into on which
uses the GRAPH clause with this purpose is shown in the queries 1.1 and 1.2
below. The expectation is that the search space of a federated SPARQL query
will be reduced and hence performance of query evaluation will improve.
3 https://dbpedia.org/sparql</p>
      <p>Listing 1.1. Federated query
SELECT WHERE f</p>
      <p>SERVICE &lt;S1&gt; f GP1 g
SERVICE &lt;S2&gt; f GP2 g</p>
      <p>SERVICE &lt;S3&gt; f GP3 g
g
Listing 1.2. Federated query with the
GRAPH clause
SELECT
FROM NAMED &lt;S1 G2&gt;
FROM NAMED &lt;S2 G3&gt;
FROM NAMED &lt;S3 G1&gt;
WHERE f</p>
      <p>SERVICE &lt;S1&gt; f</p>
      <p>GRAPH &lt;S1 G2&gt; fGP1g g
SERVICE &lt;S2&gt; f</p>
      <p>GRAPH &lt;S2 G3&gt; fGP2g g
SERVICE &lt;S2&gt; f</p>
      <p>GRAPH &lt;S3 G1&gt; fGP3g g
g
4</p>
    </sec>
    <sec id="sec-3">
      <title>Preliminary evaluation</title>
      <p>The results obtained after the experimentation re ect that a federated query
that includes the GRAPH clause does not improve relevantly the performance
of a federated query, as one could hypothesize. We only got an improvement
in the queries evaluation time between 5% and 10%. The reason of this is that
the conceptual information that a SPARQL endpoint provides about its graphs
is not the same as the physical storage organization. If a query over a public
SPARQL endpoint is made to list all the graphs of that SPARQL endpoint
4 https://dbpedia.org/sparql
5 https://kb.3cixty.com/sparql
6 http://data.linkedmdb.org/sparql
7 https://github.com/dachafra/federated-graph-queries
and their related triples (an example of DBpedia8 is provided) conceptually the
information that the user receives is that this number of triples are stored into
those graphs. The reality, however, is that usually all the triples are stored in
the same place and thus it does not make relevant di erence the approach we
are proposing. As re ected on Virtuoso documentation9, it currently puts the
triples in a single table with the graph URI as a key part although they are
already working on an improvement to store triples in their graphs. Testing our
approach with this future improvement of Virtuoso (or other triple store that
includes this feature) could prove that including the GRAPH clause in federated
SPARQL queries improves their performance.
4.1</p>
      <p>Conclusion: The role of the GRAPH clause in SPARQL
After this short experiment one could ask: What is the role of the GRAPH clause
in SPARQL? We know that it is useful from a semantic point of view, that is,
when a graph pattern matches triples that are saved in multiple graphs and
the user only wants the triples from one of them. But we have identi ed issues
that are not evident when using a GRAPH clause: (i) semantic ambiguity; (ii)
o ers a view of how the data is conceptually ordered but not physically; (iii) is
a SPARQL clause that seems to not allow any type of optimization.</p>
      <p>One of the important problems related with the graphs of a SPARQL
endpoint is the correctness of their characterization. When an open data portal or a
controlled vocabulary is developed, the format of the URIs is highly structured.
Also, only by analysing the URI itself, one usually can get some interesting
information about what type of data they are storing10;11. If this feature were taken
into account at the moment of building the graphs of a SPARQL endpoint, it
might be useful for the users to have an idea of what type of triples are loaded
in each graph. In summary, the role of the GRAPH clause needs to be clari ed
in order to be really useful as a construct of the language.</p>
      <p>
        The problems presented generate di erent lines for future research, such as
creating a federated benchmark that includes the SERVICE and GRAPH clauses
and do the evaluation in a controlled environment, applying our approach to
other triple stores or carrying out semantic equivalence between open data portal
and a SPARQL endpoint. It would also be very useful to develop a way for
automatically linking unique graph patterns with their pertained graphs in a
SPARQL endpoint using an index or other current solutions like Loupe[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This
problem is re ected in some inconsistencies between the speci cation and
reallife implementations. For example, although the syntax of federated SPARQL
1.1 does not allow to include GRAPH clauses in a federated query, Virtuoso and
Apache Jena can execute the approach presented in this note.
8 https://github.com/dachafra/federated-graph-queries/blob/master/DBpediaGRAPHS.csv
9 https://virtuoso.openlinksw.com/rdf/
10 http://ec.europa.eu/transport/facts-fundings/statistics/pocketbook-2013 en
11 http://ec.europa.eu/eurostat/web/products-datasets/-/rail go quartal
Acknowledgements. The research leading to this paper has been partially done
in the context of the FP7 Marie Curie IRSES SemData project
(http://www.semdataproject.eu/), grant agreement No PIRSES-GA-2013-612551, and it is also
supported by the 4V Spanish national project (TIN2013-46238-C4-2-R) .
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Maribel</given-names>
            <surname>Acosta</surname>
          </string-name>
          ,
          <string-name>
            <surname>Maria-Esther</surname>
            <given-names>Vidal</given-names>
          </string-name>
          , Tomas Lampo, Julio Castillo, and
          <string-name>
            <given-names>Edna</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          .
          <article-title>Anapsid: an adaptive query processing engine for SPARQL endpoints</article-title>
          .
          <source>The Semantic Web{ISWC</source>
          <year>2011</year>
          , pages
          <fpage>18</fpage>
          {
          <fpage>34</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Renzo</given-names>
            <surname>Angles</surname>
          </string-name>
          and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>SQL nested queries in SPARQL</article-title>
          . In AMW,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Buil-Aranda</surname>
          </string-name>
          , Marcelo Arenas, Oscar Corcho, and
          <string-name>
            <given-names>Axel</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>Federating queries in SPARQL 1.1: Syntax, semantics and evaluation</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>17</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Buil-Aranda</surname>
          </string-name>
          , Aidan Hogan, Jurgen Umbrich, and
          <string-name>
            <surname>Pierre-Yves Vandenbussche</surname>
          </string-name>
          .
          <article-title>SPARQL web-querying infrastructure: Ready for action</article-title>
          ? In International Semantic Web Conference, pages
          <volume>277</volume>
          {
          <fpage>293</fpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Steve</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <source>Andy Seaborne, and Eric Prudhommeaux. SPARQL 1</source>
          .
          <article-title>1 query language</article-title>
          .
          <source>W3C recommendation</source>
          ,
          <volume>21</volume>
          (
          <issue>10</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Hernandez</surname>
          </string-name>
          and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Disentangling the notion of dataset in SPARQL</article-title>
          .
          <source>In Alberto Mendelzon International Workshop on Foundations of Data Management, page 213</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Yasar</given-names>
            <surname>Khan</surname>
          </string-name>
          , Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi, Aidan Hogan,
          <string-name>
            <surname>Axel-Cyrille Ngonga</surname>
            <given-names>Ngomo</given-names>
          </string-name>
          , Stefan Decker, and
          <string-name>
            <given-names>Ratnesh</given-names>
            <surname>Sahay</surname>
          </string-name>
          .
          <article-title>Safe: policy aware SPARQL query federation over RDF data cubes</article-title>
          .
          <source>In Proceedings of the 7th International Workshop on Semantic Web Applications and Tools for Life Sciences</source>
          , Berlin, Germany, December 9-
          <issue>11</issue>
          ,
          <year>2014</year>
          .,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Nandana</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Raul</given-names>
            <surname>Garc</surname>
          </string-name>
          a-Castro, and
          <string-name>
            <surname>Miguel</surname>
          </string-name>
          Esteban-Gutierrez.
          <article-title>Linked data platform as a novel approach for enterprise application integration</article-title>
          .
          <source>In Proceedings of the Fourth International Conference on Consuming Linked DataVolume 1034</source>
          , pages
          <fpage>146</fpage>
          {
          <fpage>157</fpage>
          . CEUR-WS. org,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Perez</surname>
          </string-name>
          , Marcelo Arenas, and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <source>Semantics and complexity of SPARQL. ACM Transactions on Database Systems (TODS)</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>16</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Freddy</surname>
            <given-names>Priyatna</given-names>
          </string-name>
          , Carlos Buil Aranda, and
          <string-name>
            <given-names>Oscar</given-names>
            <surname>Corcho</surname>
          </string-name>
          .
          <article-title>Applying SPARQL-DQP for federated SPARQL querying over google fusion tables</article-title>
          .
          <source>In Extended Semantic Web Conference</source>
          , pages
          <volume>189</volume>
          {
          <fpage>193</fpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Eric</surname>
            <given-names>Prudhommeaux</given-names>
          </string-name>
          , Carlos
          <string-name>
            <surname>Buil-Aranda</surname>
          </string-name>
          , et al.
          <source>SPARQL 1</source>
          .
          <article-title>1 federated query</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <volume>21</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Saleem</surname>
          </string-name>
          and
          <string-name>
            <surname>Axel-Cyrille Ngonga Ngomo</surname>
          </string-name>
          . Hibiscus:
          <article-title>Hypergraph-based source selection for SPARQL endpoint federation</article-title>
          .
          <source>In European Semantic Web Conference</source>
          , pages
          <volume>176</volume>
          {
          <fpage>191</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Muhammad</surname>
            <given-names>Saleem</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Axel-Cyrille Ngonga</surname>
            <given-names>Ngomo</given-names>
          </string-name>
          , Josiane Xavier Parreira, Helena F Deus,
          <string-name>
            <given-names>and Manfred</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          . Daw:
          <article-title>Duplicate-aware federated query processing over the web of data</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , pages
          <volume>574</volume>
          {
          <fpage>590</fpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Maria-Esther</surname>
            <given-names>Vidal</given-names>
          </string-name>
          , Simon Castillo, Maribel Acosta, Gabriela Montoya, and
          <string-name>
            <given-names>Guillermo</given-names>
            <surname>Palma</surname>
          </string-name>
          .
          <article-title>On the selection of SPARQL endpoints to e ciently execute federated SPARQL queries</article-title>
          .
          <source>In Transactions on Large-Scale Data-and KnowledgeCentered Systems XXV</source>
          , pages
          <volume>109</volume>
          {
          <fpage>149</fpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>