<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL with XQuery-based Filtering</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Nagoya University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linked Open Data (LOD) has proliferated across various domains; however, a large amount of open data is still published in formats other than RDF. Document-centric XML data are one such kind of open data, connected with entities in LOD as supplemental documents for those entities. To utilize document-centric XML data linked from entities in LOD, this paper proposes a SPARQL-based method for seamless access to RDF and XML data. In particular, it proposes XQueryFILTER, an extension to SPARQL that enables XQuery as a filter in SPARQL. For efficient processing of queries that combine SPARQL and XQuery, a query optimization technique is also proposed. Experimental scenarios using real-world data showcase the effectiveness of XQueryFILTER and the efficiency of the optimization.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL Extension</kwd>
        <kwd>XQuery Filtering</kwd>
        <kwd>Query Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Few works embed XML query languages (such as XPath, XSLT, and
XQuery) into SPARQL queries; one exception is [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Droop et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have proposed a
translation-based XPath embedding for SPARQL in order to enable XPath processing in
a SPARQL query processor. To this end, they have proposed translation models
from XML to RDF and from XPath to SPARQL. The filtering proposed in this paper
differs from their work in three major respects. The first is the query language
for XML: this paper uses XQuery, whereas their work uses XPath, and XQuery is in
general more expressive than XPath. The second is that the proposed approach
applies no preprocessing to XML data, while [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] requires translation into RDF. The
third is that the proposed approach fully utilizes the dedicated query
processing techniques of the XML DB, whereas [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] translates an XPath expression into a
complicated SPARQL query and processes it on a SPARQL query processor.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        A related line of work queries XML and RDF data that can be translated into each
other. So-called data-centric XML data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] are designed to represent objects
with hierarchical attributes, while document-centric (a.k.a. content-oriented)
XML data preserve document structures and remain understandable
even without XML tags. Therefore, data-centric XML data are easier to convert into
RDF, while document-centric XML data require considerable effort in designing
ontologies and mappings. The existing approaches assume data-centric XML data and
their RDF translations, and can be roughly classified as follows:
(1) using XQuery to query RDF data [
        <xref ref-type="bibr" rid="ref2 ref7">7, 2</xref>
        ], (2) using SPARQL to query
XML data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and (3) translating queries between XQuery and SPARQL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>SPARQL with XQuery-based Filtering</title>
      <p>This section introduces the proposed extension of SPARQL, the basic architecture of
the mixed database system, and the query optimization technique.</p>
      <p>Query Definition: XQuery-based filtering filters the bindings of a graph
pattern in SPARQL, keeping only those bindings whose XML documents satisfy a given
XQuery expression. To this end, the XQuery expression is required to return a Boolean value. In the extended
query, an XQueryFILTER expression is added to the SPARQL definition, modeled on
the FILTER expression; its argument is an XQuery expression that may include
SPARQL variables. The following is an example of SPARQL with XQueryFILTER.</p>
      <preformat>SELECT ?s
WHERE { ?s a :Country; :safetyInfo ?doc; :populationTotal ?pop .
  FILTER ( ?pop &gt; 10000000 ) .
  XQueryFILTER (
    LET $x := doc(?doc)//mail[leaveDate &gt; xs:date('2020-03-01')]
    RETURN contains($x, 'coronavirus')
  ) . }</preformat>
      <p>This query searches for countries with more than 10 million people and with
safety information about the coronavirus issued after March 1, 2020. Its XQueryFILTER
contains the SPARQL variable ?doc, which is bound through :safetyInfo to a safety
information XML file. During query evaluation, the XQuery in the XQueryFILTER
is evaluated by replacing the ?doc variable with each of its bindings. If the
XQuery returns true, the binding remains in the results; otherwise, it is eliminated.</p>
      <p>
System Architecture: Fig. 1 shows a basic architecture for realizing
SPARQL with XQueryFILTER. A basic assumption is that RDF and XML data
are stored separately, in a SPARQL EP (endpoint) and an XML DB, respectively.
Dedicated query processors, namely a SPARQL processor and an XQuery processor,
communicate with these databases. The query manager handles a user
query, decomposes it into SPARQL and XQuery parts, explores the query plans for the optimal one,
executes it, and merges the results. The user interface communicates with users
by receiving queries and returning their results.</p>
      <p>[Fig. 1: architecture diagram. Component labels: User Interface, Parser, Query Manager, Optimizer, Executor, SPARQL Processor, XQuery Processor, Catalog Manager, SPARQL EP, XML DB. Fig. 2: query plan diagrams composed of SPARQL, XQuery, and Join steps.]</p>
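      <p>The XQueryFILTER semantics described earlier in this section (substitute each binding of the document variable into the XQuery and keep the binding only if the XQuery returns true) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names xquery_filter and xquery_holds, the toy documents, and the substring test standing in for the XQuery are all hypothetical, and a real system would delegate the Boolean check to the XQuery processor.</p>
      <preformat>
```python
from typing import Callable, Iterable

Binding = dict[str, str]  # one SPARQL solution: variable name -> value


def xquery_filter(
    bindings: Iterable[Binding],
    doc_var: str,
    xquery_holds: Callable[[str], bool],
) -> list[Binding]:
    """Keep the bindings whose document satisfies the Boolean XQuery.

    `xquery_holds` is a hypothetical stand-in for the XQuery processor: it
    receives the document URI bound to `doc_var` and returns the Boolean
    result of the XQuery with that URI substituted for the variable.
    """
    return [b for b in bindings if xquery_holds(b[doc_var])]


# Toy data in place of a SPARQL endpoint and an XML DB (purely illustrative).
documents = {
    "doc/a.xml": "travel advice regarding coronavirus",
    "doc/b.xml": "general travel advice",
}
bindings = [
    {"s": "CountryA", "doc": "doc/a.xml"},
    {"s": "CountryB", "doc": "doc/b.xml"},
]

# A substring test plays the role of contains($x, 'coronavirus').
kept = xquery_filter(bindings, "doc", lambda uri: "coronavirus" in documents[uri])
print([b["s"] for b in kept])  # prints ['CountryA']
```
      </preformat>
      <p>Passing the check as a callable keeps the sketch independent of any particular XML database client; only the binding-by-binding filtering logic is shown.</p>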
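      <p>Optimization: To realize efficient SPARQL with XQueryFILTER, the optimizer
in the query manager chooses the plan with the least estimated cost among the three
possible plans for executing SPARQL with XQueryFILTER, shown in Fig. 2.</p>
      <list list-type="bullet">
        <list-item><p>Parallel: Execute SPARQL and XQuery in parallel and join the results afterward.</p></list-item>
        <list-item><p>SPARQL First: Execute SPARQL first, push its bindings down into XQuery,
and evaluate the bindings with the XQuery results.</p></list-item>
        <list-item><p>XQuery First: Execute XQuery first and push its results down into SPARQL.</p></list-item>
      </list>
      <p>The execution costs of these query plans are modeled as follows.
Let C<sub>SPARQL</sub> and C<sub>XQuery</sub> respectively denote the processing costs of SPARQL and
XQuery, and let C<sub>Join</sub> denote the join cost. The costs of the parallel plan, C(p),
the SPARQL first plan, C(s), and the XQuery first plan, C(x), are as follows.</p>
      <disp-formula id="eq-1"><label>(1)</label><tex-math>C(p) = \max(C_{\mathrm{SPARQL}}, C_{\mathrm{XQuery}}) + C_{\mathrm{Join}}</tex-math></disp-formula>
      <disp-formula id="eq-2"><label>(2)</label><tex-math>C(s) = C_{\mathrm{SPARQL}} + \sigma_{\mathrm{SPARQL}} \cdot C_{\mathrm{XQuery}} + C_{\mathrm{Join}}</tex-math></disp-formula>
      <disp-formula id="eq-3"><label>(3)</label><tex-math>C(x) = C_{\mathrm{XQuery}} + \sigma_{\mathrm{XQuery}} \cdot C_{\mathrm{SPARQL}}</tex-math></disp-formula>
      <p>where σ<sub>SPARQL</sub> and σ<sub>XQuery</sub> denote the selectivities of the preceding SPARQL
and XQuery, respectively.</p>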
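      <p>The plan choice described by Equations 1, 2, and 3 can be sketched in Python as follows. This is a minimal sketch, not the authors' optimizer: the class and function names are hypothetical, selectivities are assumed to be fractions between 0 and 1, and the numeric estimates in the usage lines are invented for illustration rather than taken from the experiments.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimates:
    """Estimated inputs to the cost model (all names are illustrative)."""
    c_sparql: float    # estimated cost of running the SPARQL part
    c_xquery: float    # estimated cost of running the XQuery part
    c_join: float      # estimated cost of joining the two result sets
    sel_sparql: float  # selectivity of the SPARQL part, assumed in [0, 1]
    sel_xquery: float  # selectivity of the XQuery part, assumed in [0, 1]


def plan_costs(e: CostEstimates) -> dict[str, float]:
    """Evaluate Equations (1), (2), and (3) for the three plans."""
    return {
        "parallel": max(e.c_sparql, e.c_xquery) + e.c_join,
        "sparql_first": e.c_sparql + e.sel_sparql * e.c_xquery + e.c_join,
        "xquery_first": e.c_xquery + e.sel_xquery * e.c_sparql,
    }


def choose_plan(e: CostEstimates) -> str:
    """Return the name of the plan with the least estimated cost."""
    costs = plan_costs(e)
    return min(costs, key=costs.get)


# Invented estimates: cheap SPARQL, expensive XQuery, selective SPARQL part.
estimates = CostEstimates(
    c_sparql=1.0, c_xquery=10.0, c_join=0.5, sel_sparql=0.1, sel_xquery=0.5
)
print(plan_costs(estimates))
print(choose_plan(estimates))  # prints sparql_first
```
      </preformat>
      <p>With these invented estimates, the SPARQL first plan is chosen because only a tenth of the expensive XQuery work remains after the SPARQL bindings are pushed down.</p>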
    </sec>
    <sec id="sec-3">
      <title>Experimental Evaluation</title>
      <p>An experimental evaluation was conducted to show the efficiency of the proposed SPARQL with
XQueryFILTER and of the query optimization. Three
scenarios are prepared by collecting XML documents related to LOD datasets;
the scenarios differ in their database settings in terms of network latency and data
size. XML data are stored in a local XML database using eXist-db (v. 5.2.0;
<ext-link ext-link-type="uri" xlink:href="http://exist-db.org/">http://exist-db.org/</ext-link>),
and RDF data are either stored locally using Apache Jena Fuseki
(v. 3.14.0; <ext-link ext-link-type="uri" xlink:href="https://jena.apache.org/documentation/fuseki2/index.html">https://jena.apache.org/documentation/fuseki2/index.html</ext-link>)
or stored in external SPARQL endpoints (i.e., the DBpedia SPARQL
endpoint). Query processing efficiency is measured by query execution time.</p>
      <p>Scenarios: The three scenarios are designed to observe
query performance under different performance characteristics of the underlying databases, as
summarized in Table 1. The first scenario (C.S. scenario) searches for countries with
governmental safety information messages. In this scenario, the cost of querying
the SPARQL endpoint is high because of a federated query using the SERVICE clause, and
the cost of querying the XML DB is low because of the small number of XML documents.
The second scenario (L.S. scenario) searches for acts with their body texts. In this
scenario, the costs of SPARQL and XQuery are comparable because it uses
the local SPARQL endpoint and a moderate number of XML documents.
The third scenario (D.S. scenario) searches for discussions about
law enactment in minute books. In this scenario, the cost of SPARQL is low and that
of XQuery is high because it uses the local SPARQL endpoint and a
large number of XML documents. Queries are generated from templates to
control the selectivities of SPARQL and XQuery.</p>
      <p>[Figure: legend entries SPARQL First, XQuery First, and Parallel for three panels; the parenthesized values are not recoverable.]</p>
      <p>Results: Fig. 3 shows selected results for each scenario in terms of the
selectivities of SPARQL and XQuery. The common observations are as follows. First, the
execution time of the SPARQL first plan (resp. XQuery first plan) scales linearly with the
selectivity of the SPARQL (resp. XQuery) query and is scarcely affected by the selectivity
of the XQuery (resp. SPARQL) query. Second, the execution time of the parallel plan is nearly constant
w.r.t. both selectivities when the execution
performances of SPARQL and XQuery differ widely, as in the C.S. and D.S. scenarios. When these performances
are comparable (as in the L.S. scenario), this plan depends on both selectivities.</p>
      <p>In Fig. 3(b), for an XQuery selectivity of 660, the optimal plan switches from
the SPARQL first plan to the parallel plan at a SPARQL selectivity of around 1,000.
Fig. 4 indicates the reason for this switch. The figure shows a breakdown of
the execution time of each query plan as a stacked bar graph of the execution
times of SPARQL, XQuery, and Join (blue, orange, and green, respectively).
Basically, in this SPARQL with XQueryFILTER, SPARQL executes relatively
faster than XQuery (Fig. 4(b) and Fig. 4(c)). Therefore, the SPARQL first plan is
the better plan as long as the number of XML documents queried afterward is
small. In this scenario, about 1,000 queried XML documents
is the turning point beyond which the parallel plan outperforms the SPARQL first plan.</p>
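      <p>The turning point can be related to the cost model: when XQuery is at least as costly as SPARQL, setting C(s) equal to C(p) in Equations 1 and 2 cancels the join cost and gives a break-even SPARQL selectivity of 1 - C_SPARQL / C_XQuery. The Python sketch below works this out. It is an illustration derived from the cost equations only, not a reproduction of the measured results: the function name and the numbers are hypothetical, and selectivity is treated as a fraction rather than as the result counts used in the figures.</p>
      <preformat>
```python
def break_even_selectivity(c_sparql: float, c_xquery: float) -> float:
    """Largest SPARQL selectivity at which SPARQL First costs no more than Parallel.

    Assumes c_xquery >= c_sparql >= 0 and c_xquery > 0, so that
    max(c_sparql, c_xquery) = c_xquery in Equation (1). Then
        c_sparql + sel * c_xquery + c_join = c_xquery + c_join
    holds exactly when sel = 1 - c_sparql / c_xquery.
    """
    if c_xquery > 0 and c_xquery >= c_sparql >= 0:
        return 1.0 - c_sparql / c_xquery
    raise ValueError("expected c_xquery >= c_sparql >= 0 and c_xquery > 0")


# Invented costs for illustration only.
c_sparql, c_xquery, c_join = 2.0, 8.0, 1.0
sel = break_even_selectivity(c_sparql, c_xquery)
cost_sparql_first = c_sparql + sel * c_xquery + c_join
cost_parallel = max(c_sparql, c_xquery) + c_join
print(sel, cost_sparql_first, cost_parallel)  # prints 0.75 9.0 9.0
```
      </preformat>
      <p>Below the break-even selectivity the SPARQL first plan is cheaper under this model, and above it the parallel plan is cheaper, which matches the qualitative switch described above.</p>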
      <p>This experiment indicates that the three query plans reflect the strengths and
weaknesses of the underlying databases. The proposed optimization technique captures
these characteristics of the plans through the cost models in Equations 1, 2, and 3, and
can successfully discover the best plan provided that the statistics of database performance and
the selectivity estimates of SPARQL and XQuery are accurate.</p>
      <p>Acknowledgements: This work was partly supported by JSPS KAKENHI Grant Number JP18K18056
and the Artificial Intelligence Research Promotion Foundation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopecký</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krennwallner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>XSPARQL: Traveling between the XML and RDF Worlds - and Avoiding the XSLT Pilgrimage</article-title>
          .
          <source>In: ESWC 2008</source>
          . pp.
          <fpage>432</fpage>
          -
          <lpage>447</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Almendros-Jiménez</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becerra-Terón</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torres</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Integrating and Querying OpenStreetMap and Linked Geo Open Data</article-title>
          .
          <source>Comput. J</source>
          .
          <volume>62</volume>
          (
          <issue>3</issue>
          ),
          <fpage>321</fpage>
          -
          <lpage>345</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bikakis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsinaraki</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stavrakantonakis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gioldasis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christodoulakis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The SPARQL2XQuery interoperability framework - Utilizing Schema Mapping, Schema Transformation and Query Translation to Integrate XML and the Semantic Web</article-title>
          .
          <source>World Wide Web</source>
          <volume>18</volume>
          (
          <issue>2</issue>
          ),
          <fpage>403</fpage>
          -
          <lpage>490</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cowie</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehnert</surname>
            ,
            <given-names>W.G.</given-names>
          </string-name>
          :
          <article-title>Information Extraction</article-title>
          .
          <source>Commun. ACM</source>
          <volume>39</volume>
          (
          <issue>1</issue>
          ),
          <fpage>80</fpage>
          -
          <lpage>91</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Droop</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flarer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groppe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groppe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linnemann</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinggera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schöpf</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staffler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zugal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Embedding Xpath Queries into SPARQL Queries</article-title>
          .
          <source>In: ICEIS 2008</source>
          . pp.
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Groppe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groppe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linnemann</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kukulenz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoeller</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reinke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Embedding SPARQL into XQuery/XSLT</article-title>
          .
          <source>In: SAC 2008</source>
          . pp.
          <fpage>2271</fpage>
          -
          <lpage>2278</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sherkhonov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Data Exchange for Document-Centric XML</article-title>
          .
          <source>In: Proc. PhD Symposium@SIGMOD 2014</source>
          . pp.
          <fpage>26</fpage>
          -
          <lpage>30</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>