<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Subqueries in SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Renzo Angles</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Universidad de Chile</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Universidad de Talca</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Subqueries functionality is a powerful feature which allows to enforce reuse, composition, rewriting and optimization in a query language. In this paper we perform a comprehensive study of the incorporation of subqueries into SPARQL. We consider several possible choices as suggested by the experience of similar languages, as well as features that developers are incorporating and/or experimenting with. Based on this study, we present an extension of SPARQL, with syntax and formal semantics, which incorporates all known types of subqueries in a modular fashion and preserves the original semantics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper addresses the design issues raised by the introduction of subqueries
to the standard RDF query language SPARQL. By a subquery it is usually
understood a query that is nested inside another query.</p>
      <p>The advantages of having subqueries in a query language are well known [6,4];
among the most important for SPARQL we can mention incorporation of views,
reuse of queries, query rewriting and optimization, and facilitating distributed
queries [2]. For SPARQL, whose 2008 Recommendation lacks these features, the
issue was early raised by the RDF Data Access Working Group in July 2004 with
the name of cascadedQueries [1]. Also it has been gradually incorporated into
SPARQL engines, like ARQ and Virtuoso. To the best of our knowledge, there
are neither formal semantics de ned for these types of queries nor a systematic
study of their expressive power.</p>
      <p>There are several desirable design considerations when including a new
feature to a language. Among the most important that will guide our study here
are: precise semantics, hopefully a formal one, that avoids case by case
analysis and missing corner cases; compositional semantics, that is, the meaning of
an expression should be the same wherever it appears, and expressions with
equal result types should be allowed to appear in the same contexts; modularity,
that is, each functionality should be \basic" (atomic) and hopefully semantically
independent of others. This is particularly important in subquerying, because
queries nested in other queries bring with them many features, sometimes in a
hidden or undesirable manner. Finally, one always wants simplicity, which in
this case amounts to avoiding adding new features if already present. Most of
the current proposals lack some or all of these desiderata.</p>
      <p>In this paper we study the introduction of subqueries to SPARQL following
these guidelines. We study the diverse proposals, both theoretical and
practical, that have been presented, analyze their basic constructs, and show their
interplay and implications. We unify these diverse constructions in an extension
of SPARQL that includes all these features {modulo some necessary
consistency constraints{, and extend for them the standard semantics of SPARQL.
We present the syntax and a formalized semantics.</p>
      <p>
        Our proposal includes the following features: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) extends the FROM and
FROM NAMED clauses by allowing a construct query as input; (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) allows a
select-query in the place of a graph pattern; (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) introduces the set-membership
operator IN in order to nd a value in the sequence of solutions returned by
a select-query; (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ) introduces the quanti er operators SOME and ALL which
allow, using a scalar comparison operator (e.g. \="), to compare a value with
some or all the values returned by a select-query; and (
        <xref ref-type="bibr" rid="ref5">5</xref>
        ) introduces the operator
EXISTS to allow an ask-query to be a type of lter constraint.
      </p>
      <p>The paper is organized as follows. In Section 2, we present, via examples, an
informal introduction to subqueries in SPARQL. We show di erent existing
approaches and introduce informally the problems and advantages of each of them.
In Section 3, we present the de nition of a formal semantics for subqueries, which
is exible and expressive enough to cover all known cases. We show that it
preserves the original semantics of SPARQL when the constructs are expressible in
SPARQL, and extends it consistently in the other cases. In Section 4, we discuss
one by one the diverse functionalities that subqueries in SPARQL incorporate
(explicitly or as side-e ect), like creation of new values, projection in patterns,
possibility of choosing set semantics for patterns. We study their expressiveness
and interrelation. In Section 5, we present the works related to our study.
Finally, in Section 6 we summarize our ndings and suggest future steps in the
process of incorporating subqueries into SPARQL.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Motivating Examples</title>
      <p>In this section we give an informal introduction to subqueries in SPARQL
through several motivating examples. In order to describe the basic constructs
de ned by SPARQL, consider the query expression presented in Example 1.
Example 1. A simple SPARQL query. It returns the names of people whose age
is between 18 and 30, plus the names of people having (or not) an email.
SELECT DISTINCT ?N
FROM &lt;http://.../people.rdf&gt;
WHERE { {{ ?X name ?N . ?X age ?A } FILTER (?A &gt; 18 &amp;&amp; ?A &lt; 30)}</p>
      <p>UNION {{?X name ?N} OPTIONAL {?X email ?E}} }</p>
      <p>A SPARQL query consists of three sections. First, a query form section which
de nes the type and result-format of the query: a SELECT query returns a
sequence of variable bindings (solutions); an ASK query returns a Boolean value;
and a CONSTRUCT query returns an RDF graph. Second, a dataset section
which describes the RDF dataset to be queried. This is a sequence of FROM
and FROM NAMED clauses indicating the datasets to be queried, including
RDF graphs which are identi ed by and obtained from a given URIs. Third,
a query pattern section WHERE, which contains an expression representing a
graph pattern. A graph pattern is a combination of triple patterns (e.g., ?X
name ?N) and binary operators \.", UNION, OPTIONAL, FILTER, for,
respectively, conjunction, disjunction, optional-conjunction, and ltering of graph
patterns. Additionally, there are solution modi ers which can be used to modify
the sequence of solutions obtained from the evaluation (e.g., DISTINCT ensures
that the elements in the sequence of solutions are unique). The evaluation of
a SPARQL query is essentially a matching of the graph pattern against the
dataset.</p>
      <p>There are many choices to incorporate subqueries into SPARQL1. Two main
approaches emerge naturally: to consider those features suggested by the
experience of similar languages, notably SQL; and to consider the features that
implementors are already incorporating or exploring. In this work we follow both.
On the one hand, subqueries using known constructs taken from standard
languages like SQL (subqueries in WHERE and FROM clauses), and on the other,
subqueries in places where the developers are considering it necessary (FILTER
constraints for example) and also in places where there is still little experience
(e.g. in the NAMED FROM clauses).</p>
      <p>In what follows, we motivate these choices via examples and discuss their
implications, interplay, and completeness in detail in Section 4.
2.1</p>
      <sec id="sec-2-1">
        <title>SPARQLF rom : Subqueries in dataset clauses</title>
        <p>Based on the observation that construct-queries return RDF graphs, a natural
extension is to allow the inclusion of such queries in FROM or FROM NAMED
clauses (currently accepting only URI references to a graph).</p>
        <p>
          Example 2. Mails of pairs of people having a co-authorship relation. The second
FROM clause includes a subquery (lines 3 to 6) that outputs an RDF graph
whose triples represent co-authors.
1 In this work we do not consider subqueries in the SELECT and CONSTRUCT
clauses because: (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) although the former are allowed in SQL, they are hardly used;
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) there are not use cases reported for the latter.
        </p>
        <p>These types of subqueries (see Example 2) open the door to composition,
modularization and optimization of queries. It is interesting to note that, as a
side-e ect of this extension, the creation of values is introduced (by using ground
triples in the template of the construct-query).
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>SPARQLSubS : Subqueries as graph patterns</title>
        <p>Another form to build SPARQL subqueries, is to allow a query in the place of
a graph pattern (See Example 3). This extension allows: expressing operations
among select-queries (not currently supported by SPARQL, e.g. union,
intersection, etc.); projection of solution variables at the level of graph patterns; and
elimination of duplicates in patterns.</p>
        <p>Example 3. Mails of people having publications. The subquery returns identi ers
(without repetitions) of authors from the bibliographic database.
Note the crossed correlation of variables. How should such pattern be evaluated?
The SPARQL semantics indicates that each one should be evaluated
independently and then \join" the solutions. But in this example, the correlation of
variables is recursive: the second should be evaluated once the values of ?X are
known; similarly with the rst and ?A. A simple solution is to avoid correlation,
but this decision limits severely the power of subqueries.</p>
      </sec>
      <sec id="sec-2-3">
        <title>SPARQLF ilter : Subqueries in</title>
        <p>
          lters constraints
This extension follows the same design philosophy of subqueries in SQL by
introducing the operators IN, SOME, ALL, and EXISTS. It results in the following
types of lter constraints:
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) Set membership condition: It uses the IN operator to test whether the result
of a select-subquery contains a value (Example 4);
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) Quanti ed condition: It combines a value, a quanti er operator (SOME/ALL),
and a scalar comparison operator (e.g., \&lt;") to test whether the comparison
condition is satis ed by some (Example 4) or all (Example 5) the data values
resulting of a select-subquery.
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) Existencial condition: It uses the EXISTS operator to enclose and verify
whether an ask-subquery has at least one solution (Example 6).
        </p>
        <p>
          Note that the types (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) and (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) demand a select-subquery with a single result
variable. Additionally, the corresponding negation operators in SQL (e.g.,
NOTIN) can be represented by using the negation of lter constraints (i.e., \!").
Example 4. Names of researchers, without duplicates, with at least one paper in
some edition of the ISWC. (The query also works with IN replaced by '= SOME').
1 SELECT ?Name FROM &lt;http://.../biblio.rdf&gt;
2 WHERE { ?Aut name ?Name . ?Pub author ?Aut . ?Pub conf ?Conf
3 FILTER (?Conf IN ( SELECT ?ConfX FROM &lt;http://.../biblio.rdf&gt;
4 WHERE { ?ConfX series "ISWC" }))}
Example 5. The names of the oldest people. Note that the MAX aggregate
operator is not useful here because we are not asking for the highest age.
1 SELECT ?Name FROM &lt;http://.../people.rdf&gt;
2 WHERE { ?Per name ?Name . ?Per age ?Age .
3 FILTER ( ?Age &gt;= ALL ( SELECT ?AgeX FROM &lt;http://.../people.rdf&gt;
4 WHERE { ?PerX age ?AgeX }))}
Example 6. Emails of people with at least one paper in every edition of the
ISWC (this is a typical query solved using the division operator).
1 SELECT ?Email FROM &lt;http://.../people.rdf&gt;
2 WHERE { ?Per email ?Email .
3 FILTER !( ASK FROM &lt;http://.../biblio.rdf&gt;
4 WHERE { ?Conf series "ISWC" .
5 FILTER !( ASK FROM &lt;http://.../biblio.rdf&gt;
6 WHERE { ?Pub author ?Per .
7 ?Pub conf ?Conf }) }) }
        </p>
        <p>It turns out that this type of subqueries preserves faithfully the original
semantics of SPARQL. The evaluation follows a per-mapping approach called
the nested iteration method [6] (i.e., the subquery is evaluated once for each
solution of the graph pattern of the main query).</p>
        <p>The above examples deserve some comments. The IN operator is less
expressive than the SOME operator, because the former is restricted to equality of
values whereas the latter allows any scalar comparison operator. In fact every
query using IN can be rewritten to a query using \= SOME".</p>
        <p>Subqueries with SOME/ALL operators without correlated variables are
wellsuited for query composition (i.e., direct copy/paste of queries). In contrast, the
use of EXISTS is not good for query composition because it needs correlated
variables to make sense. This helps the user to express complex queries but makes
the evaluation harder (because the application of the nested iteration method).</p>
        <p>An interesting feature of these subqueries is the natural elimination of
duplicates. For instance, Example 7 shows the query in Example 4 expressed using
DISTINCT. Although the use of DISTINCT here is simple and clear, it must be
stressed that the duplicate elimination is expensive.2 Thus, the use of subqueries
could improve and optimize this task in many cases.</p>
        <p>Example 7. Name of researchers, without duplicates, with at least one paper in
some edition of the ISWC (solution using DISTINCT).
1 SELECT DISTINCT ?Name FROM &lt;http://.../biblio.rdf&gt;
2 WHERE { ?Aut name ?Name . ?Art author ?Aut .
3 ?Art conf ?Conf . ?Conf series "ISWC" }</p>
        <p>In a previous work [2] we studied these types of queries and proved that
subqueries using SOME, ALL and IN can be simulated by using the EXISTS
operator. In fact, in this paper we restrict our study to subqueries using the
EXISTS operator.</p>
        <p>
          Note 1. On could add - like in SQL - an additional type of subquery called
aggregate subquery. It would extend the de nition of subqueries type (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) by allowing
an aggregate operator (e.g., AVG) in the subquery. For example, consider the
graph pattern f?Per age ?Age . FILTER ?Age &lt; (SELECT AVG(?AgeX) ... )g.
Although the inclusion of aggregation does not introduce severe complications,
we do not consider them in this paper because aggregate operators have not
been standardized yet in SPARQL.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Syntax and Semantics for Subqueries</title>
      <p>In this section the introduce a proposal of syntax and semantics for subqueries
in SPARQL. We will follow the widely accepted formalization of SPARQL given
in [8], and present here the corresponding extensions. We refer the reader to such
citation for additional details about the following de nitions.</p>
      <p>We assume the pairwise disjoint, in nite sets I, B, L, and V (IRIs, blank
nodes, RDF literals, and variables respectively). An RDF triple is a tuple from
I [ B I I [ B [ L. An RDF graph is a nite set of RDF triples. An RDF
dataset is a set fG0; hu1; G1i; : : : ; hun; Gnig where each Gi is an RDF graph and
each uj is an IRI. G0 is the default graph and each pair hui; Gii is a named graph.</p>
      <p>A triple pattern is an RDF triple with variables. A SPARQL query is a tuple
Q = (R; F; P ) where R is a query form, F is a set of dataset clauses, and P is a
graph pattern. Speci cally,
{ If W V is a set of variables and H is a set of triple patterns (called a graph
template) then the expressions SELECT W , CONSTRUCT H, and ASK are
query forms.
2 The time it takes to sort the solutions, so that duplicates may be eliminated, is often
greater than the time it takes to execute the query itself [5](Sec. 6.4.1).
{ If u 2 I then the clauses FROM u and FROM NAMED u are dataset clauses.
{ A graph pattern is de ned recursively as follows. A triple pattern is a graph
pattern. If P1 and P2 are graph patterns then the expressions (P1 AND P2),
(P1 OPT P2) and (P1 UNION P2) are graph patterns. If P is a graph pattern
and C is a lter constraint then the expression (P FILTER C) is a graph
pattern. A lter constraint is a combination of equality predicates, built-in
functions, and Boolean operators (':', '^' and '_').
3.1</p>
      <sec id="sec-3-1">
        <title>Syntax for Subqueries</title>
        <p>
          We extend the syntax of SPARQL, as de ned above, and introduce the following
three families of subqueries:
(
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) SPARQLF rom: Subqueries in dataset clauses. If u 2 I and QC is a
constructquery, then FROM(QC ) and FROM NAMED u(QC ) are dataset clauses.
(
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) SPARQLSubS : Subqueries as graph patterns. If QS is a select-query, then the
expression (QS ) is a graph pattern.
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) SPARQLF ilter: Subqueries in lter constraints. If QA is an ask-query, then
the expressions EXISTS(QA) and : EXISTS(QA) are lter constraints.
Flat and nested queries. Let Q = (R; F; P ) be a query. A query Q0 is nested in
Q if and only if Q0 occurs in Q as part of one of the expressions de ned above.
In such case, Q is known as the outer query and Q0 is known as the inner query.
If Q does not contain nested queries then Q is called a at query. This de nition
supports queries with any level of nesting.
3.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Semantics for Subqueries</title>
        <p>In order to give a formal semantics for subqueries, we introduce some basic
de nitions about SPARQL semantics (for the details we suggest to read [8]).</p>
        <p>A mapping is a partial function : V ! I [ B [ L. The domain of ,
dom( ), is the subset of V where is de ned. The empty mapping 0 is a
mapping such that dom( 0) = ;.</p>
        <p>
          Scope of variables. Given a query Q, as in programming languages, a variable
can occur in (some place of) Q as free or bound. The precise recursive de nition
is as follows. For a query Q = (R; F; P ), an occurrence of a variable ?x is free
in Q i ?x occurs free in the pattern P and does not occur in the clause R. For
a graph pattern P , an occurrence of ?x is free i either: (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) P is a SPARQL
pattern and ?x occurs in P , or (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) P is the pattern of an ask-query QA in a lter
constraint, and ?x occurs free in QA.3
3 In this paper we do not consider free occurrences in FROM subqueries, because there
is yet little background and no known use cases for them. Note also that if P is a
subSelect then all variables occur bound in P .
Query correlation. Two queries Q and Q0 are correlated i Q0 is nested in Q, and
there is some variable occurring free in both, the graph pattern of Q and in the
graph pattern of Q0. Such variables are called correlated variables. This notion
of correlated queries will be allowed only for subqueries in lter constrains.
        </p>
        <p>Let Q be a query and be a mapping. We denote by (Q) the query resulting
by replacing each free occurrence of a variable ?x in Q by the constant (?x)
(recursively if necessary). Note that the same variable ?x could occur free and
bound in the same query and we are replacing only the free occurrences.</p>
        <p>Informally, the answer for a query Q = (R; F; P ) is a function ans(Q) which
returns: (i) a sequence of mappings when Q is a select-query; (ii) an RDF graph
when Q is a construct-query; and (iii) a Boolean value (true = false) when Q is
an ask-query.</p>
        <p>Semantics of SPARQLF rom. Let F be a set of dataset clauses (that is, FROM u,
FROM(QC ), FROM NAMED u, and FROM NAMED(QC ) clauses). The dataset
D obtained from F contains:
(i) A default graph which consists of the merge of the graphs referred in clauses
FROM u, plus the graphs ans(QC ) obtained from clauses FROM(QC ). If
there are no such clauses, then the default graph is the empty graph.
(ii) A named graph hu; graph(u)i for each clause FROM NAMED u.
(iii) A named graph hu; ans(QC )i for each clause FROM NAMED u(QC ).
Semantics of SPARQLSubS and SPARQLF ilter. The evaluation of graph pattern
P over a dataset D, denoted [[P ]]D (as de ned in [8]), is extended as follows:
{ If P is a subSelect graph pattern (R; F; P ), then [[(R; F; P )]]D = ans(R; F; P ).
{ Let P 0 be a graph pattern and QA be an ask-query. If P is the graph pattern
(P 0 FILTER EXISTS(QA)) then</p>
        <p>[[(P 0 FILTER EXISTS(QA))]]D = f 2 [[P 0]]D : ans( (QA)) is trueg
The semantics presented gives for correlated queries the standard semantics
of the nested iteration method [6], i.e., the inner query is performed once for each
solution of the outer query. For example, consider the graph pattern
((?X name ?N ) FILTER : EXISTS(ASK(?X email ?E))):
Considering that ?X is a correlated variable, the method establishes that the
graph pattern ( (?X) email ?E) must be evaluated over and over again, once
for each result mapping of the graph pattern (?X name ?N ).
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Design Issues and Expressive Power</title>
      <p>
        In this section we discuss ve orthogonal issues susceptible of being incorporated
to the language: query composition, creation of new values, projection in graph
patterns, elimination of duplicates, and correlation of variables.
Query composition. Composition of queries enforces: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) reuse of queries,
by introducing directly either pieces of text in a query or an URI pointing to
such query; (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) rewriting, by allowing distributed evaluation (by pushing the
maximum possible information from the WHERE into the FROM clauses); and
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) optimization, by bringing patterns from the FROM into the WHERE clause.
      </p>
      <p>Neither current SPARQL nor SPARQL 1.1 are able to express query
composition. In contrast, this feature is naturally included in SPARQLF rom.
Creation of New values. This feature consists in the facility to output atomic
values di erent from those found in the database to be queried.</p>
      <p>SPARQL has no mechanism to create new values. In SPARQL 1.1 this
functionality is introduced in the SELECT clause by the construct AS. For example a
discounted price variable can be expressed as: (?p*(1-?discount) AS ?price).
By allowing subSelect queries, this functionality is smuggled into graph patterns.</p>
      <p>The proposed fragment SPARQLF rom allows the creation of values. Speci
cally, the intermediate graph created by a construct subquery can contain new
values when its template contains triples with no variables (known as ground
or explicit triples). Therefore we have the following results about the expressive
power of SPARQLF rom.</p>
      <p>Lemma 1. SPARQLF rom allows the creation of new values (i.e., values not
existing in the database) in a query.</p>
      <p>Theorem 1. SPARQLF rom is more expressive than SPARQL.</p>
      <p>Projections in graph patterns. SPARQL does not provide projection in
graph patterns, that is, there are no \local" variables at the level of graph
patterns (or in other terms, all variables in a pattern are part of the solutions of
the pattern). A partial (and dirty) shortcut to solve this issue are blank nodes
(e.g., the solution mapping to the pattern f :b1 ?X ?Y g has only ?X and ?Y as
solution variables). There are two problems with this patch solution: (i) blank
nodes are not allowed in predicate positions (although most implementations
allow it); (ii) the \variable" :b1 cannot be constrained with lters.</p>
      <p>Note that this feature is naturally supported in SPARQLSubS because one
can project variables in a select subquery. This is useful to avoid unnecessary
clashes of variables (for example in automatic construction of queries, or even
when cutting and pasting pieces of code). Projection in graph patterns is the
type of feature that does not increase the expressive power of the language, but
is very useful for programmers.</p>
      <p>DISTINCT in graph patterns. The evaluation of a graph pattern in SPARQL
has bag semantics by default. This is an important issue considering the
complexities that introduces the interplay between bag and set semantics [3].</p>
      <p>In SPARQLSubS , one could choose bag or set semantics by writing a
subSelect (SELECT DISTINCT * WHERE { ... }). This functionality adds expressive
power as the following example shows.</p>
      <p>Assume a database of university people, codi ed with triples of the form
(u; name; n) (u's name is n) and (u; dept; d) (u is attached to department d).
The query \list names of people in either the CS or the Math department" is
the following in SPARQL 1.1:
SELECT ?N
WHERE {?U name ?N . { SELECT DISTINCT *</p>
      <p>WHERE {{?U dept "CS"} UNION {?U dept "Math"}}} }
Considering that the DISTINCT will be evaluated before the projection of ?N,
the query could return either more or less names than existing selected
individuals (for example, if there are three Peters in both departments). This query
has no equivalent one in current SPARQL because one has to project in the
SELECT before eliminating duplicates with the DISTINCT.</p>
      <p>Theorem 2. SPARQLSubS is more expressive than SPARQL.</p>
      <p>Variable correlation. Correlation of variables is a basic and useful feature
of subqueries (as shown in the examples of Section 2). In SPARQLSubS and
SPARQL 1.1, only variables projected by the select subquery are visible to
operations outside the subquery.</p>
      <p>Allowing correlated variables (for example, through projected variables in a
pattern to be visible from outside), would bring an undesirable design: the need
to introduce an order into the evaluation of patterns, thus changing its current
SPARQL semantics. In fact, to make consistent the design, subSelects should be
evaluated at the end. But still in this case we could nd troubles like the one
signaled in Section 2.2.</p>
      <p>This problem is solved, for example in SQL, by giving the subqueries the
function of lters. Filters are evaluated last. Note that the extension of SPARQL
with subqueries in lter constraints and in dataset clauses follow this philosophy,
hence do not need any arti cial constraint on visibility of variables.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Current proposals / Related work</title>
      <p>The current working draft4 of SPARQL 1.1 presents the following features related
to our work:
{ Graph patterns f P FILTER EXISTS f P' g g where P 0 is basically a graph
pattern to test. Therefore there is not a direct notion of subquery.
{ Two styles of negation: (i) f P FILTER NOT EXISTS f P' g g which tests
the absence of a match to the pattern P'; and (ii) f P MINUS f P' g g which
removes the solutions of P occurring also in P'. Both can be used to represent
negation in SPARQL, however in some cases they are not equivalent.
{ SubSelects, which consist in allowing a select-query within the graph pattern
of another query.
4 http://www.w3.org/TR/2010/WD-sparql11-query-20101014/
Di erently to our proposal, in SPARQL 1.1 FROM and FROM NAMED clauses
are not permitted in subSelects, hence a subSelect is forced to share the same
RDF dataset with its parent query. Moreover, it has no formal de nition for the
semantics of query correlation.</p>
      <p>Two notions of subquery have been proposed in works that study
translations from SPARQL to Datalog. Polleres [9] suggests that Boolean queries (i.e.,
queries having ASK query form) can be safely allowed within lter constraints.
Additionally, Schenk [11] proposes the use of views as parts of a dataset, that is,
the inclusion of construct-queries in FROM clauses. Although in both cases such
extensions are well supported by their translations from SPARQL to Datalog,
they do not include further developments on issues arising from these extensions.</p>
      <p>Regarding real-life practice, implementations are beginning to provide
extensions of SPARQL that include support for some types of subqueries. Virtuoso
includes extensions5 related to subqueries. Among them, it allows an embedded
select query in the place of a triple pattern; and lter constraints of the form
"EXISTS (&lt;scalar subquery&gt;)". Similarly, ARQ , the query engine for Jena,
supports a type of subSelect which uses aggregate functions6. For example,
consider the pattern f ?x a :Toy . f SELECT ?x ( COUNT(?order) AS ?q) ...gg.</p>
      <p>Additionally, ARQ allows expressions SERVICE &lt;URI&gt; f P g, which can
be used to send and execute the graph pattern P into the SPARQL endpoint
named &lt;URI&gt; (basic federated queries). DARQ [10] o ers a single interface for
querying multiple, distributed SPARQL endpoints and makes query federation
transparent to the client.</p>
      <p>The extension of Virtuoso corresponds to the inclusion of the select-query
in a FROM clause. If one does not consider aggregate functions, not present in
current SPARQL, the \EXISTS" extension is equivalent to our de nition, and
the \SELECT" of ARQ can be simulated by our SOME queries.</p>
      <p>None of these implementations present systematic covering nor analysis of
these extensions.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper we presented a comprehensive discussion of several ways of
introducing subqueries into SPARQL, while preserving the original semantics and
spirit. The arguments presented in Section 2, show that it is completely feasible
to incorporate all the exibility of subqueries that has been systematically tested
by users and developers; to include all syntaxes currently in use; plus some new
constructs that would add facilities to the users.</p>
      <p>In this direction we presented in Section 3 the syntax and semantics of a
complete proposal of adding subqueries and composition to SPARQL in all
reasonable forms. Our extension can be done while preserving the original semantics
and adding negligible overhead over the current SPARQL.
5 http://www.w3.org/2009/sparql/wiki/Extensions_Proposed_By_OpenLink#</p>
      <p>Nested_Queries
6 http://jena.sourceforge.net/ARQ/sub-select.html</p>
      <p>We presented (Section 4) expressiveness results of these extensions, analyzed
their interplay and their implications, particularly regarding unexpected
sidee ects produced by the incorporation of certain features.</p>
      <p>To be complete, the discussion should include at some point the incorporation
of subqueries in the aggregate constructs that are planned for SPARQL 1.1.
Again, we strongly suggest to follow the SQL-philosophy. We have showed that it
can be incorporated to SPARQL with little noise, in a perfectly coherent manner,
without altering the original semantics of SPARQL, and adding few syntactic
construct with a clear semantics. In this regard, current implementations of
SPARQL can be modularly extended to include these new features.
Acknowledgements. R. Angles was supported by Fondecyt Project No. 11100364.
C. Gutierrez was supported by Fondecyt Project No. 1090565.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>W3C RDF Data Access Working Group - Issues List</surname>
          </string-name>
          . http://www.w3.org/2001/sw/DataAccess/issues.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez. SQL Nested</surname>
          </string-name>
          <article-title>Queries in SPARQL</article-title>
          .
          <source>In Proc. of the IV Alberto Mendelzon Workshop on Foundations of Data Management</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Equivalence of queries combining set and bag-set semantics</article-title>
          .
          <source>In Proc. of the 25th Symp. on Principles of DB Systems (PODS)</source>
          , pages
          <fpage>70</fpage>
          {
          <fpage>79</fpage>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Ganski and H. K. T. Wong</surname>
          </string-name>
          .
          <article-title>Optimization of nested SQL queries revisited</article-title>
          .
          <source>In Proceedings of the 1987 Int. Conf. on Management of data (SIGMOD)</source>
          , pages
          <fpage>23</fpage>
          {
          <fpage>33</fpage>
          , New York, NY, USA,
          <year>1987</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <source>Database Systems - The Complete Book. Prentice Hall</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>W.</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>On optimizing an SQL-like nested query</article-title>
          .
          <source>ACM Transactions on Database Systems (TODS)</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <volume>443</volume>
          {
          <fpage>469</fpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and Complexity of SPARQL</article-title>
          .
          <source>In Proceedings of the 5th International Semantic Web Conference (ISWC)</source>
          ,
          <source>number 4273 in LNCS</source>
          , pages
          <volume>30</volume>
          {
          <fpage>43</fpage>
          . Springer-Verlag,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <source>Semantics and Complexity of SPARQL. ACM Transactions on Database Systems (TODS)</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):1{
          <fpage>45</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>From SPARQL to Rules (and back)</article-title>
          .
          <source>In Proceedings of the 16th International World Wide Web Conference (WWW)</source>
          , pages
          <fpage>787</fpage>
          {
          <fpage>796</fpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>B.</given-names>
            <surname>Quilitz</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Leser</surname>
          </string-name>
          .
          <article-title>Querying Distributed RDF Data Sources with SPARQL</article-title>
          .
          <source>In Proc. of the 5th European SW Conf. (ESWC)</source>
          , volume
          <volume>5021</volume>
          <source>of LNCS</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>S.</given-names>
            <surname>Schenk. A SPARQL Semantics</surname>
          </string-name>
          <article-title>Based on Datalog</article-title>
          .
          <source>In 30th Annual German Conf. on Advances in AI (KI)</source>
          , volume
          <volume>4667</volume>
          <source>of LNCS</source>
          , pages
          <volume>160</volume>
          {
          <fpage>174</fpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>