
SQL Nested Queries in SPARQL

Renzo Angles

Claudio Gutierrez

Department of Computer Science, Universidad de Chile; Department of Computer Science, Universidad de Talca

SPARQL currently does not include any form of nested queries. In this paper we present a proposal to incorporate nested queries into SPARQL along the design philosophy of SQL nested queries. We present rewriting algorithms and show that all the proposed nested queries can be expressed in a natural and simple extension of SPARQL syntax.

1 Introduction

One of the most powerful features of a query language is the nesting of queries, that is, the possibility of writing in a single expression a query which uses the output of other queries. The current W3C recommendation of SPARQL [13] does not include any form of nesting, although it has been considered as an issue by the RDF Data Access Working Group, and has been gradually incorporated into SPARQL engines.

For SPARQL, the incorporation of nested queries has several motivations. One of the most important is the reuse of queries. Once a query is executed, the user may presumably direct the output to some storage medium, assign an IRI to it and then run a query against that extract. Also, with the right interface, the user might be able to simply cut and paste a debugged query into ad-hoc queries. Hence, this feature would allow queries to be built incrementally from separately debugged pieces. Another important motivation, as SPARQL was designed to work in a distributed environment like the Web, is the notion of distributed queries. A SPARQL user will have access to vast and time-varying input RDF graphs, containing huge volumes of data that are not of interest to the user. Hence the query computation can be distributed and only the relevant data used to compute the final query. For example, AllegroGraph supports queries over distributed databases. A third important motivation is query rewriting: complex queries can be structured in a way that is more intuitive and understandable for the user. Examples of these uses are provided in the paper.

Introducing nesting in a coherent form, that is, providing a clean syntax and semantics, is a delicate task. There are several models and "philosophies", the most widely known being those of SQL and XQuery. Based on the similarities between SQL and SPARQL [1,4], it seems natural to investigate how the SQL nesting model can be introduced into SPARQL. In this paper, we address this challenge by systematically investigating the behavior of SQL nesting operators in the context of SPARQL.

There have been several proposals for introducing nesting in SPARQL. Indeed, it has been considered as an issue by the RDF Data Access Working Group: it was raised in July 2004 under the name cascadedQueries (footnote 1). Currently, the W3C SPARQL Working Group is working on new features for the language [9]. Among them, the notion of subqueries (nesting a query within another query) is a required feature. As a possible type of subquery, the working draft of SPARQL 1.1 [7] introduces the notion of subselect, which allows a SELECT query to be a graph pattern.

Regarding real-life practice, some implementations of SPARQL provide extensions that include support for some types of nested queries. ARQ, the query engine for Jena, supports a type of nested SELECT which uses aggregate functions (footnote 2). Virtuoso has also included some extensions (footnote 3) related to nested queries, among them an embedded SELECT query in the place of a triple pattern, and filter conditions of the form "exists (<scalar subquery>)". None of these proposals and implementations presents a systematic coverage and analysis of these extensions, nor a formal semantics for them. It is important to note that all of them introduce nested queries as a new pattern of the form (SELECT ...). This approach introduces several design decisions that are either undesirable or unnecessary at this level. Minor problems are the creation of values at the pattern level by allowing patterns like (SELECT ?X ?Y AS 5 ...), and the introduction of projection in patterns by allowing patterns of the form (SELECT ?X ?Y WHERE P(?X)) where ?Y does not appear in the pattern P. A more relevant problem is introduced once correlation of variables with subqueries is accepted, a desirable and necessary functionality when dealing with nesting: either the original unordered evaluation strategy of SPARQL patterns or the standard semantics of correlation would need to be reworked. For example, consider the graph pattern P(?X) AND (SELECT ?Y WHERE R(?X,?Y)), where ?X is a correlated variable. The standard semantics would first evaluate P and then, for each value of ?X, the corresponding instance of the SELECT pattern. In the proposals presented it is not clear how to deal with this issue.

Additionally, we do not consider the orthogonal functionality of composition, which in SPARQL would correspond to the discussion of how to nest queries in the FROM and FROM NAMED clauses. Schenk [14] proposes the use of views as parts of a dataset, that is, the inclusion of CONSTRUCT queries in FROM clauses. We do not address this topic in this paper.

An alternative approach to the introduction of SELECT as a new type of pattern is to incorporate nesting as a filtering device. An early attempt in this direction is Polleres [12], where it is suggested that boolean SPARQL queries (i.e., queries having the ASK query form) can be safely allowed within filter constraints, but the extension is not developed. In this paper we develop this idea to its end based on the design philosophy of SQL nested queries, which restricts nested queries to a form of filtering in the WHERE condition. This approach gives a clean semantics for correlated variables, allows the language to be extended modularly, and naturally extends the original SPARQL semantics. We develop this feature through the inclusion of SQL-like nested queries in the usual SPARQL filter constraints.

In particular, we present the following contributions. First, we present a proposal to incorporate nested queries into SPARQL along the design philosophy of SQL, by presenting a syntax and a formal semantics fully compatible with the current ones, and we discuss design features. Second, we show, via illustrative examples, that all SQL facilities are also relevant in SPARQL, and present a set of equivalences and rewriting rules among them. Third, we prove that all classical SQL nesting operators (i.e., IN, SOME/ANY, ALL and EXISTS) can be reduced to one of them (i.e., EXISTS), hence proving that all standard nesting constructs can be expressed with the standard filter part of a SPARQL query.

Footnote 1: http://www.w3.org/2001/sw/DataAccess/issues#cascadedQueries
Footnote 2: http://jena.sourceforge.net/ARQ/sub-select.html
Footnote 3: http://www.w3.org/2009/sparql/wiki/Extensions_Proposed_By_OpenLink#Nested_Queries

The paper is organized as follows: Section 2 presents the syntax and the semantics of the extension of SPARQL with nested queries. Section 3 presents examples of nested queries. Section 4 presents the algorithms to rewrite among nested queries. Finally, Section 5 presents some conclusions.

1.1 Nested queries in SQL

SQL is a paradigmatic example of the power of nested queries. In fact, this feature plays an important role in SQL for several reasons [16]. SQL (as defined in ANSI/ISO SQL-92) allows nesting of query blocks in the FROM and WHERE clauses, at any level of nesting. In the FROM clause, a subquery is imported and used to form the set of relations to be queried. A subquery in the WHERE clause can be either aggregate (it returns a single value due to an aggregate operator) or non-aggregate (it returns either a set of values or empty, i.e., a SELECT query).

Let $Q_A$ be an aggregate query, $Q_S$ and $Q$ be non-aggregate queries where $Q_S$ returns a one-column relation (i.e., it has a single projection predicate), and $\theta$ be a scalar comparison operator ($<, \leq, >, \geq, =, \neq$). A selection predicate containing nested queries can be of the form:

(1) $\langle \text{value} \mid \text{attribute} \rangle\ \theta\ (Q_A)$ (value-set comparison predicate).
(2) $\langle \text{value} \mid \text{attribute} \rangle$ IN | NOT-IN $(Q_S)$ (set-membership predicate).
(3) $\langle \text{value} \mid \text{attribute} \rangle\ \theta$ SOME | ALL $(Q_S)$ (quantified predicate).
(4) EXISTS | NOT-EXISTS $(Q)$ (existential predicate).

For example, consider the relations EMPLOYEES(EMP#,NAME,SAL,DEPT) and DEPARTMENTS(DEPT#,NAME,LOCATION). The following expression shows a SQL nested query Q1 with aggregate (Q2) and non-aggregate (Q3) subqueries.

SELECT E.NAME FROM EMPLOYEES E                          (Q1)
WHERE E.SAL >
    (SELECT AVG(F.SAL) FROM EMPLOYEES F                 (Q2)
     WHERE F.DEPT IN
         (SELECT D.DEPT# FROM DEPARTMENTS D             (Q3)
          WHERE D.LOCATION = 'DENVER'));
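To make Q1 concrete, the query can be run against a small in-memory SQLite database. This is only a sketch under stated assumptions: the rows are invented for illustration, and the columns EMP# and DEPT# are renamed EMP_NO and DEPT_NO because '#' is not allowed in an unquoted SQLite identifier. An ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Hypothetical toy data; EMP#/DEPT# are renamed EMP_NO/DEPT_NO for SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENTS (DEPT_NO INTEGER, NAME TEXT, LOCATION TEXT);
CREATE TABLE EMPLOYEES (EMP_NO INTEGER, NAME TEXT, SAL REAL, DEPT INTEGER);
INSERT INTO DEPARTMENTS VALUES (1, 'SALES', 'DENVER'), (2, 'R&D', 'BOSTON');
INSERT INTO EMPLOYEES VALUES
    (10, 'ANN', 100, 1), (11, 'BOB', 200, 1),
    (12, 'CAT', 160, 2), (13, 'DAN', 120, 2);
""")

Q1 = """
SELECT E.NAME FROM EMPLOYEES E
WHERE E.SAL > (SELECT AVG(F.SAL) FROM EMPLOYEES F        -- Q2 (aggregate)
               WHERE F.DEPT IN (SELECT D.DEPT_NO          -- Q3 (non-aggregate)
                                FROM DEPARTMENTS D
                                WHERE D.LOCATION = 'DENVER'))
ORDER BY E.NAME
"""
names = [row[0] for row in conn.execute(Q1)]
print(names)
```

With these rows the average salary of the Denver department is 150, so the query selects the employees earning more than that, regardless of their own department.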

2 Syntax and Semantics of nested queries

In this section we extend the syntax and semantics of SPARQL to support nested queries. This extension is based on the ideas and syntax of SQL nested queries as presented above. Considering that aggregate operators for SPARQL have not been defined yet, we only consider non-aggregate queries (i.e., SELECT and ASK queries) as nested queries. Hence, we have not included the value-set comparison predicates of SQL defined above. The definition of this extension follows the formalization presented in [11].

2.1 Preliminaries: The RDF Model and RDF Datasets

Assume there are pairwise disjoint infinite sets $I$, $B$, $L$ (IRIs, blank nodes, and RDF literals respectively). We denote by $T$ the union $I \cup B \cup L$ (RDF terms). A tuple $(v_1, v_2, v_3) \in (I \cup B) \times I \times T$ is called an RDF triple, where $v_1$ is the subject, $v_2$ the predicate, and $v_3$ the object. An RDF graph [10] (just graph from now on) is a set of RDF triples. Given a graph $G$, we denote by $\mathrm{term}(G)$ the set of elements of $T$ appearing in $G$ and by $\mathrm{blank}(G)$ the set of blank nodes in $G$. If $G$ is referred to by an IRI $u$, then $\mathrm{graph}(u)$ returns the graph available at $u$, i.e., $G = \mathrm{graph}(u)$.

We define two operations on two graphs $G_1$ and $G_2$. The union of graphs, denoted $G_1 \cup G_2$, is the set-theoretical union of their sets of triples. The merge of graphs, denoted $G_1 + G_2$, is the graph $G_1 \cup G_2'$ where $G_2'$ is the graph obtained from $G_2$ by renaming its blank nodes to avoid clashes with those in $G_1$.
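The difference between union and merge can be sketched in a few lines. The encoding below is an assumption made for illustration only: graphs are Python sets of string triples, blank nodes are the strings starting with "_:", and fresh names have the hypothetical form "_:m0", "_:m1", and so on.

```python
from itertools import count

def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def blank(graph):
    """blank(G): the set of blank nodes occurring in G."""
    return {t for triple in graph for t in triple if is_blank(t)}

def union(g1, g2):
    """G1 ∪ G2: plain set union, shared blank-node names are identified."""
    return set(g1) | set(g2)

def merge(g1, g2):
    """G1 + G2: rename the blank nodes of G2 that clash with those of G1."""
    taken = blank(g1) | blank(g2)
    fresh = (f"_:m{i}" for i in count())
    renaming = {}
    for b in sorted(blank(g1) & blank(g2)):
        new = next(n for n in fresh if n not in taken)
        taken.add(new)
        renaming[b] = new
    g2_prime = {tuple(renaming.get(t, t) for t in triple) for triple in g2}
    return set(g1) | g2_prime

g1 = {("_:b", "foaf:name", "Ann")}
g2 = {("_:b", "foaf:name", "Bob")}
print(union(g1, g2))
print(merge(g1, g2))
```

In the union both names are attached to the same blank node, whereas the merge keeps the two anonymous resources apart.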

An RDF dataset is a set $D = \{G_0, \langle u_1, G_1 \rangle, \dots, \langle u_n, G_n \rangle\}$ where each $G_i$ is a graph and each $u_j$ is an IRI. $G_0$ is called the default graph and each pair $\langle u_i, G_i \rangle$ is called a named graph. Every dataset satisfies that: (i) it always contains one default graph, (ii) there may be no named graphs, (iii) each $u_j$ is distinct, and (iv) $\mathrm{blank}(G_i) \cap \mathrm{blank}(G_j) = \emptyset$ for $i \neq j$. Given $D$, we denote by $\mathrm{term}(D)$ the set of terms occurring in the graphs of $D$. The default graph of $D$ is denoted $\mathrm{dg}(D)$. For a named graph $\langle u_i, G_i \rangle$ define $\mathrm{name}(G_i)_D = u_i$ and $\mathrm{graph}(u_i)_D = G_i$; otherwise $\mathrm{name}(G_i)_D = \emptyset$ and $\mathrm{graph}(u_i)_D = \emptyset$. We denote by $\mathrm{names}(D)$ the set of IRIs $\{u_1, \dots, u_n\}$. Although $\mathrm{name}(G_0) = \emptyset$, we will sometimes use $g_0$ when referring to $G_0$. Finally, the active graph of $D$ is the graph $G_i$ used for querying the dataset.

2.2 Syntax of nested queries

Assume the existence of an infinite set $V$ of variables disjoint from $T$. Let $\mathrm{var}(\alpha)$ be the function which returns the set of variables occurring in the structure $\alpha$.

A triple pattern is a tuple in $(T \cup V) \times (I \cup V) \times (T \cup V)$. A nested query is a tuple $(R, F, P)$ (footnote 4) where $R$ is a result query form, $F$ is a set (possibly empty) of dataset clauses, and $P$ is a graph pattern. Next we define each component.

(1) If $W \subseteq V$ is a set of variables and $H$ is a set of triple patterns (called a graph template) then the expressions SELECT $W$, CONSTRUCT $H$, and ASK are result query forms.

(2) If $u \in I$ and $Q_C$ is a query of the form $(\mathrm{CONSTRUCT}\ H, F, P)$, then the expressions FROM $u$ and FROM NAMED $u$ are dataset clauses.

(3) A filter constraint is defined recursively as follows:
- If $?X, ?Y \in V$ and $v \in I \cup L$ then $?X = v$, $?X = ?Y$, and $\mathrm{bound}(?X)$ are (atomic) filter constraints (footnote 5).
- If $u \in T$, $\theta$ is a scalar comparison operator ($=, \neq, <, \leq, >, \geq$), and $Q_{?X}$ is a query of the form $(\mathrm{SELECT}\ ?X, F, P)$, then the expressions $(u\ \theta\ \mathrm{SOME}(Q_{?X}))$, $(u\ \theta\ \mathrm{ALL}(Q_{?X}))$ and $(u\ \mathrm{IN}\ (Q_{?X}))$ are filter constraints.
- If $Q_A$ is a query of the form $(\mathrm{ASK}, F, P)$, then the expression $\mathrm{EXISTS}(Q_A)$ is a filter constraint.
- If $C_1$ and $C_2$ are filter constraints, then $(\neg C_1)$, $(C_1 \wedge C_2)$, and $(C_1 \vee C_2)$ are (complex) filter constraints.

(4) A graph pattern is defined recursively as follows:
- A triple pattern is a graph pattern.
- If $P_1$ and $P_2$ are graph patterns then the expressions $(P_1\ \mathrm{AND}\ P_2)$, $(P_1\ \mathrm{OPT}\ P_2)$, $(P_1\ \mathrm{UNION}\ P_2)$, and $(P_1\ \mathrm{MINUS}\ P_2)$ are graph patterns (footnote 6).
- If $P$ is a graph pattern and $u \in I \cup V$ then the expression $(u\ \mathrm{GRAPH}\ P)$ is a graph pattern.
- If $P$ is a graph pattern and $C$ is a filter constraint then the expression $(P\ \mathrm{FILTER}\ C)$ is a graph pattern.

Footnote 4: In this paper we do not consider the solution modifiers defined in [13].

Let $Q = (R, F, P)$ be a query. A query $Q'$ is nested in $Q$ if and only if $Q'$ occurs in the graph pattern $P$, i.e., when $Q'$ is nested in $P$. In such a case, $Q$ is known as the outer query and $Q'$ is known as the inner query. If $Q$ does not contain nested queries then $Q$ is called a flat query.
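The abstract syntax above maps naturally onto a small tree data structure. The following sketch is one possible encoding, not part of the proposal: all class and function names are hypothetical, atomic filter constraints and GRAPH patterns are left out for brevity, and IN is modelled as a quantified constraint with kind "IN".

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Triple:                 # triple pattern
    s: str
    p: str
    o: str

@dataclass(frozen=True)
class Binary:                 # (P1 op P2), op in {AND, OPT, UNION, MINUS}
    op: str
    left: object
    right: object

@dataclass(frozen=True)
class Filter:                 # (P FILTER C)
    pattern: object
    constraint: object

@dataclass(frozen=True)
class Query:                  # (R, F, P)
    form: str                 # e.g. "SELECT ?X" or "ASK"
    dataset: Tuple[str, ...]
    pattern: object

@dataclass(frozen=True)
class Exists:                 # EXISTS(Q_A)
    query: Query

@dataclass(frozen=True)
class Quantified:             # (u op SOME(Q)), (u op ALL(Q)), (u IN (Q))
    term: str
    op: str
    kind: str
    query: Query

@dataclass(frozen=True)
class Not:                    # (¬C)
    constraint: object

def nested_queries(node):
    """Yield every query occurring inside a pattern or a filter constraint."""
    if isinstance(node, Binary):
        yield from nested_queries(node.left)
        yield from nested_queries(node.right)
    elif isinstance(node, Filter):
        yield from nested_queries(node.pattern)
        yield from nested_queries(node.constraint)
    elif isinstance(node, Not):
        yield from nested_queries(node.constraint)
    elif isinstance(node, (Exists, Quantified)):
        yield node.query
        yield from nested_queries(node.query.pattern)

def is_flat(query):
    """A query is flat when no query is nested in its graph pattern."""
    return next(nested_queries(query.pattern), None) is None

inner = Query("ASK", (), Triple("?Y", "email", "?E"))
outer = Query("SELECT ?X", ("foaf",),
              Filter(Triple("?X", "knows", "?Y"), Exists(inner)))
print(is_flat(inner), is_flat(outer))
```

Here `outer` is the outer query and `inner` the inner query in the sense of the definition above.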

Note that nested queries in SPARQL have been defined by extending the definition of filter constraints with SQL-like predicates for nesting, specifically by including the IN, SOME, ALL and EXISTS operators. The corresponding negated operators can be represented by using the negation of filter constraints, i.e., NOT-IN and NOT-EXISTS are expressed as $(\neg(u\ \mathrm{IN}\ (Q_{?X})))$ and $(\neg\,\mathrm{EXISTS}(Q_A))$ respectively.

We have defined two explicit restrictions on inner queries. On the one hand, filter expressions using IN, ALL and SOME are restricted to SELECT queries with a single projection variable. On the other hand, filter expressions using EXISTS are restricted to ASK queries. The latter condition has been included for simplicity. In practice, EXISTS filters could take queries having any result query form (e.g., SELECT), because the EXISTS condition does not really use the results of the inner query at all.

Footnote 5: For a complete list of atomic filter constraints see the SPARQL specification [13].
Footnote 6: The MINUS operator is not defined in the SPARQL specification; however, it can be simulated by a combination of the OPT and FILTER operators [1].

Similar to SQL, the extension presents two features inherent to query nesting. First, the language allows queries with any level of nesting. However, it is well known that queries with more than two levels of nesting are not recommended in practice because they make the query more difficult to read, understand and maintain, and increase the execution time [16]. Second, variables from an outer query block can be accessed inside a nested query block. Such variables, called correlated variables, act as outer references from the inner query to the outer query. A subquery containing correlated variables is called a correlated subquery.

2.3 Semantics of nested queries

A mapping $\mu$ is a partial function $\mu : V \to T$. The domain of $\mu$, $\mathrm{dom}(\mu)$, is the subset of $V$ where $\mu$ is defined. The empty mapping $\mu_0$ is a mapping such that $\mathrm{dom}(\mu_0) = \emptyset$. Given a triple pattern $t$ and a mapping $\mu$ such that $\mathrm{var}(t) \subseteq \mathrm{dom}(\mu)$, $\mu(t)$ is the triple obtained by replacing the variables in $t$ according to $\mu$. Abusing notation, for a query $Q$, we denote by $\mu(Q)$ the query resulting from replacing variables in $Q$ according to $\mu$.

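Two mappings $\mu_1$ and $\mu_2$ are compatible when for all $?X \in \mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2)$ it holds that $\mu_1(?X) = \mu_2(?X)$, i.e., when $\mu_1 \cup \mu_2$ is also a mapping. The operations of join, union, difference and left outer-join between two sets of mappings $\Omega_1$ and $\Omega_2$ are defined as follows:
- $\Omega_1 \bowtie \Omega_2 = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1, \mu_2 \in \Omega_2, \mu_1 \text{ and } \mu_2 \text{ are compatible}\}$
- $\Omega_1 \cup \Omega_2 = \{\mu \mid \mu \in \Omega_1 \text{ or } \mu \in \Omega_2\}$
- $\Omega_1 \setminus \Omega_2 = \{\mu_1 \in \Omega_1 \mid \text{for all } \mu_2 \in \Omega_2, \mu_1 \text{ and } \mu_2 \text{ are not compatible}\}$
- $\Omega_1 \mathbin{⟕} \Omega_2 = (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2)$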
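The four operations translate almost literally into set comprehensions. In the sketch below a mapping is encoded, as an assumption of this example, as a frozenset of (variable, term) pairs so that sets of mappings are ordinary Python sets; the helper `mapping` and the variable names are illustrative only.

```python
def mapping(**bindings):
    """Build a mapping, e.g. mapping(X="a") binds ?X to a."""
    return frozenset(bindings.items())

def compatible(m1, m2):
    """True when m1 and m2 agree on every variable they share."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}

def union(o1, o2):
    return o1 | o2

def diff(o1, o2):
    return {m1 for m1 in o1 if all(not compatible(m1, m2) for m2 in o2)}

def left_join(o1, o2):
    return join(o1, o2) | diff(o1, o2)

o1 = {mapping(X="a", N="Ann"), mapping(X="b", N="Bob")}
o2 = {mapping(X="a", Y="c")}
for m in left_join(o1, o2):
    print(sorted(m))
```

In the left outer-join the mapping for ?X = b survives without a binding for ?Y, which is exactly how OPT produces unbound variables.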

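The answer for a query $Q = (R, F, P)$, denoted $\mathrm{ans}(Q)$, is a function which returns: (i) a set of mappings when $R$ is a SELECT query; (ii) an RDF graph when $R$ is a CONSTRUCT query; and (iii) a boolean value (true / false) when $R$ is an ASK query. We will use this informal definition of $\mathrm{ans}(\cdot)$ to define the semantics for the components of a query.

(1) Semantics of result query forms. Let $\mu$ be a mapping and $R$ be a result query form. The result of $R$ given $\mu$, denoted $\mathrm{result}(R, \mu)$, is defined as follows:
- If $R$ is SELECT $W$ then $\mathrm{result}(R, \mu)$ is the restriction of $\mu$ to $W$, that is, the mapping denoted $\mu_{|W}$ such that $\mathrm{dom}(\mu_{|W}) = \mathrm{dom}(\mu) \cap W$ and $\mu_{|W}(?X) = \mu(?X)$ for every $?X \in \mathrm{dom}(\mu_{|W})$.
- If $R$ is CONSTRUCT $H$ then $\mathrm{result}(R, \mu)$ is the set of RDF triples (i.e., an RDF graph) $\{\mu(t) \mid t \in H \text{ and } \mu(t) \in (I \cup B) \times I \times T\}$.
- If $R$ is ASK then $\mathrm{result}(R, \mu)$ is false if $\mu = \emptyset$ and true otherwise.

(2) Semantics of dataset clauses. Let $F$ be a set of dataset clauses. The dataset resulting from $F$, denoted $\mathrm{dataset}(F)$, contains: (i) a default graph consisting of the merge of the graphs referred to in clauses FROM $u$ (if there is no FROM $u$, then the default graph is the empty graph $G_0 = \emptyset$); and (ii) a named graph $\langle u, \mathrm{graph}(u) \rangle$ for each dataset clause "FROM NAMED $u$".

(3) Semantics of filter constraints. Let $\mu$ be a mapping and $C$ be a filter constraint. We say that $\mu$ satisfies $C$, denoted $\mu \models C$, if:
- $C$ is $?X = v$, $?X \in \mathrm{dom}(\mu)$, and $\mu(?X) = v$;
- $C$ is $?X = ?Y$, $?X \in \mathrm{dom}(\mu)$, $?Y \in \mathrm{dom}(\mu)$, and $\mu(?X) = \mu(?Y)$;
- $C$ is $\mathrm{bound}(?X)$ and $?X \in \mathrm{dom}(\mu)$;
- $C$ is $(\neg C_1)$ and it is not the case that $\mu \models C_1$;
- $C$ is $(C_1 \vee C_2)$ and $\mu \models C_1$ or $\mu \models C_2$;
- $C$ is $(C_1 \wedge C_2)$, $\mu \models C_1$ and $\mu \models C_2$;
- $C$ is $(u\ \theta\ \mathrm{SOME}(Q_{?X}))$ and there exists a mapping $\mu' \in \mathrm{ans}(\mu(Q_{?X}))$ satisfying that either $u\ \theta\ \mu'(?X)$ when $u \in I \cup L$ or $\mu(u)\ \theta\ \mu'(?X)$ when $u \in V$;
- $C$ is $(u\ \theta\ \mathrm{ALL}(Q_{?X}))$ and for every mapping $\mu' \in \mathrm{ans}(\mu(Q_{?X}))$ it holds that either $u\ \theta\ \mu'(?X)$ when $u \in I \cup L$ or $\mu(u)\ \theta\ \mu'(?X)$ when $u \in V$;
- $C$ is $(u\ \mathrm{IN}\ (Q_{?X}))$ and there exists a mapping $\mu' \in \mathrm{ans}(\mu(Q_{?X}))$ satisfying that either $u = \mu'(?X)$ when $u \in I \cup L$ or $\mu(u) = \mu'(?X)$ when $u \in V$;
- $C$ is $\mathrm{EXISTS}(Q_A)$ and $\mathrm{ans}(\mu(Q_A))$ is true.

(4) Semantics of graph patterns. The evaluation of a graph pattern $P$ over a dataset $D$ with active graph $G$, denoted $[\![P]\!]^{D}_{G}$, is defined recursively as follows:
- If $P$ is a triple pattern then $[\![P]\!]^{D}_{G} = \{\mu \mid \mathrm{dom}(\mu) = \mathrm{var}(P) \text{ and } \mu(P) \in G\}$.
- $[\![(P_1\ \mathrm{AND}\ P_2)]\!]^{D}_{G} = [\![P_1]\!]^{D}_{G} \bowtie [\![P_2]\!]^{D}_{G}$.
- $[\![(P_1\ \mathrm{OPT}\ P_2)]\!]^{D}_{G} = [\![P_1]\!]^{D}_{G} \mathbin{⟕} [\![P_2]\!]^{D}_{G}$.
- $[\![(P_1\ \mathrm{UNION}\ P_2)]\!]^{D}_{G} = [\![P_1]\!]^{D}_{G} \cup [\![P_2]\!]^{D}_{G}$.
- $[\![(P_1\ \mathrm{MINUS}\ P_2)]\!]^{D}_{G} = [\![P_1]\!]^{D}_{G} \setminus [\![P_2]\!]^{D}_{G}$.
- If $u \in I$ then $[\![(u\ \mathrm{GRAPH}\ P_1)]\!]^{D}_{G} = [\![P_1]\!]^{D}_{\mathrm{graph}(u)_D}$.
- If $?X \in V$ and $\mu_{?X \to v}$ is a mapping such that $\mathrm{dom}(\mu_{?X \to v}) = \{?X\}$ and $\mu_{?X \to v}(?X) = v$, then $[\![(?X\ \mathrm{GRAPH}\ P_1)]\!]^{D}_{G} = \bigcup_{v \in \mathrm{names}(D)} \left([\![P_1]\!]^{D}_{\mathrm{graph}(v)_D} \bowtie \{\mu_{?X \to v}\}\right)$.
- $[\![(P_1\ \mathrm{FILTER}\ C)]\!]^{D}_{G} = \{\mu \mid \mu \in [\![P_1]\!]^{D}_{G} \text{ and } \mu \models C\}$.
- If $P$ is a SELECT query $Q_S$ then $[\![P]\!]^{D}_{G} = \mathrm{ans}(Q_S)$.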

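Definition 1 (Answer for a query). Let $Q = (R, F, P)$ be a query, $D$ be the dataset obtained from $F$, and $G$ be the default graph of $D$. The answer to $Q$, denoted $\mathrm{ans}(Q)$, is defined as follows:
- If $R$ is SELECT $W$ then $\mathrm{ans}(Q) = \{\mathrm{result}(R, \mu) \mid \mu \in [\![P]\!]^{D}_{G}\}$.
- If $R$ is CONSTRUCT $H$ and $\mathrm{blank}(H)$ is the set of blank nodes appearing in $H$, then $\mathrm{ans}(Q) = \{\delta_i(\mathrm{result}(R, \mu_i)) \mid \mu_i \in [\![P]\!]^{D}_{G}\}$ where $\delta_i : \mathrm{blank}(H) \to (B \setminus \mathrm{blank}(H))$ is a blank renaming function satisfying that for each pair of distinct mappings $\mu_j, \mu_k \in [\![P]\!]^{D}_{G}$, $\mathrm{range}(\delta_j) \cap \mathrm{range}(\delta_k) = \emptyset$.
- If $R$ is ASK then $\mathrm{ans}(Q) = \text{false}$ when $[\![P]\!]^{D}_{G} = \emptyset$ (i.e., there exists no mapping $\mu \in [\![P]\!]^{D}_{G}$) and $\mathrm{ans}(Q) = \text{true}$ otherwise.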
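For the SELECT and ASK forms, Definition 1 amounts to a projection and an emptiness test over the evaluation of the pattern. A minimal sketch, assuming mappings are Python dicts and the evaluation of P is given as a list (the function names and the sample evaluation are hypothetical):

```python
def result_select(mu, W):
    """result(SELECT W, mu): the restriction of mu to the variables in W."""
    return {x: v for x, v in mu.items() if x in W}

def ans_select(evaluation, W):
    """ans(Q) for R = SELECT W; duplicates collapse under set semantics."""
    out = []
    for mu in evaluation:
        restricted = result_select(mu, W)
        if restricted not in out:
            out.append(restricted)
    return out

def ans_ask(evaluation):
    """ans(Q) for R = ASK: true unless the evaluation of P is empty."""
    return len(evaluation) > 0

# Hypothetical evaluation of some pattern P.
evaluation = [{"?X": "a", "?Y": "b"}, {"?X": "a", "?Y": "c"}, {"?Z": "d"}]
print(ans_select(evaluation, {"?X"}))
print(ans_ask(evaluation), ans_ask([]))
```

Note that the third mapping does not bind ?X, so its restriction is the empty mapping, in line with dom(μ|W) = dom(μ) ∩ W.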

The semantics for correlated queries as defined above follows the nested iteration method [8], i.e., the inner query is evaluated once for each solution of the outer query (because the results of the inner query are correlated with each individual solution of the outer query). This procedure is attained by replacing variables in the inner query with the corresponding values given by the current mapping of the outer query (e.g., by applying $\mu(Q_{?X})$). For example, consider the graph pattern (((?X name ?N) OPT (?X knows ?Y)) FILTER EXISTS(ASK(?Y email ?E))). The method establishes that the graph pattern (?Y email ?E) (i.e., the subquery) is evaluated over and over again, once for each result mapping of the OPTIONAL graph pattern (i.e., the outer query).
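The nested iteration method can be sketched for the simplest case, where both the outer and the inner pattern are single triple patterns. The code below is an illustration under stated assumptions, not an implementation of the full semantics: graphs are lists of string triples, variables are strings starting with "?", and the outer pattern is simplified to (?X knows ?Y) so that no unbound variables arise.

```python
def is_var(term):
    return term.startswith("?")

def eval_triple(pattern, graph):
    """All mappings mu with dom(mu) = var(pattern) and mu(pattern) in graph."""
    result = []
    for triple in graph:
        mu = {}
        if all((mu.setdefault(p, v) == v) if is_var(p) else (p == v)
               for p, v in zip(pattern, triple)):
            result.append(mu)
    return result

def substitute(pattern, mu):
    """mu(Q): replace the variables of the inner pattern that mu binds."""
    return tuple(mu.get(t, t) for t in pattern)

def filter_exists(outer, inner_pattern, graph):
    """(P FILTER EXISTS(ASK inner_pattern)) by nested iteration: the inner
    pattern is instantiated and evaluated once per outer mapping."""
    return [mu for mu in outer
            if eval_triple(substitute(inner_pattern, mu), graph)]

graph = [("a", "knows", "b"), ("a", "knows", "c"),
         ("b", "email", "b@example.org"), ("d", "email", "d@example.org")]
outer = eval_triple(("?X", "knows", "?Y"), graph)
print(filter_exists(outer, ("?Y", "email", "?E"), graph))
```

Only the outer mapping with ?Y = b passes the filter, since c has no email in this toy graph.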

We have identified two issues related to the use of correlated variables. The first is loss of correlation due to unbound variables. Let $P$ be the OPTIONAL graph pattern in the above example. If $\mu$ is a mapping in $[\![P]\!]$ such that $\mu(?X) = a$, $\mu(?N) = b$ and $\mu(?Y)$ is unbound (i.e., there was no solution for the OPTIONAL part), then there is no value to replace the variable ?Y in the inner query, and consequently there is no correlation. This loss of correlation results in an undesirable evaluation because, when the inner query has at least one solution, the filter condition is true and the mapping $\mu$ is accepted as a solution. Clearly, this is not what the query intuitively means, since the evaluation of the inner graph pattern is supposed to depend directly on the evaluation of the outer graph pattern. This problem, produced by correlated variables that may be left unbound, is intrinsic to the language because of the semantics of the UNION and OPTIONAL operators (i.e., they can generate unbound variables). Hence, we restrict our study by avoiding graph patterns of this type.
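The loss of correlation can be reproduced on a toy graph. In this sketch (hypothetical data, string triples, variables starting with "?") the person a knows nobody, so the outer mapping leaves ?Y unbound; the instantiated inner pattern is then unchanged and matches the email of an unrelated resource.

```python
def matches(pattern, triple):
    """True when the triple is an instance of the triple pattern."""
    binding = {}
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, v) != v:
                return False
        elif p != v:
            return False
    return True

def exists(inner, mu, graph):
    """EXISTS(ASK inner) under the outer mapping mu, via substitution."""
    instantiated = tuple(mu.get(t, t) for t in inner)
    return any(matches(instantiated, t) for t in graph)

graph = [("a", "name", "Ann"), ("d", "email", "d@example.org")]
inner = ("?Y", "email", "?E")

mu_unbound = {"?X": "a", "?N": "Ann"}            # OPT part had no solution
mu_bound = {"?X": "a", "?N": "Ann", "?Y": "b"}   # hypothetical bound case

print(exists(inner, mu_unbound, graph))  # accepted although a knows nobody
print(exists(inner, mu_bound, graph))
```

The unbound mapping is accepted only because some email triple exists somewhere in the graph, which is the undesirable behaviour described above.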

Another issue concerns the use of correlated variables in the projection part of a nested SELECT query. Consider the graph pattern

((?X p ?Y) FILTER ?Y = SOME (SELECT ?Y WHERE (?Z q ?Y))).

Note that the use of variable ?Y as a projected variable in the nested SELECT query generates ambiguity about its scope. In fact, it is not clear whether ?Y must be considered local to the inner query or whether it is correlated with the outer query. To minimize the possibility of confusion, the scope of a variable will be interpreted using the nearest result query form possible (i.e., the nearest SELECT). Hence, variable ?Y is local to the inner query of the example.

3 Examples of nested queries in SPARQL

Let $G_1$, $G_2$ be two RDF graphs identified by the IRIs foaf and bib respectively. $G_1$ contains personal information using the FOAF vocabulary (footnote 7). $G_2$ contains bibliographic information using the bibTeX vocabulary (footnote 8). Consider the following examples of nested queries.

Footnote 7: http://xmlns.com/foaf/spec/
Footnote 8: http://zeitkunst.org/bibtex/0.1/

Example 1. The oldest people.

SELECT ?Per1
FROM foaf
WHERE ((?Per1 foaf:age ?Age1)
       FILTER (¬(?Age1 < SOME (SELECT ?Age2
                               FROM foaf
                               WHERE (?Per2 foaf:age ?Age2)))))

Example 2. The youngest people.

SELECT ?Per1
FROM foaf
WHERE ((?Per1 foaf:age ?Age1)
       FILTER (?Age1 <= ALL (SELECT ?Age2
                             FROM foaf
                             WHERE (?Per2 foaf:age ?Age2))))

Example 3. Mails of people being part of at least one group.

SELECT ?Mail
FROM foaf
WHERE ((?Per foaf:mbox ?Mail)
       FILTER (?Per IN (SELECT ?Mem
                        FROM foaf
                        WHERE (?Mem foaf:member ?Group))))

Example 4. Mails of people having at least one publication.

SELECT ?Mail
FROM foaf
WHERE ((?Per foaf:mbox ?Mail)
       FILTER (EXISTS (ASK
                       FROM bib
                       WHERE (?Art bib:has-author ?Per))))
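The filters of Examples 1 to 3 can be mimicked over plain Python collections. This is a sketch on invented data: the dictionary `ages` stands in for the foaf:age triples, the set `members` for the subjects of foaf:member triples, and persons are returned instead of mails in the last case.

```python
import operator

ages = {"ann": 30, "bob": 25, "cat": 30, "dan": 25}   # hypothetical foaf:age values
members = {"ann", "dan"}                               # hypothetical group members

def some(u, op, values):
    """u op SOME(Q): the comparison holds for at least one inner value."""
    return any(op(u, v) for v in values)

def all_(u, op, values):
    """u op ALL(Q): the comparison holds for every inner value."""
    return all(op(u, v) for v in values)

# Answer of the uncorrelated inner query SELECT ?Age2 ...
inner = list(ages.values())

oldest = sorted(p for p, a in ages.items() if not some(a, operator.lt, inner))
youngest = sorted(p for p, a in ages.items() if all_(a, operator.le, inner))
in_group = sorted(p for p in ages if some(p, operator.eq, members))  # IN as = SOME

print(oldest, youngest, in_group)
```

Because the inner query has no correlated variables, it is evaluated once and its answer is reused for every outer solution.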

The above examples deserve several comments. IN expressions are less expressive than SOME expressions because the former are restricted to equality of values, whereas the latter allow all scalar comparison operators. Nested queries with SOME / ALL operators without correlated variables are better suited for query composition, i.e., simple and direct copy/paste of queries. The use of EXISTS is not adequate for distributed queries because it needs correlated variables to make sense. Correlation helps the user to express complex queries but makes the evaluation harder (because of the application of the nested iteration method).

4 Equivalences among nested queries

In this section we present transformations among nested queries. We will show that all types of nested queries can be simulated by filter conditions with the EXISTS operator. Several equivalences presented in this section are well known in SQL [3].

4.1 Normalization

In order to simplify the transformations, we will avoid complex filter constraints, i.e., expressions of the form $C_1 \wedge C_2$ and $C_1 \vee C_2$ where $C_1$ and $C_2$ are filter constraints. This assumption is supported by the following lemma.

Lemma 1. Every graph pattern having complex filter constraints can be transformed into a graph pattern without complex filter constraints (footnote 9).

Proof. Let $P$ be a graph pattern and $C$, $C_1$, $C_2$ be filter constraints. The lemma is supported by the following equivalences:

$$(P\ \mathrm{FILTER}\ (C_1 \wedge C_2)) \equiv ((P\ \mathrm{FILTER}\ C_1)\ \mathrm{FILTER}\ C_2) \tag{1}$$

$$(P\ \mathrm{FILTER}\ (C_1 \vee C_2)) \equiv ((P\ \mathrm{FILTER}\ C_1)\ \mathrm{UNION}\ (P\ \mathrm{FILTER}\ C_2)) \tag{2}$$

$$(P\ \mathrm{FILTER}\ (\neg C)) \equiv (P\ \mathrm{MINUS}\ (P\ \mathrm{FILTER}\ C)) \tag{3}$$

It is not hard to see that the equivalences hold.

4.2 Transformations

Consider the following definition of query equivalence.

Definition 2 (Equivalence of queries). Two graph patterns $P_1$ and $P_2$ are equivalent, denoted $P_1 \equiv P_2$, if and only if $[\![P_1]\!]^{D}_{G} = [\![P_2]\!]^{D}_{G}$ for every RDF dataset $D$ with active graph $G$. Additionally, given two queries $Q = (R, F, P)$ and $Q' = (R, F, P')$, we say that $Q$ and $Q'$ are equivalent, denoted $Q \equiv Q'$, if and only if $P \equiv P'$.

Next we will define transformations among several types of nested queries. Based on the transformations defined in Section 4.1, we assume that queries do not contain complex filter constraints.
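The normalization equivalences of Section 4.1 can be checked on a toy instance under set semantics. The sketch below makes two assumptions that are not part of the paper: filter constraints are modelled as Python predicates over dicts, and every mapping in the evaluation of P has the same domain (so that two distinct mappings are never compatible, which is what the MINUS case relies on here).

```python
def compatible(m1, m2):
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def minus(o1, o2):
    """Difference of sets of mappings, as used for the MINUS operator."""
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}

def filt(omega, c):
    """(P FILTER C) over an already evaluated pattern."""
    return {m for m in omega if c(dict(m))}

# Hypothetical evaluation of a pattern P: every mapping binds ?X and ?A.
P = {frozenset({("?X", p), ("?A", a)})
     for p, a in [("ann", 30), ("bob", 25), ("cat", 41)]}
c1 = lambda mu: mu["?A"] > 26
c2 = lambda mu: mu["?X"] != "cat"

conj_ok = filt(P, lambda mu: c1(mu) and c2(mu)) == filt(filt(P, c1), c2)
disj_ok = filt(P, lambda mu: c1(mu) or c2(mu)) == filt(P, c1) | filt(P, c2)
neg_ok = filt(P, lambda mu: not c1(mu)) == minus(P, filt(P, c1))
print(conj_ok, disj_ok, neg_ok)
```

A single instance is of course not a proof; the check only illustrates how each complex constraint is traded for nested FILTER, UNION or MINUS patterns.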

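Proposition 1 (Transforming IN queries). Let $P$ be a pattern of the form $(P_1\ \mathrm{FILTER}\ (u\ \mathrm{IN}\ (Q_2)))$ where $u \in T$. Then, $P$ is equivalent to the expression:

$$(P_1\ \mathrm{FILTER}\ (u = \mathrm{SOME}(Q_2))) \tag{4}$$

Proposition 2 (Transforming SOME queries). Let $P$ be a pattern of the form $(P_1\ \mathrm{FILTER}\ (u\ \theta\ \mathrm{SOME}(Q_2)))$ where $u \in T$, $Q_2 = (\mathrm{SELECT}\ ?X_2, F_2, P_2)$, and $\hat{\theta}$ is the inverse operator to $\theta$. Then, $P$ is equivalent to the following expressions:

$$(P_1\ \mathrm{FILTER}\ (\neg(u\ \hat{\theta}\ \mathrm{ALL}(Q_2)))) \tag{5}$$

$$(P_1\ \mathrm{FILTER}\ \mathrm{EXISTS}(\mathrm{ASK}, F_2, (P_2\ \mathrm{FILTER}\ (u\ \theta\ ?X_2)))) \tag{6}$$

Proposition 3 (Transforming ALL queries). Let $P$ be a pattern of the form $(P_1\ \mathrm{FILTER}\ (u\ \theta\ \mathrm{ALL}(Q_2)))$ where $u \in T$, $Q_2 = (\mathrm{SELECT}\ ?X_2, F_2, P_2)$, and $\hat{\theta}$ is the inverse operator to $\theta$. Then, $P$ is equivalent to the following expressions:

$$(P_1\ \mathrm{FILTER}\ (\neg(u\ \hat{\theta}\ \mathrm{SOME}(Q_2)))) \tag{7}$$

$$(P_1\ \mathrm{FILTER}\ (\neg\,\mathrm{EXISTS}(\mathrm{ASK}, F_2, (P_2\ \mathrm{FILTER}\ (u\ \hat{\theta}\ ?X_2))))) \tag{8}$$

Footnote 9: Lemma 1 is true under set semantics. The inclusion of bag semantics, as defined for SPARQL, introduces complexity issues which are not discussed here.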

For example, the following queries show the application of transformations (7) and (8) to the query of Example 2.

Example 5. The youngest people (using the SOME operator).

SELECT ?Per1
FROM foaf
WHERE ((?Per1 foaf:age ?Age1)
       FILTER (¬(?Age1 > SOME (SELECT ?Age2
                               FROM foaf
                               WHERE (?Per2 foaf:age ?Age2)))))

Example 6. The youngest people (using the EXISTS operator).

SELECT ?Per1
FROM foaf
WHERE ((?Per1 foaf:age ?Age1)
       FILTER (¬ EXISTS (ASK
                         FROM foaf
                         WHERE ((?Per2 foaf:age ?Age2)
                                FILTER (?Age1 > ?Age2)))))
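The transformations between IN, SOME, ALL and EXISTS can be sanity-checked over small sets of integers. The sketch below is an illustration, not a proof: the answer of the inner SELECT query is modelled as a Python list, EXISTS as an emptiness test on the filtered list, and the "inverse" of a comparison operator is taken to be its logical complement (for example, > for <=), which is the reading used in Examples 5 and 6.

```python
import operator

# Each comparison operator paired with its inverse (complement) operator.
INVERSE = {operator.lt: operator.ge, operator.le: operator.gt,
           operator.gt: operator.le, operator.ge: operator.lt,
           operator.eq: operator.ne, operator.ne: operator.eq}

def some(u, op, values):
    return any(op(u, v) for v in values)

def all_(u, op, values):
    return all(op(u, v) for v in values)

def exists(values, cond):
    """EXISTS over the inner pattern extended with a filter condition."""
    return any(cond(v) for v in values)

def check(u, values):
    """Check the IN, SOME and ALL transformations for one term and one answer."""
    assert (u in values) == some(u, operator.eq, values)          # IN as = SOME
    for op, inv in INVERSE.items():
        s, a = some(u, op, values), all_(u, op, values)
        assert s == (not all_(u, inv, values))                    # SOME via ALL
        assert s == exists(values, lambda x: op(u, x))            # SOME via EXISTS
        assert a == (not some(u, inv, values))                    # ALL via SOME
        assert a == (not exists(values, lambda x: inv(u, x)))     # ALL via NOT EXISTS
    return True

ok = all(check(u, vals) for u in range(6) for vals in ([], [1], [2, 4], [3, 3]))
print(ok)
```

The empty answer is included on purpose: SOME is false and ALL is true over an empty inner result, and the rewritten forms agree with that.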

From the transformations defined above we can present the following result.

Theorem 1. Nested queries using SOME, ALL and IN can be simulated by using nested queries with the EXISTS operator.

5 Conclusions

We have studied how to extend SPARQL to support nesting along the design philosophy of SQL. We showed that there is a simple syntax and semantics for such extensions in SPARQL. We have shown that incorporating ASK queries in filters (through the EXISTS operator) gives the full power and flexibility of SQL nesting, while additionally allowing the semantics of SPARQL to be extended in a clean and modular form. The proposal presented here permits a simple and direct implementation of nested queries as known in the relational world (and hence by known translation results) in the SPARQL world.

Future work. An interesting problem studied in the database literature is the efficient implementation of nested queries, where a well-known approach is the development of algorithms which transform nested queries into equivalent non-nested queries that can be processed more efficiently by query-processing subsystems [8,5]. Along this line, most results concentrate on aggregate subqueries; optimization of non-aggregate subqueries has some limitations, especially for queries with multiple subqueries and null values [2].

Although decorrelation often results in cheaper non-nested plans, decorrelation is not always applicable, and even if applicable may not be the best choice in all situations since decorrelation carries a materialization overhead [15,6]. In this direction, the issue of efficient methods for processing nested queries is one of the main problems to be addressed in future work on this topic.

Acknowledgments. C. Gutierrez was supported by FONDECYT projects No. 1070348 and No. 1090565. The authors wish to thank the reviewers for their comments.

References

1. R. Angles and C. Gutierrez. The Expressive Power of SPARQL. In Proceedings of the 7th International Semantic Web Conference (ISWC), number 5318 in LNCS, pages 114-129, 2008.
2. B. Cao and A. Badia. A nested relational approach to processing SQL subqueries. In Proc. of the 2005 ACM SIGMOD international conference on Management of data, pages 191-202, New York, NY, USA, 2005. ACM Press.
3. S. Ceri and G. Gottlob. Translating SQL into relational algebra: optimization, semantics, and equivalence of SQL queries. IEEE Transactions on Software Engineering, 11(4):324-345, 1985.
4. R. Cyganiak. A relational algebra for SPARQL. Technical Report HPL-2005-170, HP Labs, 2005.
5. R. A. Ganski and H. K. T. Wong. Optimization of nested SQL queries revisited. In Proceedings of the 1987 ACM SIGMOD international conference on Management of data, pages 23-33, New York, NY, USA, 1987. ACM Press.
6. R. Guravannavar, H. S. Ramanujam, and S. Sudarshan. Optimizing nested queries with parameter sort orders. In Proc. of the 31st Int. Conf. on Very Large Data Bases (VLDB), pages 481-492. VLDB Endowment, 2005.
7. S. Harris and A. Seaborne. SPARQL 1.1 Query. W3C Working Draft. http://www.w3.org/TR/2009/WD-sparql11-query-20091022/, October 22 2009.
8. W. Kim. On optimizing an SQL-like nested query. ACM Transactions on Database Systems (TODS), 7(3):443-469, 1982.
9. K. Kjernsmo and A. Passant. SPARQL New Features and Rationale. W3C Working Draft. http://www.w3.org/TR/2009/WD-sparql-features-20090702/, July 2 2009.
10. G. Klyne and J. Carroll. Resource Description Framework (RDF) Concepts and Abstract Syntax. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/, February 2004.
11. J. Perez, M. Arenas, and C. Gutierrez. Semantics and Complexity of SPARQL. In Proceedings of the 5th International Semantic Web Conference (ISWC), number 4273 in LNCS, pages 30-43. Springer-Verlag, 2006.
12. A. Polleres. From SPARQL to Rules (and back). In Proceedings of the 16th International World Wide Web Conference (WWW), pages 787-796. ACM, 2007.
13. E. Prud'hommeaux and A. Seaborne. SPARQL Query Language for RDF. W3C Recommendation 15 January. http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/, 2008.
14. S. Schenk. A SPARQL Semantics Based on Datalog. In 30th Annual German Conference on Advances in Artificial Intelligence (KI), volume 4667 of LNCS, pages 160-174. Springer, 2007.
15. P. Seshadri, H. Pirahesh, and T. Y. C. Leung. Complex query decorrelation. In Proc. of the 12th Int. Conf. on Data Engineering (ICDE), pages 450-458. IEEE Computer Society, 1996.
16. P. Weinberg, J. Groff, and A. Oppel. SQL, The Complete Reference. McGraw-Hill, 2010.