SPARQL with Qualitative and Quantitative Preferences Position Paper Marina Gueroussova1 , Axel Polleres2 , and Sheila A. McIlraith1 1 Department of Computer Science, University of Toronto, Toronto, Canada 2 Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria Abstract. The volume and diversity of data that is queriable via SPARQL and its increasing integration motivate the desire to query SPARQL information sources via the specification of preferred query outcomes. Such preference-based queries support the ordering of query outcomes with respect to a user’s measure of the quality of the response. In this position paper we argue for the incorporation of preference queries into SPARQL. We propose an extension to the SPARQL query language that supports the specification of qualitative and quantitative preferences over query outcomes and examine the realization of the resulting preference- based queries via off-the-shelf SPARQL engines. 1 Introduction Once the sole purview of IT departments, today’s data is produced, shared, and con- sumed by a diversity of stakeholders – corporate and consumer. It is stored in a variety of formats, dynamically integrated, and queried by a variety of users, many of whom are largely unfamiliar with the content of those sources. As such, querying today’s data sources using standard query languages is evolving into a manual/iterative search pro- cess, in which repeated queries are devised to hone in on outcomes that meet some criteria. Consider looking for a car to buy on the web – one might prefer a car with a powerful engine, but only if it’s a hybrid, or failing that an electric car if it’s under a cer- tain price, and so on. Such a query requires not only the specification of the information to be returned, but also specification of an ordering or preference over what is returned. With these fundamental changes in the nature of data management and querying comes the need for query languages and engines that are better suited to the specification of and search for high-quality outcomes. Preferences have been a long studied subject across many fields including eco- nomics, philosophy, and artificial intelligence. Within the database community there is a growing literature on preferences (e.g., [19]). Indeed, both Chomicki [5, 3] and independently, Kießling, Endres, and Wenzel [10] have proposed extensions to SQL that support the specification of quantitative (e.g., top-k [9]) and qualitative (e.g., sky- line [1]) SQL queries. Top-k queries use a scoring function to determine an ordering over query results. In contrast, skyline queries filter a dataset with respect to a set of preference relations, returning a set of undominated tuples. Our concern in this paper is with SPARQL [8] and with the provision of a means of succinctly specifying queries that will enable a user to search for data sources (SPARQL 2 endpoints) and data content that is tailored to their individual preferences, and in turn for SPARQL query engines to return ordered outcomes that reflect those preferences. Within the semantic web community, there has been significant recent work on the com- putation of top-k queries (e.g., [14, 2, 13, 20]), but little on how to extend the expres- siveness of SPARQL to address a broad spectrum of qualitative and quantitative pref- erences. As such, (qualitative) preference-based querying is often realized by multiple lengthy queries that stipulate different combinations of hard constraints, or via “stan- dard tricks3 ” such as “stacking” OPTIONAL patterns, as illustrated in the following example which preferably returns the email address of my friends, and the homepage if there is no email. SELECT ?Contact WHERE { me foaf:knows ?X OPTIONAL {?X foaf:mbox ?Contact} OPTIONAL {?X foaf:homepage ?Contact} } In 2006 Siberski, Pan, and Thaden introduced the notion of qualitative preferences into SPARQL queries [18]. In that work they realized their preference queries through the development of solution modifiers. Interestingly, their work was done before the semantics of OPTIONAL was established, and it was our hypothesis that much of what they did in 2006 with solution modifiers could be done in native SPARQL 1.0 and SPARQL 1.1 through rewriting. Building on that work, we propose an extension of the SPARQL query language, PrefSPARQL that, like Siberski et. al.’s work, builds on the vetted work on SQL preference queries, extending it here to support the expression of conditional preferences. In Section 3 we focus on the realization of qualitative prefer- ences, and in particular on skyline and conditional preferences, showing how they can be rewritten into SPARQL 1.1 (and also SPARQL 1.0) and thus realized by existing SPARQL query engines. The work presented here is only the first step of a larger en- deavour that will see the extension of the presented query grammar with a number of interesting features, and the development of optimized query processing techniques that are tailored to the efficient computation of preference-based SPARQL queries. 2 PrefSPARQL Syntax In this section we propose a core grammar for PrefSPARQL that addresses a selection of qualitative and quantitative preferences in support of specifying preferred SPARQL query outcomes. We illustrate our grammar with respect to skyline and conditional preference queries. Skyline queries for relational databases have been the subject of significant research (e.g., [5, 1]), and refer to a set of results that are no worse than any other result across all dimensions of a set of independent boolean or numerical preferences [1]. Skyline queries have been studied extensively in the SQL context by Chomicki [4, 5], and by Kießling, Endres, and Wenzel as an essential part of their Preference SQL language and system [10, 11]. We build upon an earlier approach to adopt features of Preference SQL in SPARQL, by Siberski, Pan, and Thaden [18]. In particular, we extend the SPARQL query lan- guage in a similar fashion to the proposal in [18]; we also use ‘AND’ to separate 3 e.g., http://answers.semanticweb.com/questions/20682/preference-patterns-for-sparql-11 3 independent dimensions of skyline queries. The key differences in our proposal are that, firstly, we add preferences at the level of filters (production 68 of the SPARQL grammar [8, Section 19]) rather than as solution modifiers, with the justification that preferences semantically filter the solution set rather than ‘ordering’ or ‘slicing’ it. This approach makes preferences usable inside any patterns in a nested fashion as op- posed to just at the end of queries.4 Secondly, we follow the latest version of Pref- erence SQL [10] in replacing ‘CASCADE’ with ‘PRIOR TO’. Furthermore, we sup- port additional atomic preference constructs such as ‘AROUND’, ‘MORE THAN’, ‘LESS THAN’, and ‘BETWEEN’; ‘x BETWEEN (Low,High)’ differs from writ- ing ‘((x >= Low) && (x <= High))’ in SPARQL in that – in the absence of a value in the chosen interval – ‘BETWEEN’ will return the closest value to Low (or High, resp.); AROUND(x), MORE THAN(x), and LESS THAN(x) are analogous to BETWEEN(x, x), BETWEEN(x, ∞) and, BETWEEN(−∞, x) respectively. Finally, we augment the grammar with conditional (IF-THEN-ELSE) preferences. We note that, given the availability of conditional preferences, BETWEEN (as well as AROUND, MORE THAN, and LESS THAN) come for free5 . Filter ::= ’FILTER’ Constraint | ’PREFERRING’ ’(’ MultidimensionalPref ’)’ MultidimensionalPref ::= PrioritizedPref (’AND’ PrioritizedPref)* PrioritizedPref ::= ConditionalOrAtomicPref (’PRIOR TO’ ConditionalOrAtomicPref)* ConditionalOrAtomicPref ::= ConditionalPref | AtomicPref ConditionalPref ::= ’IF’ Expression ’THEN’ ConditionalOrAtomicPref ’ELSE’ ConditionalOrAtomicPref AtomicPref ::= Expression | HighestPref | LowestPref | AroundPref | BetweenPref | MoreThanPref | LessThanPref HighestPref ::= ’HIGHEST’ Expression LowestPref ::= ’LOWEST’ Expression AroundPref ::= Expression ’AROUND’ Expression BetweenPref ::= Expression ’BETWEEN’ ’(’ Expression ’,’ Expression ’)’ MoreThanPref ::= Expression ’MORE THAN’ Expression LessThanPref ::= Expression ’LESS THAN’ Expression To illustrate this, consider a modification of the example from [18]. The knowledge base contains therapist ratings and their appointment offerings (in Turtle syntax). @prefix : . :mary a :therapist ; :rated :excellent ; :offers :appointment1, :appointment2 . :appointment1 :day "Tuesday"; :starts 1500; :ends 1555 . :appointment2 :day "Sunday"; :starts 1600; :ends 1655 . 4 While with the addition of subqueries in SPARQL1.1 this does not add to language expressiv- ity, it still allows one to write certain preference queries more concisely. 5 x BETWEEN(Low,High) can be viewed as syntactic sugar for LOWEST (IF x >= Low && x <= High THEN 0 ELSE IF x < Low THEN Abs(Low−x) ELSE Abs(High−x) ). 4 For the case of the skyline preference, query Q1 prefers excellent therapists, ap- pointments around lunchtime (between 12:00 and 13:00) over those outside lunchtime (written as a BETWEEN preference), and later appointments over earlier ones provided that both are equal with respect to lunchtime. SELECT ?A WHERE { ?T :rated ?R; :offers ?A. ?A :starts ?S; :ends ?E . PREFERRING ( ?R = excellent AND ( ?S BETWEEN(1200,1300) AND ?E BETWEEN(1200,1300) PRIOR TO HIGHEST ?E ) ) } For the case of conditional preferences, in query Q2 we prefer appointments before 6PM on the weekends, and appointments after 6PM on the weekdays. SELECT ?A WHERE { ?A :day ?D; :starts ?S. PREFERRING ( IF (?D = "Saturday" || ?D = "Sunday") THEN ?S < 1800 ELSE ?S >= 1800 ) } 3 Realizing PrefSPARQL Through Query Rewriting As opposed to a contrary conjecture in [18], we argue that preferences such as the ones presented in Section 2 can be expressed in SPARQL itself by the following high-level translation schema, where P is a SPARQL pattern and P ref is a ‘PREFERRING’ clause as defined in the grammar above. P PREFERRING P ref P FILTER NOT EXISTS {P 0 FILTER (tr(P, P 0 , P ref ))} Here, P 0 is a copy of P where all variables are renamed with fresh variables. The high-level idea here is that we reformulate PREFERRING clauses as FILTERs, asking for the non-existence of a solution to the pattern P 0 which dominates the solution of P according to P ref . This simple recipe, along with a translation of the domination relation as a FILTER expression, suffices to express the intended semantics of returning only dominating solutions. While we do not provide the full translation function tr(P, P 0 , P ref ) of PREFER- RING clauses in this short position paper (see our technical report [7] for the transla- tion), let us illustrate the feasibility by means of some examples. As for skyline queries, we take a simplified version of Q1 from above without BETWEEN,6 where we prefer appointments outside rush hour, i.e., either starting after 18:00 or ending before 16:00, instead of over lunchtime. SELECT ?A WHERE { ?T :rated ?R; :offers ?A. ?A :starts ?S; :ends ?E . PREFERRING ( (?R = excellent) AND ((?S >= 1800) || ?E <= 1600)) PRIOR TO HIGHEST ?S )} 6 As mentioned before BETWEEN can be translated using conditional preferences, which we will illustrate with the second example. 5 This query translates to the following SPARQL1.1 query: 1 SELECT ?A WHERE { ?T :rated ?R; :offers ?A. ?A :starts ?S; :ends ?E . 2 BIND ( (?R = pt:excellent) AS ?Pref1 ) 3 BIND ( (?S >= 1800 || ?E <= 1600) AS ?Pref2 ) 4 BIND ( ?S AS ?Pref3 ) 5 NOT EXISTS { 6 ?T_ :rated ?R_; :offers ?A_. ?A_ :starts ?S_; :ends ?E_ . 7 BIND ( (?R_ = :excellent) AS ?Pref1_ ) 8 BIND ( (?S_ >= 1600 || ?E_ <= 1600) AS ?Pref2_ ) 9 BIND ( ?S_ AS ?Pref3_ ) 10 FILTER( 11 ( (?Pref1_ > ?Pref1) && 12 !((?Pref2_ < ?Pref2) || (?Pref3_ < ?Pref3 && ?Pref2 = ?Pref2_ ))) 13 || 14 ( !(?Pref1_ < ?Pref1) && 15 ((?Pref2_ > ?Pref2) || (?Pref3_ > ?Pref3 && ?Pref2 = ?Pref2_ )))) }} In the copied pattern P 0 we replace every variable by a ‘copy’ appending ’ ’ to the variable name. In both P and P 0 we bind every atomic expression in P ref to a new variable (lines 2-4 and 7-9); then we use these variables to build up a FILTER expression that filters out ‘dominating’ witnesses for P ; if no such witness exists, P survives. Let us explain briefly the rationale behind the translation of preference dominance in lines 11-15: the two dimensions (AND) of the skyline preference – i.e., that a dominating solution is better in at least one dimension and no worse in the others – are encoded in the two branches in lines 11-12 and lines 14-15, whereas the nested cascading (PRIOR TO) preference between P ref 2 and P ref 3 is encoded in the boolean expressions in lines 12 and 15, respectively. Let us emphasize that the translation would also work without BINDs; the respec- tive expressions could just be copied, although this would make the translated query more confusing. Similarly, NOT EXISTS could be replaced by a well-known combi- nation of OPTIONAL and FILTER(!bound) (cf. [17, Section 11.4.1]) to emulate set difference, which would make the whole query also expressible in SPARQL1.0. Conditional preferences can be encoded in SPARQL1.1 conveniently using the IF(·, ·, ·) function in filters, as shown in the translation of query Q2 from above. 1 SELECT ?A WHERE { 2 ?T :rated ?R; :offers ?A. ?A :day ?D; :starts ?S. 3 BIND ( IF( (?D = "Sunday" || ?D = "Saturday"), ?S < 1800, ?S >= 1800) 4 AS ?Pref1) 5 NOT EXISTS { ?T_ :rated ?R_; :offers ?A_. ?A_ :day ?D_; :starts ?S_. 6 BIND ( IF( (?D_ = "Sunday" || ?D_ = "Saturday"), ?S_ < 1800, ?S_ >= 1800) 7 AS ?Pref1_) 8 FILTER (?Pref1_ > ?Pref1)}} We note that the IF(·, ·, ·) function is – in principle – again syntactic sugar that can be emulated in SPARQL1.0 as well, such that in summary our presented preferences can be both implemented in rewritings to SPARQL1.1 and SPARQL1.0. 4 Summary and Discussion In this position paper we argued for (re-)considering preferences in SPARQL queries. Given the vast amount of data being subjected to SPARQL queries, we can conceive many examples where complex preferences would be needed to find the “needle in the 6 haystack”, with the user specifying preferences not only over query results but also over the sources (SPARQL endpoints) and provenance of data used to produce those results. Here we proposed a core grammar for PrefSPARQL, an extension to SPARQL 1.1 that supports the expression of preferred query results. Our language builds on established work on SQL preferences and on an earlier effort by Siberski et al. [18], extending it with conditional preferences. We further argued, contrary to the conjecture of Siberski et al., that these preference queries can be directly expressed in both SPARQL1.0 and SPARQL1.1 using OPTIONAL queries or novel features of SPARQL1.1, such as NOT EXISTS. We illustrated such a realization with respect to skyline and conditional pref- erence queries. We acknowledge that, at the time Siberski and colleagues’ work was performed, the semantics of SPARQL1.0 was not yet fully defined and SPARQL1.1 was still far off on the horizon. Nevertheless, we argue that this topic needs further attention, since preference queries implemented naively by rewriting in SPARQL might become prohibitively costly. In particular, we emphasize that all the examples we gave in this paper, even those ex- pressing simple preferences by “stacking OPTIONALs”, as mentioned in Section 1, or, respectively all our example translations would produce so called non-well-designed patterns [15, 12] in SPARQL. We therefore plan to further investigate relaxations of the well-designedness restriction, which still enable efficiently evaluable preference queries. The specification and efficient realization of preference-based SPARQL queries is an important topic that is worthy of further exploration. As this work is ongoing, con- siderations for expanding it include the addition of quantitative preferences in the form of top-k queries, ranking within a skyline, preferences over endpoints in the context of the SPARQL1.1 Federation extension [16], and SPARQL endpoint discovery (e.g. by preferences over Service descriptions [21]) as well as interaction of preferences with Entailment Regimes [6]. Further details relating to this paper can be found in [7]. Acknowledgements We wish to thank Wolf Siberski, Jeff Pan, and Florian Wenzel for their assistance. This work has been partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), an Ontario Graduate Scholarship (OGS), and by the Vi- enna Science and Technology Fund (WWTF) through project ICT12-015. References [1] Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: Proc. of the 17th Int’l Conference on Data Engineering (ICDE). pp. 421–430 (2001) [2] Bozzon, A., Valle, E.D., Magliacane, S.: Extending SPARQL algebra to support efficient evaluation of top-k SPARQL queries. In: Ceri, S., Brambilla, M. (eds.) Search Computing – Broadening Web Search, pp. 143–156. Springer Lecture Notes in Computer Science (2012) [3] Chomicki, J.: Querying with intrinsic preferences. In: Proc. of the 8th Int’l Conference on Extending Database Technology (EDBT). pp. 34–51 (2002) [4] Chomicki, J.: Preference formulas in relational queries. ACM Trans. on Database Systems (TODS) 28(4), 427–466 (2003) 7 [5] Chomicki, J.: Logical foundations of preference queries. IEEE Data Engineering Bulletin 34(2), 3–10 (2011) [6] Glimm, B., Ogbuji, C.: SPARQL 1.1 Entailment Regimes (2013), W3C Recommendation [7] Gueroussova, M., Polleres, A., McIlraith, S.A.: SPARQL with qualitative and quantitative preferences (extended report). Tech. Rep. CSRG-619, Department of Computer Science, University of Toronto (October 2013) [8] Harris, S., Seaborne, A.: SPARQL 1.1 Query Language (2013), W3C Recommendation [9] Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys 40(4), 11:1–11:58 (Oct 2008) [10] Kießling, W., Endres, M., Wenzel, F.: The preference SQL system - an overview. IEEE Data Engineering Bulletin 34(2), 11–18 (2011) [11] Kießling, W., Köstler, G.: Preference SQL - design, implementation, experiences. In: Proc. of 28th Int’l Conference on Very Large Data Bases (VLDB). pp. 990–1001 (2002) [12] Letelier, A., Pérez, J., Pichler, R., Skritek, S.: Static analysis and optimization of semantic web queries. In: Proc. of the 31st Symposium on Principles of Database Systems (PODS). pp. 89–100 (2012) [13] Magliacane, S., Bozzon, A., Valle, E.D.: Efficient execution of top-k SPARQL queries. In: Proc. of the 11th Int’l Semantic Web Conference (ISWC). pp. 344–360 (2012) [14] Montoya, G., Vidal, M.E., Corcho, Ó., Ruckhaus, E., Aranda, C.B.: Benchmarking fed- erated SPARQL query engines: Are existing testbeds enough? In: Proc. of the 11th Int’l Semantic Web Conference (ISWC). pp. 313–324 (2012) [15] Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. on Database Systems (TODS) 34(3) (2009) [16] Prud’hommeaux, E., Buil-Aranda, C.: SPARQL 1.1 Federated Query (2013), W3C Rec- ommendation [17] Prud’hommeaux, E., Seaborne, A.: SPARQL Query Language for RDF (2008), W3C Rec- ommendation [18] Siberski, W., Pan, J.Z., Thaden, U.: Querying the semantic web with preferences. In: Inter- national Semantic Web Conference. pp. 612–624 (2006) [19] Stefanidis, K., Koutrika, G., Pitoura, E.: A survey on representation, composition and ap- plication of preferences in database systems. ACM Trans. on Database Systems (TODS) 36(3), 19 (2011) [20] Wagner, A., Tran, D.T., Ladwig, G., Harth, A., Studer, R.: Top-k linked data query process- ing. In: Proceedings of the 9th Extended Semantic Web Conference (ESWC). pp. 56–71 (2012) [21] Williams, G.: SPARQL 1.1 Service Description (2013), W3C Recommendation