<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fixing Comparative Preferences for SPARQL?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter F. Patel-Schneider</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel Polleres</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>NAIL Laboratory</institution>
          ,
          <addr-line>Nuance Communications, Sunnyvale, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Stanford University</institution>
          ,
          <addr-line>CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vienna Univ. of Economics &amp; Business / Complexity Science Hub Vienna</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Preferences have been part of the goal of the Semantic Web from its inception [1] but are not currently part of any Semantic Web standards, such as SPARQL. Several proposals [2-4] have been made to add comparative preferences to SPARQL. Comparative preferences are based on comparing solutions to a query and eliminating ones that come out worse in the comparison, as in searching for gas stations and eliminating any for which there is a closer station serving the same brand of gasoline. The proposals each add an extra construct to SPARQL filtering out non-preferred query solutions. Their preference constructs are of different expressive power but they can each be thought of as providing a skyine operator, defined as follows: Definition 1. Given a set of (potential) solutions P , a preference relation relation over P P. A solution s1 is dominated by a solution s2 if s2 s1. Definition 2. (Adapted from Chominki [5].) If S is a finite set of (candidate) solutions and a preference relation over some superset of S then the skyline (the preferred solutions) of S with respect to is ! (S) = fs 2 Sj:9s0 2 S:s0 That is, a solution is retained only if there is no solution that dominates it according to the preference relation. So to prefer the closest gas station for each brand, the preference relation would state that one gas station dominates a second if they both serve the same brand of gas and the first is closer than the second. The most general of the proposals to add preferences to SPARQL is SPREFQL [4], where the preference relation can be any SPARQL expression. A query in SPREFQL preferring the closest gas station for each brand would be</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>SPREFQL</title>
      <p>SELECT ?X ?B ?D
WHERE f ?X a :GasStation ; :brand ?B ; :dist ?D . g
PREFER ?X1 ?B1 ?D1 TO ?X2 ?B2 ?D2</p>
      <p>IF ?B1 = ?B2 &amp;&amp; ?D1 &lt; ?D2
? For technical details, including formal statements of all the theorems and their proofs see the
technical report available at http://polleres.net/publications/patel-schneider-etal-2018TR.pdf
?? Axel Polleres’ work was supported under the Distinguished Visiting Austrian Chair Professors
program hosted by The Europe Center of Stanford University.</p>
      <p>The SPARQL expression in the PREFER construct has access to two candidate solution
mappings through the two lists of variables in the PREFER construct. A candidate
solution mapping S1 dominates candidate solution mapping S2 if the expression in the
PREFER construct evaluates to true when the first list of variables in the PREFER are
bound using S1 and the second are bound using S2. The PREFER construct selects those
solution mappings (hereafter just solution) that have no dominating solution, which in
this example means that the preferred solutions are those for which there is no solution
with the same brand and a smaller distance.</p>
      <p>Special macros are provided in SPREFQL for preferences combining multiple
preferences, including the conjunction of two preferences (e.g., prefer closer and cheaper
gas stations) and the cascading of preferences (e.g., prefer cheaper gas stations but
among those with the same price prefer the closer ones).</p>
      <p>
        Troumpoukis et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provide a mapping from their extension back into SPARQL
1.1, mapping
      </p>
      <sec id="sec-1-1">
        <title>SELECT V WHERE f P g PREFER V1 TO V2 IF C</title>
        <p>to</p>
        <p>
          SELECT V WHERE f P FILTER NOT EXISTS fP(V=V1) FILTER C(V2=V )g g
Unfortunately this mapping is not correct because of problems in the definition of
EXISTS in SPARQL [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Certain constructs occuring in C, such as BOUND, are
handled incorrectly inside of EXISTS in SPARQL. Fortunately there is a simple fix for
this problem, namely using a well-known replacement for EXISTS, already proposed
by Gueroussova et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] as a mapping from their preferences into SPARQL 1.0. The
mapping below uses a variant of their mapping.
        </p>
        <sec id="sec-1-1-1">
          <title>Mapping 1 (Simple Mapping to SPARQL)</title>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>SELECT V WHERE f P OPTIONAL f P(V=V1) FILTER ( C(V2=V ) ) BIND 1 TO ?exists g FILTER (!BOUND(?exists)) g</title>
        <p>The OPTIONAL part only binds a value to ?exists when a dominator exists so the final
filter only lets through solutions that are not dominated.</p>
        <p>
          Troumpoukis et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] only consider the solutions in the skyline as defined above. It
can be useful to define a second skyline as the skyline after the first skyline is removed,
and so on. The rank of a solution can then be defined as the number of the skyline it
is in. A simple extension to SPREFQL, adding LIMIT SKYLINE n [AS vrank] and
LIMIT SKYLINE ALL AS vrank modifiers to the PREFER construct, could produce the
first n skylines and optionally the rank of each solution or all solutions with their rank.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Preference Cycles</title>
      <p>There is a very serious problem in SPREFQL, however. Any solution that is in a
preference cycle, i.e., the solution is in a cycle of solutions where each one dominates the
next, will not be a preferred solution, even if the cycle is just the solution dominating
itself. Not only will such solutions not be in the first skyline, they will not be in any
skyline, nor will any solution that they transitively dominate.</p>
      <p>
        Some comparative preference languages, such as PrefSPARQL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], are built up from
metrics (like closest distance) and cannot have preference cycles. However, in
SPREFQL it is easy to construct preference cycles. Simply preferring a particular brand, as
in the following SPREFQL query
      </p>
      <p>SELECT ?X ?B ?D
WHERE f ?X a :GasStation ; :brand ?B ; :dist ?D . g</p>
      <p>PREFER ?X1 ?B1 ?D1 TO ?X2 ?B2 ?D2 IF ?B1 = :Mobil
results in any solution for brand Mobil being preferred to itself and if there are any
solutions with brand Mobil there will be no preferred solutions at all.</p>
      <p>It is easy for users to provide preferences that produce cycles for certain data,4 For
example having preferences about brands associated with gas stations, as in
SELECT ?X ?B ?F ?T
WHERE f?X a :GasStation; :brand ?B; :shop ?F; :antifreeze ?A.g
PREFER ?X1 ?B1 ?F1 ?A1 TO ?X2 ?B2 ?F2 ?A2
IF ( (?B1 = :Mobil &amp;&amp; ?B2 = :Chevron) AND
(?F1 = :KwikieMart &amp;&amp; ?F2 = :711) AND
(?A1 = :Xerex &amp;&amp; ?A2 = :Prestone) )
results in a preference cycle if there is a Mobil station with a TigerMart that sells
Prestone antifreeze; a Chevron station with a KwikieMart that sells Delo antifreeze; and a
gas station with a 7-11 that sells Xerex antifreeze.</p>
      <p>To handle preference cycles, it is necessary to change the definition of skyline so
that a preferred solution is no longer just one with no dominating solution. Instead
a preferred solution is one where every solution that transitively dominates it is also
transitively dominated by it (i.e., is in a preference cycle with it):
Definition 3. If S is a finite set of solutions and
superset of S then the skyline of S with is
a preference relation over some
!l (s) = fs 2 Sj:9s0 2 S:(s0
s ^ :(s
s0))g
where is the transitive closure of over candidate solutions, i.e., there is a sequence
of candidate solutions, each dominating the next.</p>
      <p>This simple change means that there are always preferred solutions in any
nonempty, finite set of solutions and that every solution in a finite set of solutions has a
finite rank. This repairs the problem in SPREFQL at the expense of complicating the
definition of preferred solutions.</p>
      <p>This complication matters, as there is no mapping to a single SPARQL query for
the correct definition of skyline, as the query would have to examine all the candidate
solutions looking for preference cycles. If, however, multiple queries are allowed it is
possible to use two CONSTRUCT queries to create the preference relation in an RDF
graph and then a SELECT query to return the preferred solutions from this graph.
4 We do not argue here that such preferences are rational, but simply observe that people have
preferences of these forms.</p>
      <p>
        Constructing the domination relation in this way would be quite inefficient, but it is
possible to implement this correct definition of skylines as an extension of SPARQL by
using a simple algorithm that actually computes the rank of each solution. The core of
the algorithm collapses preference cycles into a single representative using the
unionfind algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] resulting in cheap determination of whether a solution is in a
preference cycle with another.
      </p>
      <p>In the worst case this algorithm requires O(n2) time (to compute the domination
relationships between each pair of candidate solutions), which is no worse than the
algorithm required for SPREFQL. There is a difference in expected time, however, as
the correct algorithm cannot stop considering a solution as soon as it finds a dominator
of it but instead has to also check whether this dominator is in a cycle.</p>
      <sec id="sec-2-1">
        <title>Transitive Preferences</title>
        <p>If the preference relation is transitive, but still can have loops, then there is a simpler
skyline definition that is correct:
Definition 4. If S is a finite set of (candidate) solutions and
relation over some superset of S then the skyline of S with
a transitive preference
!t (S) = fs 2 Sj:9s0 2 S:(s0
s ^ :(s
s0))g
This definition exploits the fact that the transitive closure of a transitive relation is the
relation itself. There is a mapping to a single SPARQL query for this definition of
skyline, that is only a slight modification of Mapping 1.</p>
        <p>With these corrections SPREFQL is well behaved on any preference relation. As
well, the useful notion of the rank of a solution under a preference relation can be
defined as an extension to SPREFQL. The corrected version cannot always be translated
to a single SPARQL query but there is still an effective algorithm for computing the rank
of each solution in a solution set under a preference relation.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The semantic web</article-title>
          .
          <source>Scientific American</source>
          <volume>284</volume>
          (
          <issue>5</issue>
          ) (
          <year>2001</year>
          )
          <fpage>28</fpage>
          -
          <lpage>37</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Siberski</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thaden</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Querying the semantic web with preferences</article-title>
          .
          <source>In: The 5th International Semantic Web Conference (ISWC</source>
          <year>2006</year>
          ).
          <article-title>(</article-title>
          <year>2006</year>
          )
          <fpage>612</fpage>
          -
          <lpage>624</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gueroussova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McIlraith</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>SPARQL with qualitative and quantitative preferences</article-title>
          .
          <source>In: 2nd International Workshop on Ordering and Reasoning (OrdRing</source>
          <year>2013</year>
          ),
          <source>at ISWC</source>
          <year>2013</year>
          .
          <article-title>(2013) CEUR Workshop Proceedings</article-title>
          , Volume
          <volume>1059</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Troumpoukis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Konstantopoulos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charalambidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An extension of SPARQL for expressing qualitative preferences</article-title>
          .
          <source>In: The 16th International Semantic Web Conference (ISWC</source>
          <year>2017</year>
          ).
          <article-title>(</article-title>
          <year>2017</year>
          )
          <fpage>711</fpage>
          -
          <lpage>727</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chomicki</surname>
          </string-name>
          , J.:
          <article-title>Preference formulas in relational queries</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          <volume>28</volume>
          (
          <issue>4</issue>
          ) (
          <year>2003</year>
          )
          <fpage>427</fpage>
          -
          <lpage>466</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>EXISTStential aspects of SPARQL</article-title>
          .
          <source>In: The 15th International Semantic Web Conference (ISWC</source>
          <year>2016</year>
          ).
          <source>(October</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tarjan</surname>
          </string-name>
          , R.E.:
          <article-title>Efficiency of a good but not linear set union algorithm</article-title>
          .
          <source>Journal of the ACM</source>
          <volume>22</volume>
          (
          <issue>2</issue>
          ) (
          <year>1975</year>
          )
          <fpage>215</fpage>
          -
          <lpage>225</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>