<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Certain Answers for SPARQL?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Hernández</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel Polleres</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Semantic Web Research, Department of Computer Science, University of Chile</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vienna University of Economics and Business</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The standard semantics of SPARQL and the standard semantics of RDF differ fundamentally, sometimes leading to unintuitive answers. In this paper, we thus motivate an alternative semantics for SPARQL based on certain answers, taking into account the existential nature of blank nodes, the open-world assumption of RDF, and perhaps even the lack of a unique name assumption. We propose that SPARQL is a natural use-case for applying existing techniques that approximate certain answers in relational database settings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The foundational “simple semantics” of RDF is based on three premises: (i)
an open world assumption (OWA) where anything not present in the data is
assumed to be neither true nor false, (ii) a lack of unique name assumption (no
UNA) where multiple names can refer to the same thing, and (iii) the presence
of existential variables in the form of blank nodes. On the other hand, SPARQL
– the standard query language for RDF – appears not to respect any of these
premises. This may lead, in some cases, to unintuitive solutions.</p>
      <p>Taking an example relating to (iii), the Wikidata knowledge-base uses
existential blank nodes in RDF. For example, in the case of Nicole Brown Simpson, it
is stated that she has a killer, but that the killer is unknown, encoded by the
following triple with blank node _:b: (w:NicoleBrownSimpson; w:killedBy; _:b).
Now consider the following SPARQL query over Wikidata (including this triple):
SELECT * WHERE { ?person rdf:type w:Person .</p>
      <p>MINUS { w:NicoleBrownSimpson w:killedBy ?person } }
(1)
The query will return the IRIs of all persons in Wikidata: the SPARQL engine
will do a comparison to check if _:b = w:X – where w:X denotes each person –
and will return false in every case since blank nodes in data act like constants.
Semantically speaking however, the RDF graph does not provide any evidence
that these people in the results are not the killer.</p>
      <p>Leaving aside blank nodes, let’s turn to (i) and assume w:NicoleBrownSimpson
has no w:killedBy edge. SPARQL will again return all people in the data based
on the lack of existence of the edge in the data (effectively taking an implicit
closed world assumption (CWA)). Conversely, the OWA semantics of RDF asserts
that the relation may still hold even if the relevant triple is not given.</p>
      <p>Likewise, with respect to (ii), the comparison w:x = w:y will always return
false in SPARQL, whereas in the RDF graph, these may refer to the same thing:
SPARQL sees unequal terms (effectively adopting a UNA) while in RDF, these
terms may be referring to the same thing (no UNA).</p>
      <p>There is another option, however: SPARQL could return answers only where
it is certain that the person did not kill Nicole Brown Simpson. In other words,
nobody would be returned for this query since anybody could have killed her.
In the comparison _:b = w:X, we could say unknown instead of false to
acknowledge that _:b could refer to w:X; this is similar to how SQL handles the
case of nulls, using three-valued logic. Likewise, checking the (non-)existence of
some relation not supported by the data could return unknown to align with
RDF’s OWA; and so too could (in)equalities over IRIs, aligning with no-UNA.</p>
      <p>
        In this paper, we do not develop new results, but rather position SPARQL
and RDF (with blank nodes) as a natural application of recent results in the
database literature relating to certain answers. Our proposed aim is to have a
keyword, such as SELECT CERTAIN [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which users could opt to use to ensure
that all answers returned by the query are true for all models of the RDF graph,
such as to ensure that no results are returned in the case(s) previously outlined.
      </p>
      <p>
        While some authors have looked at certain answers for SPARQL in the context
of OWA [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and OWL entailments [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], to the best of our knowledge, no work has
yet looked at RDF graphs with blank nodes, nor considered certain answers more
generally in the context of the simple semantics of RDF.
2
      </p>
    </sec>
    <sec id="sec-2">
      <title>Simple Entailment and SPARQL</title>
      <p>
        Given two RDF graphs G and H, we say that G (simple) entails H, denoted
G j= H, if every model of G is a model of H; intuitively, H says nothing new over
G [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. If G j= H and H j= G, then we say that G and H are (simple) equivalent,
denoted G H. In the simple semantics of RDF, blank nodes act as existential
variables, while IRIs and literals acts as ground terms. With respect to checking
entailment, let denote a mapping from blank nodes to other RDF terms and
let (G) denote the image of an RDF graph G under . Then G j= H if and only
if there exists a such that (H) G (denoted H ! G) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Example 1. Let us consider two RDF graphs, G on the left, H on the right,
where b are blank nodes, and all other terms are IRIs.</p>
      <p>w x b .
w y b .
w y z .</p>
      <p>w x b1 .
w y b2 .
b3 y b2 .</p>
      <p>In this case, G j= H with (b1) = b, (b2) = b, (b3) = w, (H)</p>
      <p>Given a SPARQL query Q and two graphs G and H such that G H, it is
often the case that (abusing notation) Q(G) 6$ Q(H). More generally, it does
not hold that G j= H implies Q(H) ! Q(G) in SPARQL, even when considering
just basic graph patterns (aka. conjunctive queries) with basic inequalities.
SELECT * WHERE { ?w :x ?b1 ; y ?b2 . FILTER(?b1 != ?b2) }
Under standard SPARQL query evaluation, Q(H) returns two results – (w,b1,b2)
and (w,b2,b1) – whereas Q(G) is empty. Clearly Q(H) 6! Q(G). tu</p>
      <p>SPARQL query evaluation is counter to the underlying RDF semantics: one
can get more answers in SPARQL from an RDF graph with less information
(in terms of entailment). Even if a query uses a feature like MINUS, the simple
semantics of RDF adopts OWA, such that if G j= H, any information given by G
that is not given by H is considered unknown, rather than false. We are thus
interested in exploring an alternative semantics for SPARQL where we could
guarantee, for example, that if G j= H, then Q(H) ! Q(G): that in some sense
SPARQL answers are “monotonic” with respect to simple entailment in RDF.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Certain Answers for SPARQL?</title>
      <p>An intuitive idea is to define the semantics of SPARQL with respect to all possible
models of the RDF graph, where an answer is included in the result if and only
if it is “valid” for all models. This idea corresponds closely with the notion of
certain answers for incomplete databases.</p>
      <p>More specifically, let C denote a set of constants and N a set of nulls where
both sets are disjoint. Let us likewise assume, for simplicity, a single relation
name R with arity k. An instance I of R is a k-ary relation RI Ck with only
constants. A naive instance I of R is a k-ary relation RI (CN)k that may
contain constants and nulls. Let : N ! C be a mapping from nulls to constants,
and abusing notation, let (RI ) denote the image of RI under . A representative
of a naive instance I is an instance I0 where (RI ) RI0 . Each representative
fills in the incomplete information of I in a particular way. We denote the set of
all representatives of I as rep(I).</p>
      <p>Example 3. The following are instances of a relation R where numbers indicate
constants and terms of the form ?i indicate a labelled null.</p>
      <p>R
1 ?2 ?1
2 ?2 3</p>
      <p>R
In other words, a certain answer is an answer for all representatives.</p>
      <p>
        Importantly, representatives in certain answers act somewhat like models
in simple entailment, leading to an interesting correspondence: an RDF graph
G j= H if and only if rep(G) rep(H) (assuming an intuitive encoding of G and
H as a ternary relation) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. From this follows:
Remark 1. Given two RDF graphs G and H, then G j= H if and only if
certain(Q; G) certain(Q; H). tu
In other words, if G simple-entails H, then the certain answers for any query
Q over H will be “contained in” those for G. This semantic property holds no
matter what type of query Q or base query-evaluation Q(G) we consider.
Example 4. Consider again query Q from Example 2 run over the same two
graphs. We previously saw that Q(H) 6! Q(G). On the other hand, certain(Q; H)
is empty because there exists H0 2 rep(H) such that b1 and b2 map to the same
constant and where Q(H0) is thus empty, and thus certain(Q; H) is empty. It
now holds that certain(Q; H) certain(Q; G).
tu
      </p>
      <p>
        To evaluate certain answers for SPARQL, we could apply recent results by
Libkin [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], who proposes a sound evaluation of certain answers for SQL by
changing how (in)equality expressions and atomic formulae are evaluated in the
presence of nulls. We very briefly give an example to illustrate.
      </p>
      <p>Example 5. Under Libkin’s proposal, for query (1) in the introduction, the
w:killedBy pattern would be evaluated as unknown (with or without the
triple mentioning _:b), which would lead to no results being returned as certain
answers. In the case of Example 2, the inequality comparison would likewise
return unknown, leading to certain(Q; H) returning no results.</p>
      <p>
        Implementing and experimentally evaluating Libkin’s proposals [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for SPARQL
would thus be a natural next step towards a SELECT CERTAIN feature. There are
also novel questions to tackle in the SPARQL setting, including the effect of bag
semantics, non-well-designed queries, etc., on certain answer evaluation.
Acknowledgements: This work was supported by FONDECYT Grant no. 11140900 and
by the Millennium Nucleus Center for Semantic Web Research under Grant NC120004.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmetaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fischl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Simkus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skritek</surname>
          </string-name>
          .
          <article-title>Towards reconciling SPARQL and certain answers</article-title>
          .
          <source>In World Wide Web (WWW)</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          .
          <article-title>Querying semantic web data with SPARQL</article-title>
          .
          <source>In Principles of Database Systems (PODS)</source>
          , pages
          <fpage>305</fpage>
          -
          <lpage>316</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Mendelzon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          .
          <article-title>Foundations of Semantic Web databases</article-title>
          .
          <source>J. Comput. Syst. Sci.</source>
          ,
          <volume>77</volume>
          (
          <issue>3</issue>
          ):
          <fpage>520</fpage>
          -
          <lpage>541</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mallea</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>Everything you always wanted to know about blank nodes</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>27</volume>
          :
          <fpage>42</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          .
          <article-title>SQL's three-valued logic and certain answers</article-title>
          .
          <source>In International Conference on Database Theory (ICDT)</source>
          , pages
          <fpage>94</fpage>
          -
          <lpage>109</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>