=Paper= {{Paper |id=Vol-1350/paper-31 |storemode=property |title=Semantics of SPARQL under OWL 2 Entailment Regimes |pdfUrl=https://ceur-ws.org/Vol-1350/paper-31.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/KostylevG15 }} ==Semantics of SPARQL under OWL 2 Entailment Regimes== https://ceur-ws.org/Vol-1350/paper-31.pdf
                  Semantics of SPARQL
            under OWL 2 Entailment Regimes

                  Egor V. Kostylev and Bernardo Cuenca Grau

               Department of Computer Science, University of Oxford



      Abstract. We study the semantics of SPARQL queries with optional
      matching features under entailment regimes. We argue that the normative
      semantics may lead to answers that are in conflict with the intuitive mean-
      ing of optional matching, where unbound variables naturally represent
      unknown information. We propose an extension of the SPARQL algebra
      that addresses these issues and is compatible with any entailment regime
      satisfying the minimal requirements given in the normative specification.
      We then study the complexity of query evaluation and show that our
      extension comes at no cost for regimes with an entailment relation of
      reasonable complexity. Finally, we show that our semantics preserves the
      known properties of optional matching that are commonly exploited for
      static analysis and optimisation.


1   Introduction
SPARQL became the standard language for querying RDF in 2008 [1]. Since
then, the theoretical properties of SPARQL have been the subject of intensive
research efforts and are by now relatively well-understood [2–7]. At the same
time, SPARQL has become a core technology in practice, and most RDF-based
applications rely on SPARQL endpoints for query formulation and processing.
    The functionality of many such applications is enhanced by OWL 2 ontolo-
gies [8], which are used to provide background knowledge about the application
domain, and to enrich query answers with implicit information. A new version of
SPARQL, called SPARQL 1.1, was released in 2013 [9]. This new version captures
the capabilities of OWL 2 by means of the so-called entailment regimes [10]: a
flexible mechanism for extending SPARQL query answering to the W3C stan-
dards layered on top of RDF. A regime specifies which RDF graphs and SPARQL
queries are legal (i.e., admissible) for the regime, as well as an entailment relation
that unambiguously defines query answers for all legal queries and graphs.
    The semantics of SPARQL under entailment regimes is specified for the
conjunctive fragment, where queries are represented as basic graph patterns (i.e.,
sets of RDF triples with variables) and query answers are directly provided by
the entailment relation of the regime. Roughly speaking, to check whether a
mapping from variables of the query to nodes in the RDF graph is an answer to
the query, one first transforms the query itself into an RDF graph by substituting
each variable with the corresponding value, and then checks whether this graph
is entailed in the regime by the original data graph [10–12].
    When one goes beyond the basic fragment of SPARQL the language becomes
considerably more complicated, but the effect of entailment regimes on the query
semantics remains circumscribed to basic graph patterns [13, 14]. To evaluate a
query one must first evaluate its component basic patterns using the relevant
regime and then compose the results by means of the SPARQL algebra operations.
    Of particular interest from both a theoretical and a practical perspective
is the extension of the basic fragment of SPARQL with the optional matching
feature, which is realised in the language by means of the OPTIONAL operator
(abbreviated by OPT in this paper). This feature allows the optional information
to be added to query answers only when the information is available in the RDF
data graph: if the optional part of the query does not match the data, then the
relevant variables are left unbounded in query answers.
    One of the main motivations behind optional matching in SPARQL is to deal
with the “lack of regular, complete structures in RDF graphs” (see [9] Section 6)
and hence with the inherent incompleteness of information in RDF data sources
where only partial information about the relevant Web resources is typically
available. In this setting, an unbound variable in an answer mapping is naturally
interpreted as a “null” value, meaning that there might exist a binding for this
variable if we consider other information elsewhere on the Web, but none is
currently available in the RDF graph at hand. Another (and slightly different)
motivation for optional matching was to introduce a mechanism for “not rejecting
solutions because some part of the query pattern does not match” [1]; in this
sense, one would naturally expect optional matching to either extend solutions
with the optional information, or to leave solutions unchanged. Both readings
of optional matching coincide if we focus just on RDF, and they are faithfully
captured by the normative semantics. In this paper we argue that they naturally
diverge once we consider more sophisticated entailment regimes. Furthermore,
the differences that arise can have a major impact on expected answers.
    To make this discussion concrete, let us briefly discuss a simple example of
an RDF graph representing the direct train lines between UK cities as well as
ferry boat transfers from UK cities to international destinations. Let this graph
be exhaustive in its description of rail connections, but much less so in what
concerns ferry transfers. We may exploit optional matching to retrieve all direct
train connections between cities X and Y, extended with ferry transfers from Y
to other cities Z whenever possible. Under the normative semantics of SPARQL
we may obtain answers (London, Oxford , −) and (London, Holyhead , −) provided
the graph has information about direct train lines from London to both Oxford
and Holyhead , but no matching can be found in the graph for ferry connections
starting from Oxford or Holyhead to other cities. Suppose next that the data
graph is extended to a graph corresponding to an OWL 2 ontology in which
it is stated that inland cities do not have ferry connections, and that Oxford
is an inland city. The ontology establishes a clear distinction between Oxford
and Holyhead : whereas the former is inland and cannot have ferry connections,
the latter may still well be (and indeed is) a coastal city offering a number of
transfers to international destinations. The normative OWL 2 direct semantics
entailment regime, however, does not distinguish between the case of Holyhead
(where the information about ferry connections is still unknown) and Oxford
(where the information is certain), and both answers would be returned. In this
way, the normative semantics adopts the reading of optional matching where the
optional information is used to complete (but never discard) query answers. In
contrast, under the reading of unbounded variables as placeholders for unknown
information, one would naturally expect the answer on Oxford to be ruled out.
Indeed, if our goal were to find rail to ferry transfers starting from London and
terminating in Dublin by first querying this graph and then looking for the
missing information elsewhere on the Web, discarding cities like Oxford on the
first stage would significantly facilitate our task.
    In this paper, we propose an alternative semantics for the OPT operator which
adopts the aforementioned reading of optional matching as an incomplete “null”.
We call our semantics strict, which reflects the fact that it rules out those answers
in which unbound variables in the optional part cannot be matched to any
consistent extension of the input graph. Our semantics is given as an extension of
the SPARQL algebra and hence satisfies the expected compositionality properties
of algebraic query languages. Furthermore, it is backwards-compatible with the
normative semantics for regimes in which all legal graphs are consistent, such
as the RDF regime [10]. We also study the complexity of query evaluation and
show that our extension comes at no cost for regimes in which entailment is not
harder than query evaluation under normative semantics for the RDF regime.
Finally, we show that our semantics preserves the known properties of optional
matching that are commonly exploited for static analysis and optimisation.
    This paper is an updated version of the work [15].


2     SPARQL 1.1 under Entailment Regimes

In this section, we formalise the syntax and normative semantics of a core
fragment of SPARQL 1.1 with optional matching under entailment regimes. Our
formalisation is based on the normative specification documents [9–11] and builds
on the well-known foundational works on SPARQL [2, 3, 6].


2.1   Syntax

Let I, L, and B be infinite sets of IRIs, literals, and blank nodes, respectively.
The set of RDF terms T is I ∪ L ∪ B. An RDF triple is a triple (s p o) from
T × I × T, with s called subject, p predicate, and o object. An (RDF) graph is
a finite set of RDF triples. Assume additionally the existence of a countably
infinite set V of variables disjoint from T. A triple pattern is a tuple from
(T ∪ V) × (I ∪ V) × (T ∪ V). A basic graph pattern (BGP) is a finite set of triple
patterns. Built-in conditions are conditions of the form bound(?X), ?X = c, and
?X =?Y for ?X, ?Y ∈ V and c ∈ T, and their Boolean combinations.
    Complex graph patterns are constructed from BGPs using a range of available
operators that are applicable to graph patterns and built-in conditions. We
focus on the AND-OPT-FILTER fragment (i.e., we consider neither union nor
projection), which is widely accepted to be the fundamental core of SPARQL [2].
In this setting, graph patterns are inductively defined as follows (e.g., see [11]):
 1. every BGP is a graph pattern;
 2. if P1 and P2 are graph patterns that share no blank nodes then (P1 AND P2 )
     and (P1 OPT P2 ) are graph patterns (called AND and OPT patterns); and
 3. if P is a graph pattern and R is a built-in condition, then (P FILTER R) is a
     graph pattern (called FILTER pattern).
In what follows vars(P ) (respectively triples(P )) denotes all the variables from V
(respectively all triple patterns) that appear in a graph pattern P .
    We conclude with the definition of a special class of graph patterns with
intuitive behaviour [2]. A graph pattern is well-designed if and only if for each
of its OPT subpatterns (P1 OPT P2 ) the pattern P1 mentions all the variables
of P2 which appear outside this subpattern. Note that all graph patterns in the
examples of this paper are well-designed.


2.2   Semantics of BGPs under Entailment Regimes

The semantics of graph patterns is defined in terms of mappings; that is, partial
functions from variables V to terms T. The domain dom(µ) of a mapping µ is
the set of variables on which µ is defined. The set of triples obtained from a BGP
P by replacing each ?X from dom(µ) by µ(?X) is denoted by µ(P ).
    Two mappings µ1 and µ2 are compatible (written as µ1 ∼ µ2 ) if µ1 (?X) =
µ2 (?X) for all variables ?X which are in both dom(µ1 ) and dom(µ2 ). If µ1 ∼ µ2 ,
then we write µ1 ∪ µ2 for the mapping obtained by extending µ1 with µ2 on
variables undefined in µ1 . A mapping µ1 is subsumed by a mapping µ2 (written
µ1 v µ2 ) iff µ1 ∼ µ2 and dom(µ1 ) ⊆ dom(µ2 ). Finally, a set of mappings Ω1 is
subsumed by a set of mappings Ω2 (written Ω1 v Ω2 ) iff for each µ1 ∈ Ω1 there
exists µ2 ∈ Ω2 such that µ1 v µ2 .
    Based on [10], an (entailment) regime R is a tuple (R, G, P, C, J·K), where
 1. R is a set of reserved IRIs from I;
 2. G is the set of legal graphs;
 3. P is the set of legal BGPs;
 4. C is the set of consistent graphs, such that C ⊆ G; and
 5. J·K is the query answering function, that takes a graph G from G and a
     BGP P from P and returns either a set JP KG of mappings µ such that
     dom(µ) = vars(P ), if P ∈ C; or Err, otherwise.
As in most theoretical works on SPARQL [2, 3, 6, 16], we assume that the query
answering function returns a set of mappings, rather than a multiset.
    The definitions of query answering and consistency in a regime are based on
an entailment relation [10], which is also specified as part of the regime. We do not
model the entailment relation explicitly, but assume two conditions that capture
the effects of any reasonable entailment relation on legality and consistency. All
regimes mentioned in the normative specification satisfy these properties and in
this paper we consider only regimes that do so.
(C1) If graphs G, G1 and G2 are legal, and there is h : T → T, preserving R,
     such that h(G1 ∪ G2 ) ⊆ G then G1 ∪ G2 is legal; if, in addition, G is in C
     then G1 ∪ G2 is also in C.
(C2) If a BGP P is in P then µ(P ) is in G for any (total) µ : V → (T \ R), such
     that µ(P ) is a graph; if also µ(P ) is in C then µ ∈ JP Kµ(P ) .
Condition (C1) formalises (a weak form of) the monotonicity of legality and
consistency: an illegal graph that is a union of legal ones cannot be made legal by
identifying and renaming of non-reserved terms or adding triples to it; moreover, a
similar property holds for consistency. Condition (C2) guarantees, that “freezing”
variables of a legal BGP to non-reserved terms gives us a legal graph, and,
moreover, if such a graph is consistent, then the answer of the BGP on this graph
contains the mapping corresponding to the “freezing”.
    The notions introduced in the remainder of this paper are parameterised with
a regime R, which is not mentioned explicitly for brevity.

2.3   Normative Semantics under Entailment Regimes
Following [2], now we show how the query answering function J·K extends to
complex graph patterns (we refer to [2] for details). A mapping µ satisfies a
built-in condition R, denoted µ |= R, if one of the following holds:
 1. R is bound(?X) and ?X ∈ dom(µ); or
 2. R is ?X = c, ?X ∈ dom(µ), and µ(?X) = c; or
 3. R is ?X =?Y , ?X ∈ dom(µ), ?Y ∈ dom(µ), and µ(?X) = µ(?Y ); or
 4. R is an evaluating to true Boolean combination of other built-in conditions.
The join, difference, and left outer join of sets of mappings Ω1 , Ω2 are as follows:
          n Ω2 = {µ1 ∪ µ2 | µ1 ∈ Ω1 and µ2 ∈ Ω2 such that µ1 ∼ µ2 },
       Ω1 o
       Ω1 \ Ω2 = {µ1 | µ1 ∈ Ω1 , there is no µ2 ∈ Ω2 such that µ1 ∼ µ2 },
       Ω1 o          n Ω2 ) ∪ (Ω1 \ Ω2 ).
          n Ω2 = (Ω1 o
A graph pattern is legal for a regime R if all the BGPs it contains are legal. The
normative query answering function J·Kn is inductively defined for all legal graph
patterns P on the base of J·K as follows. For graphs G from C we have:
 1. if P is a BGP then JP KnG = JP KG ;
 2. if P is (P1 AND P2 ) then JP KnG = JP1 KnG on JP2 KnG ;
                                   n         n
 3. if P is (P1 OPT P2 ) then JP KG = JP1 KG o  n JP2 KnG ; and
 4. if P is (P FILTER R) then JP KG = {µ | µ ∈ JP 0 KnG and µ |= R}.
              0                      n

If G 6∈ C then JP KnG = Err for any graph pattern P (which again coincides with
JP KG when P is a BGP). Note, that by these definitions µ ∈ JP KnG implies that
dom(µ) ⊆ vars(P ), but this inclusion may be strict if P contains OPT operator.
    Two legal patterns P1 and P2 are equivalent (under normative semantics),
denoted by P1 ≡n P2 , if JP1 KnG = JP2 KnG for every RDF graph G ∈ G.

3     On Optional Matching Under the Normative Semantics
One of the main motivations for optional matching in SPARQL was to deal with
the “lack of regular, complete structures in RDF graphs” [9]. Indeed, RDF data
is loosely structured, and in many applications it is not satisfactory to reject an
answer if some relevant information is missing. For example, if we are interested
in retrieving the names, emails, and websites of employees, we may not want to
discard a partial answer involving the name and email address of an employee
merely because the information on the employee’s website is not available in the
graph. The normative semantics was designed to deal with such situations: the
optional information is included in query answers only when the information
is available; otherwise, the relevant variables are left unbounded. An unbound
variable in an answer is thus a manifestation of inherent incompleteness of RDF
data sources, and the missing information is interpreted as unknown.
    This natural interpretation of query results, however, no longer holds if the
query is evaluated under certain entailment regimes, as we illustrate next by
means of examples. In these and all other examples given later on, we focus on
the OWL 2 direct semantics regime. In order for an RDF graph to be legal for
this regime, it must correspond to an OWL 2 ontology; similarly, legal BGPs
must correspond to an extended ontology in which variables are allowed [10].
Thus, in the examples we express RDF graphs and BGPs in (extended) OWL 2
functional syntax, and use words “ontology” and “graph” interchangeably (we
also omit declaration axioms in ontologies and BGPs to avoid clatter).
Example 1. Consider the OWL 2 ontology O1 consisting of the axioms
   ClassAssertion(InlandCity Oxford ),    PropertyAssertion(train London Oxford ),
ClassAssertion(CoastalCity Holyhead ), PropertyAssertion(train London Holyhead ),
 PropertyDomain(ferry CoastalCity),        DisjointClasses(CoastalCity InlandCity).
Consider also the following graph pattern P1 , which we wish to evaluate over O1 :

  PropertyAssertion(train ?X ?Y ) OPT PropertyAssertion(ferry ?Y             ?Z).
Intuitively, solutions to P1 provide direct train lines from city X to city Y as well
as, optionally, the ferry transfers from Y to other cities Z. Under the normative
semantics, the BGPs in P1 are evaluated separately. In particular, the optional
BGP is evaluated to the empty set, and JP1 KnO1 = {µ1 , µ2 }, where
µ1 = {?X 7→ London, ?Y 7→ Oxford } and µ2 = {?X 7→ London, ?Y 7→ Holyhead }.
In both answers, variable ?Z is unbounded and hence we conclude that O1
contains no relevant information about ferry connections starting from Oxford or
Holyhead . However, the nature of the lack of such information is fundamentally
different. On the one hand, the connections from Holyhead (e.g., to Dublin) are
missing from O1 just by the incompleteness of the information in the graph,
which is usual in (and also a feature of) Semantic Web applications. On the other
hand, Oxford cannot have a ferry connection because it is a landlocked city, and
hence the information about its (lack of) ferry connections is certain. Thus, the
normative semantics cannot distinguish between unknown and non-existent ferry
connections. However, if we adhere to the reading of unbounded variables as
incomplete information or “nulls”, then µ1 should not be returned as an answer.
   The issues described in this example become even more apparent in cases
where the optional part alone cannot be satisfied, as in the following example.

Example 2. Consider the ontology O2 with the axioms

    ClassAssertion(Person Peter ),   DisjointProperties(hasFather hasMother ).

Furthermore, consider the following pattern P2 :

ClassAssertion(Person ?X) OPT ({
  PropertyAssertion(hasFather ?X ?Y ), PropertyAssertion(hasMother ?X ?Y )}).

The optional BGP does not match to anything, so JP2 KnO2 consists of {?X 7→
Peter }. However, this BGP is in contradiction with the disjointness axiom: under
the OWL 2 regime, no solution to P2 exists for any ontology with this axiom.

   As these examples suggest, if we interpret unbound variables in answers to
queries with optional parts as an indication of unknown information in the data
graph, then the normative semantics may yield counter-intuitive answers. At
the core of this issue is the inability of the normative semantics to distinguish
between answers in which it is possible to assign values to the missing optional
part (a natural reflection of incompleteness in the data), and those where this is
impossible (a reflection that the missing information is incompatible with the
answer). This distinction is immaterial for regimes without inconsistencies, but it
becomes apparent in more sophisticated regimes, such as those based on OWL 2.


4     Semantics of Strict Optional Matching
In this section, we propose our novel semantics for optional matching under
regimes. In a nutshell, our semantics addresses the issues described in Section 3
by ruling out those answer mappings where unbound variables in the optional
part cannot be matched to any consistent extension of the input graph. Our
semantics is therefore strict, in the sense that only answers in which unbound
variables are genuine manifestations of incompleteness in the data are returned.

4.1    Definition of Strict Semantics
We start by introducing the notion of a frozen RDF graph for a pattern P and
a mapping µ. Roughly speaking, this graph is obtained by taking all the triple
patterns in P and transforming them into RDF triples by applying the extension
of µ where unbounded variables are “frozen” to arbitrary fresh constants.

Definition 1. Let R = (R, G, P, C, J·K) be an entailment regime. Let P be a legal
graph pattern, and let µ be a mapping from variables V to RDF terms T. Then,
the freezing GPµ of P under µ is the RDF graph µ̄(triples(P )), where µ̄ is the
mapping that extends µ by assigning each variable in vars(P ), which is not in
dom(µ), to a globally fresh IRI from I not belonging to R.
   The freezing GP  µ depends only on the candidate mapping µ and the triple
patterns occurring in P ; thus, it does not depend either on the specific operators
used in P , or on the RDF graph over which the query pattern is to be evaluated.

Example 3. For pattern P1 and mappings µ1 , µ2 from Example 1 the freezings
GP        P1
 µ1 and Gµ2 have the following form, for fresh IRIs k and `:
  1




{PropertyAssertion(train London Oxford ), PropertyAssertion(ferry Oxford k)},
{PropertyAssertion(train London Holyhead ),PropertyAssertion(ferry Holyhead `)}.

    Intuitively, the freezing represents the simplest and most general RDF graph
over which all the undefined variables in a given solution mapping could be
bounded to concrete values. Thus, if GP   µ together with the input graph G is not
a consistent graph for the relevant regime, we can conclude, using condition (C1)
of the regime, that the undefined variables in µ will never be matched to concrete
values in any consistent extension of G and hence µ should be ruled out as an
answer. On the other hand, if G ∪ GP  µ is consistent, then such an extension exists
and, by condition (C2), the undefined variables can be mapped in this extension.

Definition 2. Let R = (R, G, P, C, J·K) be an entailment regime. A mapping µ
is R-admissible for a graph G ∈ C and legal graph pattern P if G ∪ GP
                                                                    µ is a graph
belonging to C. The set of all R-admissible mappings for a consistent graph G
and a legal graph pattern P is denoted as Adm(G, P ).

Example 4. Clearly, O1 ∪GP   µ1 is inconsistent since ferries only depart from coastal
                              1


cities, but Oxford is an inland city. In contrast, O1 ∪ GP  µ2 is consistent. Thus, we
                                                             1

have µ1 ∈ / Adm(O1 , P ), but µ2 ∈ Adm(O1 , P ).

   We are now ready to formalise our semantics.

Definition 3. Given an entailment regime R = (R, G, P, C, J·K), the strict query
answering function J·Ks is defined for legal graph patterns P and G ∈ C as follows:
 1. if P is a BGP then JP KsG = JP KG ;
 2. if P is P1 AND P2 then JP KsG = (JP1 KsG o n JP2 KsG ) ∩ Adm(G, P );
                                 s           s
 3. if P is P1 OPT P2 then JP KG = (JP1 KG o   n JP2 KsG ) ∩ Adm(G, P ); and
 4. if P is P FILTER R then JP KG = {µ | µ ∈ JP 0 KsG and µ |= R},
              0                    s

where ∩ denotes the standard set-theoretic intersection. If G 6∈ C then JP KsG = Err
for any graph pattern P . Finally, legal patterns P1 and P2 are equivalent (under
strict semantics), written P1 ≡s P2 , if JP1 KsG = JP2 KsG for any legal G.

Example 5. The strict semantics behaves as expected for our examples: JP1 KsO1 =
{µ1 } holds for O1 and P1 from Example 1, while JP2 KsO2 = ∅ holds for Example 2.

   The strict and normative semantics coincide in two limit cases. First, if the
entailment regime does not allow for inconsistent graphs (i.e., if C = G) as is
the case for the RDF regime [10], then JP KsG = JP KnG for every legal pattern P
and graph G. Second, if the relevant pattern P is OPT-free then the freezing for
every candidate answer mapping contains no fresh IRIs and is R-entailed by G;
thus, we again have JP KsG = JP KnG for every legal graph G.
     Thus, the difference between the normative semantics J·Kn and strict semantics
   s
J·K manifests only for regimes that admit inconsistency, and is circumscribed
to the presence of OPT in graph patterns, where non-admissible mappings are
excluded in the case of the strict semantics. Note, however, that even if a mapping
µ1 (respectively µ2 ) is admissible for a subpattern P1 (respectively P2 ) containing
OPT, it is possible for µ1 ∪ µ2 not to be admissible for the joined pattern
P = P1 AND P2 . Thus, the admissibility restriction is also explicitly reflected in
the semantics of AND given in Definition 3. This is illustrated in the example
given next.
Example 6. Consider ontology O3 , consisting of the axioms
SubClassOf(
 IntersectionOf(SomeValuesFrom(husband Thing) SomeValuesFrom(wife Thing))
 Nothing),
ClassAssertion(Person Mary).

The first axiom establishes that a person cannot have both a husband and a wife.
Consider also the following well-designed graph pattern P3 :

 (ClassAssertion(Person ?X) OPT (PropertyAssertion(husband ?X ?Y ))) AND
 (ClassAssertion(Person ?X) OPT (PropertyAssertion(wife ?X ?Z))).

Clearly, µ = {?X 7→ Mary} belongs to the strict answer to each of the OPT
subpatterns of P3 since each of them independently can match to a consistent
extension of O3 . However, µ is not admissible for P3 since Mary has both a
husband and a wife in GP                   P3                            s
                       µ , and hence O3 ∪ Gµ is inconsistent. Thus, JP3 KO3 = ∅.
                         3




4.2   Comparing the Normative and Strict Semantics
Our previous examples support the expected behaviour of our semantics, namely
that its effect is circumscribed to filtering out problematic answers returned under
the normative semantics. We next formally show that our semantics behaves as
expected in general, provided that we restrict ourselves to well-designed patterns
and negation-free FILTER expressions (which are rather mild restrictions).
    It is known that patterns which are not well-designed easily lead to unexpected
answers, even under the normative semantics (we refer to [2] for a detailed
discussion). Therefore, it comes at no surprise that the intuitive behaviour of our
semantics is only guaranteed under this assumption.
Theorem 1. Let R = (R, G, P, C, J·K) be an entailment regime. The inclusion
JP KsG v JP KnG holds for any graph G from C and any legal well-designed graph
pattern P which does not use negation in FILTER expressions.
    Note that Theorem 1 is formulated in terms of subsumption, instead of
set-theoretic containment. The rationale behind this formulation is clarified next.
Example 7. Consider the ontology O10 , which is obtained from O1 in Example 1
by removing all axioms involving Holyhead , and adding the axiom

                   PropertyAssertion(bus Canterbury London).

Consider also the following graph pattern P10 :

    PropertyAssertion(bus ?U ?X) OPT
       (PropertyAssertion(train ?X ?Y ) OPT PropertyAssertion(ferry ?Y ?Z)).

The mapping µ = {?U 7→ Canterbury, ?X 7→ London, ?Y 7→ Oxford } is returned
by the normative semantics. As already discussed, Oxford is an inland city and
hence cannot have ferry connections; thus, µ is not returned under strict semantics.
However, it may be possible to reach a ferry connection from London (although
none is given), and hence the answer µ0 = {?U 7→ Canterbury, ?X 7→ London}
is returned instead of µ under strict semantics. Clearly, µ0 is not a normative
answer and JP10 KsO0 6⊆ JP10 KnO0 ; however, µ0 v µ and JP10 KsO0 v JP10 KnO0 .
                  1          1                             1          1




5     Computational Properties and Static Optimisation
In this section we first study the computational properties of our semantics. We
show that the complexity of graph pattern evaluation under strict and normative
semantics coincide, provided that consistency checking is feasible in PSPACE for
the regime at hand. Then we focus on static query analysis, and in particular on
pattern equivalence. We show that the key equivalence-preserving transformation
rules that have been proposed for static optimisation of SPARQL queries continue
to hold if we consider equivalence under strict semantics.

5.1    Complexity of Strict Graph Pattern Evaluation
Recall that the graph pattern evaluation is the key problem in SPARQL. In the
context of entailment regimes, it is defined as follows, where x is either n or s.

 Graph Pattern Evaluation
 Input:    Regime R, legal graph G, legal graph pattern P , and mapping µ.
 Question: Is µ ∈ JP KxG under the regime R?
Here, when we say that regime R is a part of the input, we mean that it includes
two oracle functions checking consistency of legal graphs and evaluating legal
BGPs over legal graphs, respectively. In what follows, we refer to the problem as
Normative if x = n, and as Strict if x = s.
    It is known that the normative graph pattern evaluation problem is in
PSPACE for the RDF regime [2]. We next argue that membership in PSPACE
holds in general for any regime satisfying the basic properties discussed in Sec-
tion 2 and for both normative and strict versions of the problem, provided that
the complexity of both oracles of the regime is in PSPACE.
Theorem 2. Normative and Strict Graph Pattern Evaluation problems
are in PSPACE, provided the oracles associated to input regimes are in PSPACE.
    Consequently, the use of our strict semantics does not increase the computa-
tional complexity for reasonable regimes. In particular, it follows directly from
Theorem 2 that the evaluation problem is in PSPACE under both semantics for
the tractable entailment regimes associated to the OWL 2 profiles [17].
    It is also known that graph pattern evaluation under normative semantics is
PSPACE-hard for the RDF regime [2]. To formulate a general hardness result
that holds for any regime we would need to require additional properties for
a regime to qualify as “reasonable”. In order not to unnecessarily complicate
the presentation, we simply point out that PSPACE-hardness holds for all the
regimes in the specification under both normative and strict semantics [10].

5.2   Static Analysis and Optimisation
Static analysis and optimisation of SPARQL queries has received significant
attention in recent years [4, 6, 18–20]. A key ingredient for optimisation is the
availability of a comprehensive catalog of equivalence-preserving transformation
rules for SPARQL patterns. A rich set of such equivalences for normative semantics
and RDF regime is established in [2] and [4]. Some of these equivalences, such as
idempotence, commutativity, and associativity of the AND operator, hold without
any restrictions (for our core fragment of SPARQL). However, those that involve
OPT are more intricate and hold only for well-designed patterns. The claim of
this section is that these equivalences continue to hold for any entailment regime,
under both normative and strict semantics.
Theorem 3. The following equivalences hold for any entailment regime, provided
the graph patterns on both sides are legal and well-designed, for x ∈ {n, s}:
               (P1 OPT P2 ) FILTER R ≡x (P1 FILTER R) OPT P2 ,
                 P1 AND (P2 OPT P3 ) ≡x (P1 AND P2 ) OPT P3 ,
                 (P1 OPT P2 ) OPT P3 ≡x (P1 OPT P3 ) OPT P2 .

6     Conclusion
In this paper, we have proposed a novel semantics for optional matching in
SPARQL under entailment regimes where unbound variables in answer mappings
are naturally interpreted as “null” values. Our strict semantics has been designed
to deal in a faithful way with the “lack of regular, complete structures in RDF
graphs” and hence with the fundamental incompleteness of information on the
Semantic Web [1]. We believe that both strict and normative semantics are valid,
but one may be more appropriate than the other in certain applications. Both
semantics are compatible at a fundamental level and it would be possible to
exploit them in the same application by letting users commit to one or the other
explicitly when posing queries. Integrating them in a clean way from a syntactic
point of view is more tricky, and it is something we leave for future investigation.
References
 1. Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C Recom-
    mendation (2008) Available at http://www.w3.org/TR/rdf-sparql-query/.
 2. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM
    Trans. Database Syst. 34(3) (2009)
 3. Angles, R., Gutierrez, C.: The expressive power of SPARQL. In: ISWC. (2008)
    114–129
 4. Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization.
    In: ICDT. (2010) 4–33
 5. Arenas, M., Pérez, J.: Querying semantic web data with SPARQL. In: PODS.
    (2011) 305–316
 6. Letelier, A., Pérez, J., Pichler, R., Skritek, S.: Static analysis and optimization of
    semantic web queries. ACM Trans. Database Syst. 38(4) (2013) 25
 7. Polleres, A.: From SPARQL to rules (and back). In: WWW. (2007) 787–796
 8. Motik, B., Patel-Schneider, P.F., Parsia, B.: OWL 2 Web Ontology Language
    Structural Specification and Functional-style Syntax. W3C Recommendation (2012)
    Available at http://www.w3.org/TR/owl2-syntax/.
 9. W3C SPARQL Working Group: SPARQL 1.1 Query language. W3C Recommen-
    dation (2013) Available at http://www.w3.org/TR/sparql11-query/.
10. Glimm, B., Ogbuji, C.: SPARQL 1.1 Entailment Regimes. W3C Recommendation
    (2013) Available at http://www.w3.org/TR/sparql11-entailment/.
11. Glimm, B., Krötzsch, M.: SPARQL beyond subgraph matching. In: ISWC. (2010)
    241–256
12. Kollia, I., Glimm, B.: Optimizing SPARQL query answering over OWL ontologies.
    J. Artif. Intell. Res. 48 (2013) 253–303
13. Kontchakov, R., Rezk, M., Rodriguez-Muro, M., Xiao, G., Zakharyaschev, M.:
    Answering SPARQL queries over databases under OWL 2 QL entailment regime.
    In: ISWC. (2014) 552–567
14. Arenas, M., Gottlob, G., Pieris, A.: Expressive languages for querying the semantic
    web. In: PODS. (2014) 14–26
15. Kostylev, E.V., Cuenca Grau, B.: On the semantics of SPARQL queries with
    optional matching under entailment regimes. In: ISWC. (2014)
16. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. In:
    ISWC. (2006) 30–43
17. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: OWL 2
    Web Ontology Language: Profiles (27 October 2009) Available at http://www.
    w3.org/TR/owl2-profiles/.
18. Chekol, M.W., Euzenat, J., Genevès, P., Layaı̈da, N.: SPARQL query containment
    under RDFS entailment regime. In: IJCAR. (2012) 134–148
19. Chekol, M.W., Euzenat, J., Genevès, P., Layaı̈da, N.: SPARQL query containment
    under SHI axioms. In: AAAI. (2012)
20. Chekol, M.W., Euzenat, J., Genevès, P., Layaı̈da, N.: Evaluating and benchmarking
    SPARQL query containment solvers. In: ISWC. (2013) 408–423