SPARQL Update for Materialised Triple Stores under DL-LiteRDFS Entailment Albin Ahmeti1 , Diego Calvanese2 , and Axel Polleres3 1 Vienna University of Technology, Favoritenstraße 9, 1040 Vienna, Austria 2 Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy 3 Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna, Austria Abstract. Updates in RDF stores have recently been standardised in the SPARQL 1.1 Update specification. However, computing answers entailed by ontologies in triple stores is usually treated orthogonally to updates. Even W3C’s SPARQL 1.1 Update language and SPARQL 1.1 Entailment Regimes specifica- tions explicitly exclude a standard behaviour for entailment regimes other than simple entailment in the context of updates. In this paper, we take a first step to close this gap. We define a fragment of SPARQL basic graph patterns corre- sponding to (the RDFS fragment of) DL-Lite and the corresponding SPARQL update language, dealing with updates both of ABox and of TBox statements. We discuss possible semantics along with potential strategies for implementing them. Particularly, we treat materialised RDF stores, which store all entailed triples explicitly, and preservation of materialisation upon ABox and TBox updates. 1 Introduction The availability of SPARQL as a standard for accessing structured Data on the Web may well be called one of the key factors to the success and increasing adoption of RDF and the Semantic Web. Still, in its first iteration the SPARQL [23] specification has neither defined how to treat ontological entailments with respect to RDF Schema (RDFS) and OWL ontologies, nor provided means how to update dynamic RDF data. Both these gaps have been addressed within the recent SPARQL 1.1 specification, which provides both means to define query answers under ontological entailments (SPARQL 1.1 Entailment Regimes [9]), and an update language to update RDF data stored in a triple store (SPARQL 1.1 Update [8]). Nonetheless, these specifications leave it open how SPARQL endpoints should treat entailment regimes other than simple entailment in the context of updates; the main issue here is how updates shall deal with implied statements: – What does it mean if an implied triple is explicitly (re-)inserted (or deleted)? – Which (if any) additional triples should be inserted, (or, resp., deleted) upon updates? For the sake of this paper, we address such questions with the focus on a deliberately minimal ontology language, namely the minimal RDFS fragment of [19].4 As it turns out, even in this confined setting, updates as defined in the SPARQL 1.1 Update specification 4 We ignore issues like axiomatic triples [13], blank nodes [17], or, in the context of OWL, inconsistencies arising through updates [5]. Neither do we consider named graphs in SPARQL, which is why we talk about “triple stores” as opposed to “graph stores” [8]. Table 1. DL-LiteRDFS assertions vs. RDF(S), where 𝐴, 𝐴′ denote concept (or, class) names, 𝑃 , 𝑃 ′ denote role (or, property) names, 𝛤 is a set of constants, and 𝑥, 𝑦 ∈ 𝛤 . For the RDF(S) vocabulary, we make use of similar abbreviations (sc, sp, dom, rng, a) introduced in [19]. TBox RDFS TBox RDFS ABox RDFS ′ ′ 1 𝐴 ⊑𝐴 𝐴 sc 𝐴. 3 ∃𝑃 ⊑ 𝐴 𝑃 dom 𝐴. 5 𝐴(𝑥) 𝑥 a 𝐴. 2 𝑃′ ⊑ 𝑃 𝑃 ′ sp 𝑃 . 4 ∃𝑃 − ⊑ 𝐴 𝑃 rng 𝐴. 6 𝑃 (𝑥, 𝑦) 𝑥 P 𝑦. impose non-trivial challenges; in particular, specific issues arise through the interplay of INSERT, DELETE, and WHERE clauses within a single SPARQL update operation, which —to the best of our knowledge— have not yet been considered in this combination in previous literature on updates under entailment (such as for instance [5, 11]). Example 1. As a running example, we assume a triple store 𝐺 with RDF (ABox) data and an RDFS ontology (TBox) 𝑂fam about family relationships (in Turtle syntax [2]), where :hasP, :hasM, and :hasF denote the parent, mother, and father relations. ABox: :joe :hasP :jack. :joe :hasM :jane. TBox: :Father sc :Parent. :Mother sc :Parent. :hasF sp :hasP. :hasM sp :hasP. :hasP rng :Parent; dom :Child. :hasF rng :Father; dom :Child. :hasM rng :Mother; dom :Child. The following query should return :jack and :jane as (RDFS entailed) answers: SELECT ?Y WHERE { :joe :hasP ?Y. } SPARQL engines supporting simple entailment would only return :jack, though. The intended behaviour for the query in Ex. 1 is typically achieved either (i) by query rewriting techniques computing entailed answers at query run-time, or (ii) by materialising all implied triples in the store, normally at loading time. That is, on the one hand, borrowing from query rewriting techniques for DL-Lite (such as, e.g., PerfectRef [4]) one can reformulate such a query to return also implied answers. Example 2 (cont’d). The rewriting according to PerfectRef [4] of the query in Ex. 1 with respect to 𝑂fam as a DL TBox in SPARQL yields SELECT ?Y WHERE { {:joe :hasP ?Y} UNION {:joe :hasF ?Y} UNION {:joe :hasM ?Y}} Indeed, this query returns both :jane and :jack. On the other hand, an alternative5 is to materialise all inferences in the triple store, such that the original query can be used ’as is’, for instance using the minimalistic inference rules for RDFS from [19]6 shown in Fig. 1. Example 3 (cont’d). The materialised version of 𝐺 would contain the following triples— for conciseness we only show assertional implied triples here, that is triples from the four leftmost rules in Fig. 1. 5 This alternative is viable for RDFS, but not necessarily for more expressive DLs. 6 These rules correspond to rules 2), 3), 4) of [19]; they suffice since we ignore blank nodes. ?𝐶 sc ?𝐷. ?𝑆 a ?𝐶. ?𝑃 dom ?𝐶. ?𝑆 ?𝑃 ?𝑂. ?𝐶 sc ?𝐷. ?𝐷 sc ?𝐸. ?𝑆 a ?𝐷. ?𝑆 a ?𝐶. ?𝐶 sc ?𝐸. ?𝑃 sp ?𝑄. ?𝑆 ?𝑃 ?𝑂. ?𝑃 rng ?𝐶. ?𝑆 ?𝑃 ?𝑂. ?𝑃 sp ?𝑄. ?𝑄 sp ?𝑅. ?𝑆 ?𝑄 ?𝑂. ?𝑂 a ?𝐶. ?𝑃 sp ?𝑅. Fig. 1. Minimal RDFS inference rules :joe a :Child; :hasP :jack; :hasM :jane; :hasP :jane. :jack a :Parent. :jane a :Mother, :Parent. On a materialised triple store, the query from Ex. 1 would return the expected results. However, in the context of SPARQL 1.1 Update, things become less clear. Example 4 (cont’d). The following operation tries to delete an implied triple and at the same time to (re-)insert another implied triple. DELETE {?X a :Child} INSERT {?Y a :Mother} WHERE {?X :hasM ?Y} Existing triple stores offer different solutions to these problems, ranging from ig- noring entailments in updates altogether, to keeping explicit and implicit (materialised) triples separate and re-materialising upon updates. In the former case (ignoring entail- ments) updates only refer to explicitly asserted triples, which may result in non-intuitive behaviours, whereas the latter case (re-materialisation) may be very costly, while still not eliminating all non-intuitive cases, as we will see. The problem is aggravated by no systematic approach to which implied triples to store explicitly in a triple store and which not. In this paper we try to argue for a more systematic approach for dealing with updates in the context of RDFS entailments; we will focus on materialised RDF stores, which store all entailed ABox triples explicitly. We propose alternative update semantics and discuss possible implementation strategies, partially inspired by query rewriting techniques from ontology-based data access (OBDA) [15] and DL-Lite [4]. In previous work [1], we have also discussed reduced RDF Stores, i.e., redundancy- free RDF stores that do not store any assertional (ABox) triples already entailed by others, which we omit here for space limits. Here we have added a more in-depth discussion on the relation to DRed [6, 10, 16, 25], and we also discuss TBox updates. 2 Preliminaries We introduce some basic notions about RDF graphs, RDFS ontologies, and SPARQL queries. Since we will draw from ideas coming from OBDA and DL-Lite, we introduce these notions in a way that is compatible with DLs. Definition 1 (RDFS ontology, ABox, TBox, triple store). We call a set 𝒯 of inclusion assertions of the forms 1–4 in Table 1 an (RDFS) TBox (or ontology), a set 𝒜 of assertions of the forms 5–6 in Table 1 an (RDF) ABox, and the union 𝐺 = 𝒯 ∪ 𝒜 an (RDFS) triple store. In the context of RDF(S), the set 𝛤 of constants coincides with the set 𝐼 of IRIs. We assume the IRIs used for concepts, roles, and constants to be disjoint from IRIs of the RDFS and OWL vocabularies.7 In the following, we view RDF and DL notation interchangeably, i.e., we treat any RDF graph consisting of triples without non-standard RDFS vocabulary as a set of TBox and ABox assertions. To define the semantics of RDFS, we rely on the standard notions of (first-order logic) interpretation, satisfaction of assertions, and model. As for queries, we again treat the cases of SPARQL and DLs interchangeably. Let 𝒱 be a countably infinite set of variables (written as ’?’ -prefixed alphanumeric strings). Definition 2 (BGP, CQ, UCQ). A conjunctive query (CQ) 𝑞, or basic graph pattern (BGP), is a set of atoms of the forms 5–6 in Table 1, where now 𝑥, 𝑦 ∈ 𝛤 ∪ 𝒱. A union of conjunctive queries (UCQ) 𝑄, or UNION pattern, is a set of CQs. We denote with 𝒱(𝑞) (or 𝒱(𝑄)) the set of variables from 𝒱 occurring in 𝑞 (resp., 𝑄). In this definition we are considering only CQs in which all variables are distin- guished (i.e., are answer variables), and such queries correspond to SPARQL basic graph patterns (BGPs). Also, we allow only for restricted forms of general SPARQL BGPs that correspond to standard CQs as formulated over a DL ontology; that is, we rule out more complex patterns in SPARQL 1.1 [12] (such as OPTIONAL, NOT EXISTS, FILTER), queries with variables in predicate positions, as well as “terminological” queries, e.g., {?𝑥 sc ?𝑦.}. We will relax the latter restriction later (see Sec. 4). Also, we do not consider here blank nodes separately8 . By these restrictions, we can treat query answering and BGP matching in SPARQL analogously and define it in terms of interpretations and models (as usual in DLs). Specifically, an answer (under RDFS Entailment) to a CQ 𝑞 over a triple store 𝐺 is a substitution 𝜃 of the variables in 𝒱(𝑞) with constants in 𝛤 such that every model of 𝐺 satisfies all facts in 𝑞𝜃. We denote the set of all such ⋃︀ answers with ans rdfs (𝑞, 𝐺) (or simply ans(𝑞, 𝐺)). The set of answers to a UCQ 𝑄 is 𝑞∈𝑄 ans(𝑞, 𝐺). From now on, let rewrite(𝑞, 𝒯 ) be the UCQ resulting from applying PerfectRef [4] to a CQ 𝑞 and a triple store 𝐺 = 𝒯 ∪ 𝒜, and let mat(𝐺) be the triple store obtained from exhaustive application on 𝐺 of the inference rules in Fig. 1. The next result follow immediately from, e.g., [4, 11, 19] and shows that query answering in RDFS can be done by either query rewriting or materialisation. Proposition 1. Let 𝐺 = 𝒯 ∪ 𝒜 be a triple store, 𝑞 a CQ, and 𝒜′ the set of ABox assertions in mat(𝐺). Then, ans(𝑞, 𝐺) = ans(rewrite(𝑞, 𝒯 ), 𝒜) = ans(𝑞, 𝒜′ ). Various triple stores (e.g., BigOWLIM [3]) perform ABox materialisation directly upon loading data. However, such triple stores do not necessarily materialise the TBox: in order to correctly answer UCQs as defined above, a triple store actually does not need to consider the two rightmost rules in Fig. 1. Accordingly, we will call a triple store or (ABox) materialised if in each state it always guarantees 𝐺 ∖ 𝒯 = mat(𝐺) ∖ mat(𝒯 ). We observe that, trivially, a triple store containing no ABox statements is materialised. Finally, we introduce the notion of a SPARQL update operation. 7 That is, we assume no “non-standard use” [22] of these vocabularies. While we could assume concept names, role names, and individual constants mutually disjoint, we rather distinguish implicitly between them “per use” (in the sense of “punning” [18]) based on their position. 8 Blank nodes in a triple store may be considered as constants and we do not allow blank nodes in queries, which does not affect the expressivity of SPARQL. Definition 3 (SPARQL update operation). Let 𝑃𝑑 and 𝑃𝑖 be BGPs, and 𝑃𝑤 a BGP or UNION pattern. Then an update operation 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) has the form DELETE 𝑃𝑑 INSERT 𝑃𝑖 WHERE 𝑃𝑤 Intuitively, the semantics of executing 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) on 𝐺, denoted as 𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) is defined by interpreting both 𝑃𝑑 and 𝑃𝑖 as “templates” to be instantiated with the solutions of ans(𝑃𝑤 , 𝐺), resulting in sets of ABox statements 𝒜𝑑 to be deleted from 𝐺, and 𝒜𝑖 to be inserted into 𝐺. A naïve update semantics follows straightforwardly. Definition 4 (Naïve update semantics). Let 𝐺 = 𝒯 ∪ 𝒜 be a triple store, and 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) an update operation. Then, naive update of 𝐺 with ⋃︀ 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ), de- noted 𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) , is defined as (𝐺 ∖ 𝒜𝑑 ) ∪ 𝒜𝑖 , where 𝒜𝑑 = 𝜃∈ans(𝑃𝑤 ,𝐺) gr(𝑃𝑑 𝜃), ⋃︀ 𝒜𝑖 = 𝜃∈ans(𝑃𝑤 ,𝐺) gr(𝑃𝑖 𝜃), and gr(𝑃 ) denotes the set of ground triples in pattern 𝑃 . As easily seen, this naïve semantics does not preserve materialisation.9 3 Alternative Mat-Preserving Semantics for ABox Updates We investigate now alternative semantics for updates that preserve a materialised ABox (or simply, mat-preserving semantics) and in how far these semantics can—similar to query answering—be implemented on top of off-the-shelf SPARQL 1.1 triple stores. Definition 5 (Mat-preserving semantics). Let 𝐺 and 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) be as in Def. 4. An update semantics Sem is called mat-preserving, if 𝐺Sem Sem 𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) = mat(𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) ). Specifically, we consider the following variants, given an update 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ): Semmat 0 : as a baseline for a mat-preserving semantics, we apply the naïve semantics, followed by (re-)materialisation of the whole triple store. Semmat 1 : an alternative approach for a mat-preserving semantics is to follow the so- called “delete and rederive” algorithm [10] for deletions, that is: (i) delete the instantiations of 𝑃𝑑 plus “dangling” effects, i.e., effects of deleted triples that after deletion are not implied any longer by any non-deleted triples; (ii) insert the instantiations of 𝑃𝑖 plus all their effects. Semmat 2 : Another mat-preserving semantics could take a different viewpoint with re- spect to deletions, following the intention to: (i) delete the instantiations of 𝑃𝑑 plus all their causes; (ii) insert the instantiations of 𝑃𝑖 plus all their effects. Semmat 3 : Finally, a mat-preserving semantics could combine Semmat 1 and Semmat 2 deleting both causes of instantiations of 𝑃𝑑 and (recursively) “dangling” effects.10 The definition of semantics Semmat 0 is straightforward. Definition 6 (Baseline mat-preserving update semantics). Let 𝐺 and 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) Semmat be as in Def. 4. Then, 𝐺𝑢(𝑃𝑑0,𝑃𝑖 ,𝑃𝑤 ) = mat(𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) ). 9 Consider, e.g., the update from Ex. 4 on the materialised store from Ex. 3. 10 Note the difference to the basic “delete and rederive” approach. Semmat 1 in combination with the intention of Semmat 2 would also mean to recursively delete effects of causes, and so forth. Let us proceed with a “reality-check” on this semantics using our running example. Example 5. Consider the update from Ex. 4. It is easy to see that under Semmat 0 executed on the materialised triple store of Ex. 3, it would not have any effect. As this behaviour is quite arguable, let us proceed with discussing the proposed alternative mat-preserving update semantics, and how they could be implemented. As for Semmat1 , we rely on a well-known technique in the area of updates for deductive databases called “delete and rederive” (DRed) [6, 10, 16, 25]. Informally 𝜔 translated to our setting, when given a logic program 𝛱 and its materialisation 𝑇𝛱 , plus set of facts 𝐴𝑑 to be deleted and 𝐴𝑖 to be inserted, DRed (i) first deletes 𝐴𝑑 and all its 𝜔 𝜔 ′ effects (computed via semi-naive evaluation [24]) from 𝑇𝛱 , resulting in (𝑇𝛱 ) , (ii) then, 𝜔 ′ starting from (𝑇𝛱 ) , re-materialises (𝛱 ∖ 𝐴𝑑 ) ∪ 𝐴𝑖 (again using semi-naive evaluation). The basic intuition behind DRed of deleting effects of deleted triples and then re- materialising can be expressed in our notation as follows; as we will consider a variant of this semantics later on, we refer to this semantics as Semmat 1𝑎 . Definition 7. Let 𝐺 = 𝒯 ∪ 𝒜, 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ), 𝒜𝑑 , and 𝒜𝑖 be defined as in Def. 4. Then Semmat ,𝑃𝑖 ,𝑃𝑤 ) = mat(𝒯 ∪ (𝒜 ∖ mat(𝒯 ∪ 𝒜𝑑 )) ∪ 𝒜𝑖 ). 𝐺𝑢(𝑃𝑑1𝑎 As opposed to the classic DRed algorithm, where Datalog distinguishes between view predicates (IDB) and extensional knowledge in the Database (EDB), in our setting we do not make this distinction, i.e., we do not distinguish between implicitly and explicitly inserted triples. This means that Semmat 1𝑎 would delete also those effects that had been inserted explicitly before. We introduce now a different variant of this semantics, denoted Semmat 1𝑏 , that makes a distinction between explicitly and implicitly inserted triples. Definition 8. Let 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) be an update operation, and 𝐺 = 𝒯 ∪ 𝒜𝑒𝑥𝑝𝑙 ∪ 𝒜𝑖𝑚𝑝𝑙 a triple store, where 𝒜𝑒𝑥𝑝𝑙 and 𝒜𝑖𝑚𝑝𝑙 respectively denote the explicit and implicit ABox Semmat ′ ′ ,𝑃𝑖 ,𝑃𝑤 ) = 𝒯 ∪ 𝒜𝑒𝑥𝑝𝑙 ∪ 𝒜𝑖𝑚𝑝𝑙 , where 𝒜𝑑 and 𝒜𝑖 are defined as in triples. Then 𝐺𝑢(𝑃𝑑1𝑏 Def. 4, 𝒜𝑒𝑥𝑝𝑙 = (𝒜𝑒𝑥𝑝𝑙 ∖ 𝒜𝑑 ) ∪ 𝒜𝑖 , and 𝒜𝑖𝑚𝑝𝑙 = mat(𝒜′𝑒𝑥𝑝𝑙 ∪ 𝒯 ) ∖ 𝒯 . ′ ′ Note that in Semmat mat 1𝑏 , as opposed to Sem1𝑎 , we do not explicitly delete effects of 𝒜𝑑 from the materialisation, since the definition just relies on re-materialisation from scratch from the explicit ABox 𝒜′𝑒𝑥𝑝𝑙 . Nonetheless, the original DRed algorithm can still be used for computing Semmat 1𝑏 as shown by the following proposition. Proposition 2. Let us interpret the inference rules in Fig. 1 and triples in 𝐺 respectively as rules and facts of a logic program 𝛱; accordingly, we interpret 𝒜𝑑 and 𝒜𝑖 from Def. 8 as facts to be deleted from and inserted into 𝛱, respectively. Then, the materialisation computed by DRed, as defined in [16], computes exactly 𝒜′𝑖𝑚𝑝𝑙 . None of Semmat 0 , Semmat mat 1𝑎 , and Sem1𝑏 are equivalent, as shown in Ex. 6. Example 6. Given the triple store 𝐺 = {:C sc :D . :D sc :E}, on which we perform the operation INSERT{:x a :C, :D, :E.}, explicitly adding three triples, and subsequently perform DELETE{:x a :C, :E.}, we obtain, according to the three semantics discussed so far, the following ABoxes: Semmat 0 : {:x a :D. :x a :E.}, Semmat 1𝑎 : {}, mat Sem1𝑏 : {:x a :D. :x a :E.}. While after this update Semmat 0 and Semmat1𝑏 deliver the same result, the difference between these two is shown by the subsequent update DELETE{:x a :D.}: Semmat 0 : {:x a :E.}, Semmat 1𝑎 : {}, Semmat 1𝑏 : {}. As for the subtle difference between Semmat mat 1𝑎 and Sem1𝑏 , we point out that none of [16, 25], who both refer to using DRed in the course of RDF updates, make it clear whether explicit and implicit ABox triples are to be treated differently. Further, continuing with Ex. 5, the update from Ex. 4 still would not have any effect, neither using Semmat mat 1𝑎 , nor Sem1𝑏 . That is, it is not possible in any of these update semantics to remove implicit information (without explicitly removing all its causes). Semmat2 aims at addressing this problem concerning the deletion of implicit infor- mation. As it turns out, while the intention of Semmat 2 to delete causes of deletions cannot be captured just with the mat operator, it can be achieved fairly straightforwardly, building upon ideas similar to those used in query rewriting. As we have seen, in the setting of RDFS we can use algorithm PerfectRef to expand a CQ to a UCQ that incorporates all its “causes”. A slight variation can be used to compute the set of all causes, that is, in the most naïve fashion by just “flattening” the set of sets returned by PerfectRef to a simple set; we denote this flattening on a set 𝑆 of sets as flatten(𝑆). Likewise, we can easily define a modified version of mat(𝐺), applied to a BGP 𝑃 using a TBox 𝒯 11 . Let us call the resulting algorithm mat eff (𝑃, 𝒯 )12 . Using these considerations, we can thus define both rewritings that consider all causes, and rewritings that consider all effects of a given (insert or delete) pattern 𝑃 : Definition 9 (Cause/Effect rewriting). Given a BGP insert or delete template 𝑃 for an update operation over the triple store 𝐺 = 𝒯 ∪ 𝒜, we define the all-causes-rewriting of 𝑃 as 𝑃 caus = flatten(rewrite(𝑃, 𝒯 )); likewise, we define the all-effects-rewriting of 𝑃 as 𝑃 eff = mat eff (𝑃, 𝒯 ). This leads (almost) straightforwardly to a rewriting-based definition of Semmat 2 . Definition 10. Let 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) be an update operation. Then Semmat 𝐺𝑢(𝑃𝑑2,𝑃𝑖 ,𝑃𝑤 ) = 𝐺𝑢(𝑃 caus ,𝑃 eff ,{𝑃𝑤 }{𝑃 fvars }) 𝑑 𝑖 𝑑 where 𝑃𝑑fvars = {?𝑥 a rdfs:Resource. | for each ?𝑥 ∈ Var(𝑃𝑑caus ) ∖ Var(𝑃𝑑 )}. The only tricky part in this definition is the rewriting of the WHERE clause, where 𝑃𝑤 is joined13 with a new pattern 𝑃𝑑fvars that binds “free” variables (i.e., the fresh variables 11 This could be viewed as simply applying the first four inference rules in Fig. 1 exhaustively to 𝑃 ∪ 𝒯 , and then removing 𝒯 . 12 Note that it is not our intention to provide optimised algorithms here, but just to convey the feasibility of this rewriting. 13 A sequence of ’{}’-delimited patterns in SPARQL corresponds to a join, where such joins can again be nested with UNIONs, with the obvious semantics, for details cf. [12]. denoted by ‘_’ in PerfectRef ) in the rewritten DELETE clause, 𝑃𝑑caus . Here, ?x a rdfs:Resource. is a shortcut for {{?x ?x𝑝 ?x𝑜 } UNION {?x𝑠 ?x ?x𝑜 } UNION {?x𝑠 ?x𝑝 ?x}}, which binds ?x to any term occurring in 𝐺. Example 7. Getting back to the materialised version of the triple store 𝐺 from Ex. 3, the update 𝑢 from Ex. 4 would, according to Semmat 2 , be rewritten to DELETE {?X a :Child. ?X :hasF ?x1. ?X :hasM ?x2. ?X :hasP ?x3.} INSERT {?Y a :Mother. ?Y a :Parent. } WHERE {{?X :hasM ?Y.} {?x1 a rdfs:Resource. ?x2 a rdfs:Resource. ?x3 a rdfs:Resource.}} Semmat with 𝐺𝑢 2 containing :jane a :Mother, :Parent. :jack a :Parent. It is easy to argue that Semmat 2 is mat-preserving. However, this semantics might still result in potentially non-intuitive behaviours. For instance, subsequent calls of INSERTs and DELETEs might leave “traces”, as shown by the following example. Example 8. Assume 𝐺 = 𝑂fam from Ex. 1 with an empty ABox. Under Semmat 2 , the following sequence of updates would leave as a trace —among others— the resulting triples as in Ex. 7, which would not be the case under the naïve semantics. DELETE{} INSERT {:joe :hasM :jane; :hasF :jack} WHERE{}; DELETE {:joe :hasM :jane; :hasF :jack} INSERT{} WHERE{} Semmat 3 tries to address the issue of such “traces”, but can no longer be formulated by a relatively straightforward rewriting. For the present, preliminary paper we leave out a detailed definition/implementation capturing the intention of Semmat 3 ; there are two possible starting points, namely combining Semmat 1𝑎 +Sem2 mat , or Semmat 1𝑏 +Sem2 mat , respectively. We emphasise though, that independently of this choice, a semantics that achieves the intention of Semmat 3 would still potentially run into arguable cases, since it might run into removing seemingly “disconnected” implicit assertions, whenever removed assertions cause these, as shown by the following example. Example 9. Assume a materialised triple store 𝐺 consisting only of the TBox triples :Father sc :Person, :Male . The behaviour of the following update sequence under a semantics implementing the intention of Semmat 3 is arguable: DELETE {} INSERT {:x a :Father.} WHERE {}; DELETE {:x a :Male.} INSERT {} WHERE {}; We leave it open for now whether “recursive deletion of dangling effects” is intuitive: in this case, should upon deletion of 𝑥 being Male, also be deleted that 𝑥 is a Person? In a strict reading of Semmat 3 ’s intention, :x a :Person. would count as a dan- gling effect of the cause for :x a :Male., since it is an effect of the inserted triple with no other causes in the store, and thus should be removed upon the delete operation. Lastly, we point out that while implementations of (materialised) triple stores may make a distinction between implicit and explicitly inserted triples (e.g., by storing explicit and implicit triples separately, as sketched in Semmat 1𝑏 already), we consider the distinction between implicit triples and explicitly inserted ones non-trivial in the context of SPARQL 1.1 Update: for instance, is a triple inserted based upon implicit bindings in the WHERE clause of an INSERT statement to be considered “explicitly inserted” or not? We tend towards avoiding such distinction, but we have more in-depth discussions of such philosophical aspects of possible SPARQL update semantics on our agenda. 4 TBox Updates So far, we have considered the TBox as static. As already noted in [11], additionally allowing TBox updates considerably complicates issues and opens additional degrees of freedom for possible semantics. While it is out of the scope of this paper to explore all of these, we limit ourselves to sketch these different degrees of freedom and suggest one pragmatic approach to extend updates expressed in SPARQL to RDFS TBoxes. In order to allow for TBox updates, we have to extend the update language: in the following, we will assume general BGPs, extending Def. 2. Definition 11 (general BGP). A general BGP is a set of triples of any of the forms from Table 1, where 𝑥, 𝑦, 𝐴, 𝐴′ , 𝑃, 𝑃 ′ ∈ 𝛤 ∪ 𝒱. We observe that with this relaxation for BGPs, updates as per Def. 3 can query TBox data, since they admit TBox triples in 𝑃𝑤 . In order to address this issue we need to also generalise the definition of query answers.14 Definition 12. Let 𝑄 be a union of general BGPs and [[𝑄]]𝐺 the simple SPARQL se- mantics as per [21], i.e., essentially the set of query answers obtained as the union of answers from simple pattern matching of the general BGPs in 𝑄 over the graph 𝐺. Then we define ans RDFS (𝑄, 𝐺) = [[𝑄]]mat(𝐺) In fact, Def. 12 does not affect ABox inferences, that is, the following corollary follows immediately from Prop. 1 for non-general UCQs as per Def. 2. Corollary 1. Let Q be a UCQ as per Def. 2. Then ans RDFS (𝑄, 𝐺) = ans rdfs (𝑄, 𝐺) As opposed to the setting discussed so far, where the last two rules in Fig. 1 used for TBox materialisation were ignored, we now focus on the discussion of terminological updates under the standard “intensional” semantics (essentially defined by the inference rules in Fig. 1) and attempt to define a reasonable (that means computable) semantics under this setting. Note that upon terminological queries, the RDFS semantics and DL semantics differ, since this“intensional” semantics does not cover all terminological inferences derivable in DL, cf. [7]; we leave the details of this aspect to future work. Observation 1. TBox updates potentially affects materialisation of the ABox, that is, (i) upon TBox insertions a materialised ABox might need to be re-materialised in order to preserve materialisation. (ii) upon TBox deletions in a materialised setting, we have a similar issue to what we called “dangling” effects earlier. Observation 2. Whereas deletions of implicit ABox triples can be achieved deter- ministically by deleting all single causes, TBox deletions involving sc and sp chains can be achieved in several distinct ways, as already observed by [11]. Example 10. Consider the graph 𝐺 = {:A sc :B. :B sc :C. :B sc :D. :C sc :E. :D sc :E. :E sc :F.} with the update DELETE{:A sc :F.} Independent of whether we assume a materialised TBox, we would have various choices here to remove triples, to delete all the causes for :A sc :F. 14 As mentioned in Fn. 7, elements of 𝛤 may act as individuals, concept, or roles names in parallel. In order to define a deterministic semantics for TBox updates, we need a canonical way to delete implicit and explicit TBox triples. Minimal cuts are suggested in [11] in the sc (or sp, resp.) graphs as candidates for deletions of sc (or sp, resp.) triples. However, as easily verified by Ex. 10, minimal multicuts are still ambiguous. Here, we suggest two update semantics using rewritings to SPARQL 1.1 property path patterns [12] that yield canonical cuts. Definition 13. Let 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) be an update operation where 𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 are general Semmat ,𝑃𝑖 ,𝑃𝑤 ) = mat(𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) ), where each triple {𝐴1 𝑠𝑐𝑝 𝐴2 } ∈ 𝑃𝑑 BGPs. Then 𝐺𝑢(𝑃𝑑outcut ′ ′ such that scp ∈ {sc, sp} is replaced within 𝑃𝑑′ by {𝐴1 𝑠𝑐𝑝 ?𝑥.}, and we add to 𝑃𝑤′ the property path pattern {𝐴1 𝑠𝑐𝑝 ?𝑥. ?𝑥 𝑠𝑐𝑝* 𝐴2 }. Analogously, Semmat incut is defined by replacing {?𝑥 𝑠𝑐𝑝 𝐴2 } within 𝑃𝑑′ , and adding {𝐴1 𝑠𝑐𝑝* ?𝑥. ?𝑥 𝑠𝑐𝑝 𝐴2 } within 𝑃𝑤′ instead. Both Semmat mat outcut and Semincut may be viewed as straightforward extensions of mat Sem0 , i.e., both are mat-preserving and equivalent to the baseline semantics for non-general BGPs (i.e., on ABox updates): Proposition 3. Let 𝑢(𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 ) be an update operation, where 𝑃𝑑 , 𝑃𝑖 , 𝑃𝑤 are (non- Semmat Semmat Semmat general) BGPs. Then 𝐺𝑢(𝑃𝑑outcut ,𝑃𝑖 ,𝑃𝑤 ) = 𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) = 𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) incut 0 The intuition behind Semmat 𝑜𝑢𝑡𝑐𝑢𝑡 is to delete for every deleted 𝐴 𝑠𝑐𝑝 𝐵. triple, all directly outgoing 𝑠𝑐𝑝 edges from 𝐴 that lead into paths to 𝐵, or, resp., in Semmat 𝑖𝑛𝑐𝑢𝑡 all directly incoming edges to 𝐵. This choice is motivated by the following proposition. Proposition 4. Let 𝑢 = DELETE {𝐴 𝑠𝑐𝑝 𝐵}, and let 𝐺 be a triple store with materi- Semmat Semmat alised TBox 𝒯 . Then, the TBox statements deleted by 𝐺𝑢(𝑃𝑑outcut ,𝑃𝑖 ,𝑃𝑤 ) (or, 𝐺𝑢(𝑃𝑑 ,𝑃𝑖 ,𝑃𝑤 ) , incut resp.) form a minimal cut [11] of 𝒯 disconnecting 𝐴 and 𝐵. The following example illustrates that the generalisation of Prop. 4 to updates involving the deletion of several TBox statements at once does not hold. Example 11. Assume the materialised triple store 𝐺 = {:A scp :B,:C,:D. :B scp :C, :D.} and 𝑢 = DELETE{:A scp :C. :A scp :D.}. Here, Semmat incut does not yield a minimal multicut in 𝐺 wrt disconnecting (:A,:C) and (:A,:D).15 As the example shows, the extension of the baseline ABox update semantics to TBox updates already yields new degrees of freedom. We leave a more in-depth discussion of TBox updates also extending the other semantics from Sec. 3 for future work. 5 Further Related Work and Possible Future Directions Previous work on updates in the context of tractable ontology languages such as RDFS [11] and DL-Lite [5] typically has treated DELETEs and INSERTs in isolation, but 15 An orthogonal example, where Semmat outcut would not yield a minimal multicut can be constructed symmetrically. not both at the same time nor in combination with templates filled by WHERE clauses, as in SPARQL 1.1; that is, these approaches are not based on BGP matching but rather on a set of ABox assertions to be updated known a priori. Pairing both DELETE and INSERT , as in our case, poses new challenges, which we tried to address here in the practically relevant context of materialised triple stores. In the future, we plan to extend our work in the context of DL-Lite, where we could build upon thoroughly studied query rewriting techniques (not necessarily relying on materialisation), and at the same time benefiting from a more expressive ontology language. Expanding beyond our simple minimal RDFS language towards more features of DL-Lite or coverage of unrestricted RDF graphs would impose new challenges: for instance, consistency checking and consistency-preserving updates (as those treated in [5]) do not yet play a role in the setting of RDFS; extensions in these directions as well as practically evaluating the proposed semantics on existing triple stores is on our agenda. In the area of database theory, there has been a lot of work on updating logical databases: Winslett [26] distinguishes between model-based and formula-based updates; our approach clearly falls in the latter category, more concretely, ABox updates could be viewed as sets of propositional knowledge base updates [14] generated by SPARQL instantiating DELETE/INSERT templates. Let us further note that in the more applied area of databases, there are obvious parallels between some of our considerations and CASCADE DELETEs in SQL (that is, deletions under foreign key constraints), in the sense that we trigger additional deletions of causes/effects in some of the proposed update semantics discussed herein. 6 Conclusions We have discussed the semantics of SPARQL 1.1 Update in the context of RDFS. To the best of our knowledge, this is the first work to discuss how to combine RDFS with the new SPARQL 1.1 Update language. While in this paper we have been operating on a very restricted setting (only capturing minimal RDFS entailments, restricting BGPs to disallow non-standard-use of the RDFS vocabulary), we could demonstrate that even in this setting the definition of a SPARQL 1.1 Update semantics under entailments is a non-trivial task. We proposed several possible semantics, neither of which might seem intuitive for all possible use cases; this might well suggest that there is no “one-size-fits- all” update semantics. Further, while ontologies should be “ready for evolution” [20], we believe that more work into semantics for updates of ontologies alongside with data (TBox & ABox) is still needed to ground research in Ontology Evolution into standards (SPARQL, RDF, RDFS, OWL), particularly in the light of the increased importance that RDF and SPARQL are experiencing in dynamic domains where also data is continuously updated (dealing with dynamics in Linked Data, querying sensor data, or stream reasoning). We have taken a first step in the present paper. Acknowledgements. This work has been funded by the Vienna Science and Technology Fund (WWTF, project ICT12-015), by the Vienna PhD School of Informatics, and by EU Project Optique (grant n. FP7-318338). References 1. Ahmeti, A., Polleres, A.: SPARQL update under RDFS entailment in fully materialized and redundancy-free triple stores. In: Proc. of the 2nd Int. Workshop on Ordering and Reasoning (OrdRing) (2013) 2. Beckett, D., Berners-Lee, T., Prud’hommeaux, E., Carothers, G.: RDF 1.1 Turtle – Terse RDF Triple Language. W3C Recommendation, World Wide Web Consortium (Feb 2014), available at http://www.w3.org/TR/turtle/ 3. Bishop, B., Kiryakov, A., Ognyanoff, D., Peikov, I., Tashev, Z., Velkov, R.: OWLIM: A family of scalable semantic repositories. Semantic Web J. 2(1), 33–42 (2011) 4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), 385–429 (2007) 5. Calvanese, D., Kharlamov, E., Nutt, W., Zheleznyakov, D.: Evolution of DL-Lite knowledge bases. In: Proc. of the 9th Int. Semantic Web Conf. (ISWC). pp. 112–128 (2010) 6. Ceri, S., Widom, J.: Deriving incremental production rules for deductive data. Information Systems 19(6), 467–490 (1994) 7. Franconi, E., Gutierrez, C., Mosca, A., Pirrò, G., Rosati, R.: The logic of extensional RDFS. In: Proc. of the 12th Int. Semantic Web Conf. (ISWC). Lecture Notes in Computer Science, vol. 8218, pp. 101–116. Springer (2013) 8. Gearon, P., Passant, A., Polleres, A.: SPARQL 1.1 update. W3C Recommendation, World Wide Web Consortium (Mar 2013), available at http://www.w3.org/TR/ sparql11-update/ 9. Glimm, B., Ogbuji, C.: SPARQL 1.1 entailment regimes. W3C Recommendation, World Wide Web Consortium (Mar 2013), available at http://www.w3.org/TR/ sparql11-entailment/ 10. Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: Proc. of the ACM SIGMOD Int. Conf. on Management of Data. pp. 157–166 (1993) 11. Gutierrez, C., Hurtado, C., Vaisman, A.: Updating RDFS: from theory to practice. In: Proc. of the 8th Extended Semantic Web Conf. (ESWC) (2011) 12. Harris, S., Seaborne, A.: SPARQL 1.1 query language. W3C Recommendation, World Wide Web Consortium (Mar 2013), available at http://www.w3.org/TR/ sparql11-query 13. Hayes, P.: RDF semantics. W3C Recommendation, World Wide Web Consortium (Feb 2004), available at http://www.w3.org/TR/rdf-mt/ 14. Katsuno, H., Mendelzon, A.O.: A unified view of propositional knowledge base updates. In: Proc. of the 11th Int. Joint Conf. on Artificial Intelligence (IJCAI). pp. 1413–1419 (1989) 15. Kontchakov, R., Rodriguez-Muro, M., Zakharyaschev, M.: Ontology-based data access with databases: A short course. In: Reasoning Web. Semantic Technologies for Intelligent Data Access – 9th Int. Summer School Tutorial Lectures (RW), LNCS, vol. 8067, pp. 194–229. Springer (2013) 16. Kotowski, J., Bry, F., Brodt, S.: Reasoning as axioms change - Incremental view maintenance reconsidered. In: Proc. of the 5th Int. Conf. on Web Reasoning and Rule Systems (RR). LNCS, vol. 6902, pp. 139–154. Springer (2011) 17. Mallea, A., Arenas, M., Hogan, A., Polleres, A.: On blank nodes. In: Proc. of the 10th Int. Semantic Web Conf. (ISWC). LNCS, vol. 7031, pp. 421–437. Springer (2011) 18. Motik, B.: On the properties of metamodeling in OWL. J. of Logic and Computation 17(4), 617–637 (2007) 19. Muñoz, S., Pérez, J., Gutiérrez, C.: Minimal deductive systems for RDF. In: Proc. of the 4th European Semantic Web Conf. (ESWC). pp. 53–67 (2007) 20. Noy, N.F., Klein, M.C.A.: Ontology evolution: Not the same as schema evolution. Knowledge and Information Systems 6(4), 428–440 (2004) 21. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and complexity of SPARQL. ACM Trans. on Database Systems 34(3), 16:1–16:45 (2009) 22. Polleres, A., Hogan, A., Delbru, R., Umbrich, J.: RDFS & OWL reasoning for linked data. In: Reasoning Web. Semantic Technologies for Intelligent Data Access – 9th Int. Summer School Tutorial Lectures (RW), Lecture Notes in Computer Science, vol. 8067, pp. 91–149. Springer (2013) 23. Prud’hommeaux, E., Seaborne, A.: SPARQL query language for RDF. W3C Recommenda- tion, World Wide Web Consortium (Jan 2008), available at http://www.w3.org/TR/ rdf-sparql-query 24. Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. 1. Computer Science Press (1988) 25. Urbani, J., Margara, A., Jacobs, C.J.H., van Harmelen, F., Bal, H.E.: DynamiTE: Parallel materialization of dynamic RDF data. In: Proc. of the 12th Int. Semantic Web Conf. (ISWC). Lecture Notes in Computer Science, vol. 8218, pp. 657–672. Springer (2013) 26. Winslett, M.: Updating Logical Databases. Cambridge University Press (2005)