Annotation algebras for RDFS Peter Buneman Egor V. Kostylev University of Edinburgh University of Edinburgh Email: opb@inf.ed.ac.uk Email: ekostyle@inf.ed.ac.uk Abstract—Provenance and annotation are intimately con- temporal RDF [10] propose methods of adding annotations nected. On the one hand provenance can be regarded as a to RDF triples to express belief, trust, or temporal properties. form of annotation, on the other hand, in order to understand Can we simply follow the work for relational databases in how to convey annotations from source data to derived data, we need an account of how the data was derived – its provenance. developing a general model for such annotations? Here we Following a successful line of database research in which elements have to start by looking not at query languages for RDF, of a database are annotated with algebraic terms that describe but at the inference rules for the ontologies such as those the provenance of those elements, we develop an algebra of of RDFS [1]. Given annotations on the base triples, what annotations for RDFS that differs from that developed for should be the annotations on an inferred triple? Indeed in [15], relational databases. We show how such an annotation algebra can be used for computing annotations on inferred triples that [16], [12], with an initial goal applying fuzzy logic to RDFS provide information on belief, trust and temporal aspects of data proposes an algebra similar in many respects to that of [8]. as well as providing a framework for default reasoning. In this paper we propose a somewhat more general – and possibly simpler – algebra to serve this purpose. Our proposals I. I NTRODUCTION differ from the semirings in [8] in two ways. First, there are With the increasing interest in provenance in both databases situations in which we do not want commutativity and second, and ontologies, there have been a number of proposals and while the number of triples inferred in an RDFS graph is systems for annotation of the underlying data with information always finite, the derivations can be unbounded. We therefore concerning time, belief, and various aspects of provenance. need an extra condition to prevent “infinite annotations”. This The question that has been repeatedly posed in databases [17], condition precludes the possibility of bag semantics, which is [5], [2] is what happens to these annotations when a query useful for relational algbra but appears to be inapplicable to is applied? Are they ignored or are they somehow passed ontologies. Also in [8] there is a compact representation – a through the query? In fact generic prototypes [4] and systems polynomial – of terms in the semiring algebra. In the algebra that are specific to some domain [6] have been developed we develop, the compact representation is somewhat different. for propagating annotations through queries. Suppose, for To introduce annotation algebras, we give two simple exam- example, we take two tables S and T in which the individual ples of annotating an RDF graph shown in Fig. 1(a) (a dashed tuples have been annotated. How should we annotate the tuples arrow represents a triple, which can be inferred from the rest in the join of S and T ? An obvious answer is to put on of the graph). any output tuple the union of the annotations on the two Example 1: A temporal extension of RDF was introduced contributing tuples. This does not always make sense; for in [10], [9]. It consists of attaching to every RDF triple a set example if the input tuples are annotated with the set of people intervals that represents the times at which the triple is valid. who believe that this tuple is true or with the set of database An example of a temporal annotation of the graph is shown in versions for which the tuple is actually in the database, it Fig. 1(b): Picasso worked as a cubist from 1908 to 1919, paints might be more appropriate to annotate a join tuple with the are created by painters at least since engravings in Chauvet- intersection of the relevant annotations. Pont-d’Arc cave from about 29,000BC, and so on. The point This problem has resulted in a variety of proposals for here is that an annotation for the inferred triple was obtained propagating annotations through queries. Notably, by using by an intuitive calculation ({1908 − 1919} ∩ {1906 − 1921}) ∪ a semiring model for annotations, in [8] a tuple is annotated ({1937} ∩ {−29, 000 − N ow}) = {1908 − 1919, 1937}. with a term in a semiring algebra that describes the provenance Example 2: In [15] fuzzy annotations of RDF are used to of the tuple – how it was formed by the operations of the describe degree of trust. To every triple a real in the range relational query that constructed it. By suitable instatiatons [0, 1] is attached. Such an annotation is shown in Fig. 1(c). The operations of the semiring, one can realize various extensions annotation for the inferred triple was calculated as max(0.8 ∗ to relational databases such as probabilistic databases, multi- 0.4, 0.3 ∗ 1) = 0.32. set semantics and certain kinds of constraint databases. The There is an obvious similarity between these examples: the semiring model also generalizes a number of other models for calculations are both of the general form (a ⊗ b) ⊕ (c ⊗ d), provenance [5], [2]. and the calculations for the inferred triple are performed by Turning to ontologies, proposals have also been suggested suitably instantiating the operators ⊕ and ⊗. In fact [16], for the annotation of RDF [13]. Named graphs [3] and both of the annotation domains form so-called BL-algebras sc sc:[1906-1921] sc: 0.4 Cubist Painter Cubist Painter Cubist Painter type: [1908- type: type: type 1919, 1937] dom: type: 0.32 type dom [1908- dom: 1 [-29,000-Now] 0.8 1919] Paints Paints :[1937] Paints : 0.3 Picasso Guernica Picasso Guernica Picasso Guernica (a) (b) (c) Fig. 1. Standard and annotated RDF graphs [11], which in general preserve the properties of the deductive – ∆P is a set of property names (not necessarily disjoint system [16]. from ∆R ), Before describing these constructions in more detail we – ∆C ⊆ ∆R is a set of classes, should briefly connect this work with the general issue of – P J·K : ∆P → 2∆R ×∆R is a property extension function, provenance. Our first observation is that an algebraic anno- – CJ·K : ∆C → 2∆R is a class extension function, tation or a triple of the form we have described is a synopsis – ·I : U → ∆R ∪ ∆P is an interpretation mapping. of how that triple was derived – which is surely part of The interpretation I is a model for a graph G over ρdf, denoted its provenance. The second is that exogenous provenance by I |= G, iff the conditions in Tab. I hold2 . A graph G entails information such as who created a triple or when it was created G0 , denoted G |= G0 , if every model of G is a model of G0 . can be added following the proposals of [3]. We would also A sound and complete deductive system for entailment [14] like to compute such annotations for an inferred triple. Our is presented in Tab. II. An instantiation of an inference rule r proposals provide a method for tranferring the provenance from the system is a replacement of the variables A, B, C, X , annotations of explict triples to those that are derived. and Y, occurring in the rule, by references from U. If there is an instantiation RR0 of r such that R ⊆ G, then the graph Outline. This work has two aims. The first is to determine G0 = G ∪ R0 is the result of an application of r to G. A what algebraic structure is necessary for an annotation domain graph G0 is inferred from G, denoted by G ` G0 , iff G0 is to keep the behavior of the deductive system the same as in obtained from G by successively applying rules in Tab. II. the standard case. The second is to find “the most general” This entailment can be checked by computing the closure of of such structures, which allows annotation of RDF graphs G, which is the maximal graph which can be inferred from by elements of the general structure, apply inference rules, G. It can be done in quadratic time [14]. and then obtain annotations from specific domains on demand. In the following sections we review RDFS, introduce a new III. A NNOTATED RDFS annotation algebra and provide some evidence that this is the appropriate algebra for RDFS. We give a freeness result that Definition 1: Given an algebra K with an elements set K allows us to represent the terms of this annotation algebra and containing a distinguished element ⊥ a K-annotated RDF give some examples of the use of the algebra. graph (or simply an a-graph, if K is clear) is a function G : T → 2K such that for each t ∈ T holds ⊥ ∈ G(t) II. P RELIMINARIES and the set Supp(G) = {t : v | v ∈ G(t), v 6= ⊥} is finite. Given a set of RDF URI references U1 , let T be the set of If v ∈ G(t) we write t : v ∈ G and call it an a-triple. RDF triples of the form (s, p, o) ∈ U × U × U. Here s, p and An a-graph G is schema-acyclic, if the subgraphs {(s, e, o) : o are called subject, predicate and object correspondingly. An v | (s, e, o) : v ∈ Supp(G)}, e = sc, sp, do not contain RDF graph (or simply graph) is a finite set of triples G ⊆ T . non-trivial loops. The semantics for a-graphs is given in the The RDF specification[13] includes RDF Schema following definitions. (RDFS) [1] which is a vocabulary of reserved words designed Definition 2: A K-annotated interpretation is a tuple I = to describe relationships between resources and properties. (∆R , ∆P , ∆C , P J·K, CJ·K, ·I ) where ∆R , ∆P , ∆C and ·I are In this work we use the ρdf = {sp, sc, type, dom, range} the same as for the standard interpretation and P J·K, CJ·K are fragment of RDFS [14]. The elements of ρdf represent defined as follows: sub-property, sub-class, domain, and range properties – for each p ∈ ∆P holds P JpK : ∆R × ∆R → K, correspondingly. It is widely accepted that ρdf is a stable – for each c ∈ ∆C holds CJcK : ∆R → K. core of RDFS. To define models for annotated RDFS we need more struc- An interpretation of an RDF graph is a tuple I = ture on the algebra K. Let ⊕ and ⊗ be binary operations on (∆R , ∆P , ∆C , P J·K, CJ·K, ·I ) where K and ⊥, > be distinct constants in it. For every a, b ∈ K – ∆R is a nonempty set of resources, 2 The form of conditions in our definition of model is slightly different 1 For the sake of simplicity we do not consider blank nodes and literals. from that in [14], but they are equivalent per se. It is done to simplify the Their inclusion does not change the results of this work. comparison with notion of model for annotated RDFS given in Sect. III. (1) Simple interpretation: – if (p, q), (q, r) ∈ P JspI K then (p, r) ∈ P JspI K; – for each (s, p, o) ∈ G holds pI ∈ ∆P and (sI , oI ) ∈ P JpI K. – if (x, y) ∈ P JpK and (p, q) ∈ P JspI K then (x, y) ∈ P JqK. (2) Properties and classes: (4) Sub-class: – for each e ∈ ρdf holds eI ∈ ∆P ; – (c, c) ∈ P JscI K; – if (x, y) ∈ P JspI K then x, y ∈ ∆P ; – if (c, d), (d, e) ∈ P JscI K then (c, e) ∈ P JscI K; – if (x, y) ∈ P JscI K then x, y ∈ ∆C ; – if x ∈ CJcK and (c, d) ∈ P JscI K then x ∈ CJdK. – if (x, y) ∈ P JtypeI K then y ∈ ∆C ; (5) Typing: – if (x, y) ∈ P JdomI K then x ∈ ∆C and y ∈ ∆P ; – (x, c) ∈ P JtypeI K iff x ∈ CJcK; – if (x, y) ∈ P JrangeI K then x ∈ ∆C and y ∈ ∆P . – if (x, y) ∈ P JpK and (c, p) ∈ P JdomI K then x ∈ CJcK; (3) Sub-property: – if (x, y) ∈ P JpK and (c, p) ∈ P JrangeI K then y ∈ CJcK. – (p, p) ∈ P JspI K; TABLE I RDFS SEMANTICS (1) Sub-property: (2) Sub-class: (3) Typing: (4) Sub-class Reflexivity: (A, sp, B) (B, sp, C) (A, sc, B) (B, sc, C) (X , A, Y) (A, dom, B) (A, sc, B) (a) ; (a) ; (a) ; (a) ; (A, sp, C) (A, sc, C) (X , type, B) (A, sc, A) (B, sc, B) (X , A, Y) (A, sp, B) (X , type, A) (A, sc, B) (X , A, Y) (A, range, B) (X , e, A) (b) . (b) . (b) . (b) for e ∈ {dom, range, type}. (X , B, Y) (X , type, B) (Y, type, B) (A, sc, A) (5) Sub-property Reflexivity: (X , A, Y) (A, sp, B) (A, e, X ) (a) ; (b) for e ∈ ρdf ; (c) ; (d) for e ∈ {dom, range}. (A, sp, A) (e, sp, e) (A, sp, A) (B, sp, B) (A, sp, A) TABLE II RDFS DEDUCTIVE SYSTEM we write a  b iff there exists c ∈ K such that a ⊕ c = b. (4) ⊗ is ⊥-annihilating, i.e. for each a holds ⊥ ⊗ a = ⊥ = The addition operation ⊕ will be used to combine annotations a ⊗ ⊥. for the same triple and the ⊥ constant represents the fact, that A dioid is >-dioid iff there is no information about a triple. The product operation (5) ⊕ is >-annihilating, i.e. for each a holds > ⊕ a = >. ⊗ will be used to join annotations when applying inference rules and > represents the maximal annotation. Note, that >-annihilation entails idempotence from (1). Definition 3: Let K = hK, ⊕, ⊗, ⊥, >i be an algebra of An instantiation of a rule from the deductive system in type (2, 2, 0, 0). The K-annotated interpretation I is a model Tab. IV is a replacement of variables A, B, C, X , and Y by for an a-graph G, denoted I |= G, iff the conditions in Tab. III elements of U, and variables v, v1 , v2 , and v3 by elements of hold. An a-graph G entails H, denoted G |= H, if for every K, such that all relations for annotations hold. An application I |= G holds I |= H. of a rule to an a-graph G and a deduction of an a-graph G0 By these definitions, each (non-annotated) RDF graph G from G, denoted by G ` G0 , is defined exactly the same way can be considered as an a-graph G0 = {t : > | t ∈ G} ∪ E, as for the standard case. if K = {⊥, >} and E = {t : ⊥ | t ∈ T }. In this case, the Note, that the system in Tab. IV differs from the one in definition of model for an a-graph coincides with the standard. Tab. II only by the presence of annotations and the generali- sation rule (∗) which combines annotations for the same triple. IV. A DEDUCTIVE SYSTEM FOR ANNOTATED RDFS AND Thus, that is natural to expect the new system to behave the DIOIDS same as the standard one. Particularly, they should coincide if K = {⊥, >}. To obtain it we need some properties of K. Definition 4: An algebra K = hK, ⊕, ⊗, ⊥, >i is a dioid iff it is an idempotent semi-ring, i.e. Definition 5: Let R be the set of all inference rules of the form T1TT2 from Tab. IV and Ins(r) the set of all instantiations (1) hK, ⊕, ⊥i is a semilattice, i.e. for each a, b, and c hold: of a rule r ∈ R. (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) (associativity), a ⊕ b = b ⊕ a, 0 (commutativity), a ⊕ ⊥ = a (neutral element), a ⊕ a = a (1) A set of rules R 0 ⊆ R is associative, iff for every r, r0 ∈ τ τ (idempotence); τ τ R0 , 1τ4 2 , τ 0 4 ∈ Ins(r), and τ4τ5τ3 , τ2τ 0τ3 ∈ Ins(r0 ) 1 5 4 (2) hK, ⊗, >i is a monoid, i.e. for each a, b, and c hold: holds τ5 = τ50 . (a⊗b)⊗c = a⊗(b⊗c) (associativity), a⊗> = a = >⊗a (2) A rule r ∈ R is commutative, iff for every τ1ττ2 ∈ Ins(r) (neutral element); holds τ2ττ1 ∈ Ins(r). (3) ⊗ is left and right distributive over ⊕, i.e. for each a, b, (3) A rule r ∈ R is idempotent, if τττ ∈ Ins(r) for every τ . and c hold: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c), (b ⊕ c) ⊗ a = (4) A set of rules R0 ⊆ R is left distributive over r ∈ R τ 0 τ 00 (b ⊗ a) ⊕ (c ⊗ a); if for every r0 ∈ R0 , τ2τ4τ3 , 4τ 0 4 ∈ Ins(r), and 5 (1) Simple interpretation: – P JspI K(p, q) ⊗ P JspI K(q, r)  P JspI K(p, r); – for each (s, p, o) : v ∈ G holds pI ∈ ∆P and v  P JpI K(sI , oI ). – P JpK(x, y) ⊗ P JspI K(p, q)  P JqK(x, y). (2) Properties and classes: (4) Sub-class: – for each e ∈ ρdf holds eI ∈ ∆P ; – P JscI K(c, c) = >, – P JspI K(x, y) is defined only for x, y ∈ ∆P ; – P JscI K(c, d) ⊗ P JscI K(d, e)  P JscI K(c, e); – P JscI K(x, y) is defined only for x, y ∈ ∆C ; – CJcK(x) ⊗ P JscI K(c, d)  CJdK(x). – P JtypeI K(x, y) is defined only for y ∈ ∆C ; (5) Typing: – P JdomI K(x, y) is defined only for x ∈ ∆C and y ∈ ∆P ; – P JtypeI K(x, c) = CJcK(x); – P JrangeI K(x, y) is defined only for x ∈ ∆C and y ∈ ∆P . – P JpK(x, y) ⊗ P JdomI K(c, p)  CJcK(x); (3) Sub-property: – P JpK(x, y) ⊗ P JrangeI K(c, p)  CJcK(y). – P JspI K(p, p) = >, TABLE III A NNOTATED RDFS S EMANTICS (1) Sub-property: (2) Sub-class: (3) Typing: (A, sp, B) : v1 (B, sp, C) : v2 (A, sc, B) : v1 (B, sc, C) : v2 (X , A, Y) : v1 (A, dom, B) : v2 (a) ; (a) ; (a) ; (A, sp, C) : v1 ⊗ v2 (A, sc, C) : v1 ⊗ v2 (X , type, B) : v1 ⊗ v2 (X , A, Y) : v1 (A, sp, B) : v2 (X , type, A) : v1 (A, sc, B) : v2 (X , A, Y) : v1 (A, range, B) : v2 (b) . (b) . (b) . (X , B, Y) : v1 ⊗ v2 (X , type, B) : v1 ⊗ v2 (Y, type, B) : v1 ⊗ v2 (*) Generalisation: (4) Sub-class Reflexivity: (X , A, Y) : v1 (X , A, Y) : v2 (A, sc, B) : v (X , e, A) . (a) ; (b) for e ∈ {dom, range, type}. (X , A, Y) : v1 ⊕ v2 (A, sc, A) : v (B, sc, B) : v (A, sc, A) (5) Sub-property Reflexivity: (X , A, Y) : v (A, sp, B) : v (A, e, X ) : v (a) ; (b) for e ∈ ρdf ; (c) ; (d) for e ∈ {dom, range}. (A, sp, A) : v (e, sp, e) : > (A, sp, A) : v (B, sp, B) : v (A, sp, A) : v TABLE IV A NNOTATED RDFS DEDUCTIVE SYSTEM τ1 τ4 τ1 τ 2 τ1 τ3 0 0 τ5 , τ40 , τ400 ∈ Ins(r ) holds τ5 = τ5 . As in the standard case, the important notion is the closure 0 cl(G) = {τ | G ` {τ }} of an a-graph G. To compute it we (5) A set of rules R ⊆ R is right distributive over r ∈ τ 0 τ 00 need a representation of an a-graph G, which is a finite set of R if for every r0 ∈ R0 , τ1τ4τ2 , 4τ 0 4 ∈ Ins(r), and 5 τ4 τ3 τ1 τ 3 τ2 τ3 0 0 a-triples RG such that RG ∪E = G, E = {t : ⊥ | t ∈ T }. The τ5 , τ40 , τ400 ∈ Ins(r ) holds τ5 = τ5 . set Supp(G) by the definition is a representation of G, but we (6) A rule r ∈ R is v-neutral, v ∈ K, if for every t1 :v t2 :v2 t3 :v3 ∈ t1 :v1 t2 :v also want to have a possibility to work with representations Ins(r) holds v2 = v3 and for every t3 :v3 ∈ Ins(r) which contain an finite number of ⊥-annotated triples. holds v1 = v3 . Proposition 2: Let G be an a-graph and K a >-dioid. For (7) A rule r ∈ R is v-annihilating, v ∈ K, if for every t1 :v t2 :v2 every representation RG of G there exists an representation t3 :v3 ∈ Ins(r) holds v3 = v and for every t1 :v 1 t2 :v t3 :v3 ∈ Rcl(G) of cl(G) which can be computed in polynomial time in Ins(r) holds v3 = v. the size of RG if the complexities of ⊕ and ⊗ are polynomial Proposition 1: The set of inference rules R0 = bounded. {(1a), (1b), (2a), (2b), (3a), (3b)} in Tab. IV is associative, the Let K and K0 be two algebras. For any function h : K → K0 set of rules {(8)} is associative, the rule (8) is commutative, and K-annotated graph G denote h(G) the set of K0 -annotated idempotent and ⊥-neutral, the set R0 is left and right triples formed from G by applying h to each annotation. distributive over the rule (8), and each of the rules from R0 Proposition 3: Let h : K → K0 and K, K0 be >-dioids. For are >-neutral and ⊥-annihilating iff K is a dioid. Theorem 1 (Soundness and completeness): Given a dioid every K-annotated graph G the set h(G) is a K0 -annotated K and a-graphs G and H hold. graph and holds cl(h(G)) = h(cl(G)) iff h is a dioid homomorphism. (1) If G ` H then G |= H. (2) If G is schema-acyclic and G |= H then for every t : v ∈ V. S TRING DIOIDS FOR RDFS ANNOTATION H there exists v 0  v such that G ` t : v 0 . (3) If K is >-annihilating and G |= H then for every t : v ∈ Prop. 3 enables us to obtain an a-graph from another one H there exists v 0  v such that G ` t : v 0 . without recomputing annotations for inferred triples. The next Hence, the deductive system behaves the same as the step is to develop a “universal” annotation, i.e. an annotation standard one iff K is a dioid for schema-acyclic a-graphs and from which we can obtain any other one by applying a a >-dioid in general case. corresponding dioid homomorphism. Given an alphabet Σ define an subsequence order on the set γ1 ⊗T γ2 = sup{γ | γ  γi , i = 1, 2}, ⊥T = ∅ and >T = P. of words Σ∗ : u ≤ u0 iff for some u1 , . . . , un , w1 , . . . wn−1 ∈ The domain KT forms an BL-algebra and hence a >-dioid. Σ∗ holds u = u1 u2 . . . un and u0 = u1 w1 u2 w2 . . . wn−1 un . A Fuzzy RDF [15] treats the Ex. 2. The domain here is KF = finite set m ⊆ Σ∗ is an antichain if for all u ≤ u0 , u, u0 ∈ m, h[0, 1], max, ⊗F , 0, 1i, where ⊗F is any t-norm of BL-algebra. holds u = u0 . Let min(m) be the set of minimal elements If ⊗F is an ordinary multiplication as in Ex. 2, then the domain (w.r.t. ≤) of m. On the set of antichains M [Σ] we define: becomes probabilistic. As KT domain, KF is a >-dioid. Default RDF. In both of the previous domains the product m1 + m2 = min(m1 ∪ m2 ), operation in the dioid is commutative. Next we introduce an m1 × m2 = min({w1 w2 | w1 ∈ m1 , w2 ∈ m2 }), example of ⊗-noncommutative annotations. Call M[Σ] = hM [Σ], +, ×, ∅, {}i, where  is the empty Suppose we want to represent – in an RDF graph – string, a string dioid over generators Σ. Note, that a string information about an attribute attached to resources modelled dioid is a >-dioid. The following proposition says, that it is by the graph. The straightforward way is to introduce a new “the most general” of all >-dioids. property to the vocabulary of the graph and handle it as usual. Proposition 4: (1) Given a set of generators Σ the string Nevertheless, in some situations this way can be not optimal: dioid M[Σ] is the free >-dioid on Σ, i.e. for any >-dioid If we have a broad system of classes and values of the attribute Kf = hK, ⊕, ⊗, ⊥, >i and a valuation φ : Σ → K there for subclasses and elements of a class are usually the same, exists a unique homomorphism Evalφ : M [Σ] → K such that then it is natural to keep the default attribute value for the for each a ∈ Σ holds Evalφ (a) = φ(a). class and the value for a resource only if it differs from usual (2) The operations of the string dioid + and × can be one of the class it belongs to. computed in polynomial time. For a substantiation, consider an RDF graph denoted in Hence, string dioids consititute an important and general Fig. 2 representing a (part of) botanical taxonomy3 . The subclass of >-dioids. We now show how they can be applied triples of this graph have annotations which store values to annotate RDF graphs. Let G be a K-annotated graph and of an attribute PublishedBy for every taxon in the graph. X a set of triple ids of triples from Supp(G). We associate to The division Pinophyta was introduced by Carl Linnaeus, G an “abstract” version which is a (X ∪ ∅)-annotated graph Ḡ so the triple (P lantae, sc, P inophyta) is annotated by consisted of the same triples as G, but the annotation of each Linnaeus. The subsequent taxons down to family Arau- of them is its id if it has one, and ∅ otherwise. cariaceae keep this attribute value, so we annotate all sc- Theorem 2: For every K-annotated graph G holds cl(G) = triples between Pinophyta and Araucariaceae with a special Evalφ ◦ cl(Ḡ), where φ : X → K is a valuation which value ? which means “derived from above”. The same value associates the annotation of an a-triple in G to its id. keeps for genus Araucaria and species Araucaria Araucana. This theorem gives rise to the following strategy for an- But genus Wollemia was introduced by David Noble as notating RDF graphs. Given a graph G we are to annotate well as its only species Wollemia Nobilis, so the anno- it with elements from several domains. However, we would tations for the triples (Araucariaceae, sc, W ollemia) and like to infer triples and their annotations only once, without (W ollemia, sc, W.N obilis) are N oble and ? correspond- recomputing the annotations for each annotation domain. In ingly. Thus, to find out from the graph who introduced a taxon this case we construct an “abstract” version Ḡ of G by we need to go up from its node along the tree until we find an annotating triples with their ids. Then we can apply inference edge with an annotation differs from ?. An annotation value rules and obtain an abstract annotation from the string dioid for a triple here represents not the value of the attribute by over the set of ids for each triple. Finally, as soon as we need itself, but its difference with the value of higher triple in the to get annotations from a specific domain we need to make tree. Hence, it is natural to phisically keep such annotations in sure that it is a >-dioid, define a homomorphism by attaching a memory only it they differ from ?. That is why if the value specific annotations to triples in G and then apply it to abstract of an attribute for a resource in most cases is determined by a annotations of previously inferred triples. class it belongs to and the number of attributes are large, the advantage in memory can be sufficient. VI. A PPLICATIONS TO SPECIFIC MODELS Before we define the default domain formally, note, that the In this section we introduce several annotation domains for situation in general case can be more complicated than in the RDF graphs those are >-dioids. latter example, because a ρdf subgraph of a graph does not Temporal RDF [16]. This model treats the Ex. 1. The nesessarily have a tree structure. Therefore, in certain cases we temporal domain KT = hKT , ⊕T , ⊗T , ⊥T , >T i is defined as need to combine different annotations for a triple. To this effect follows. Consider temporal intervals [α1 , α2 ] where α1,2 ∈ we require a set of attirbute values to have a union operation P = Z ∪ {−∞, +∞}, α1 < α2 . Two intervals [α1 , α2 ] and the lowest element, i.e. to be a semilattice. In many cases and [α3 , α4 ] are adjacent iff α2 + 1 = α3 . The set KT (as in the taxonomy example) this set has no structure, but it is is the set of all pairwise disjoint and non-adjacent sets of possible to enrich it with the lowest element Unknown and the intervals. On KT a partial order is defined: γ1  γ2 iff greatest element Error. The latter is justified by situations for each I1 ∈ γ1 there exists I2 ∈ γ2 such that I1 ⊆ I2 . For KT we have: γ1 ⊕T γ2 = inf{γ | γ  γi , i = 1, 2}, 3 This is just an example and the actual information may be incorrect. . . . sc: ? Plantae sc: Linnaeus Pinophyta sc: ? . . . sc: ? Araucariaceae sc: ? Araucaria sc: ? A. araucana sc: N oble Wollemia W. nobilis sc: ? Fig. 2. The RDF graph representing botanical taxonomy when from one source we get an attribute value for a resource, have given some examples of its use. There may be some but from another source we get a different value for the same mileage to be gained by combining these two algebras. One of resource which contradicts with the first one. The union of this the applications of the proposed algebra is a system for default information is inconsistent and requires a manual intervention, reasoning about certain annotations on RDF resources, which which can be flagged by Error annotation. may also prove to be a useful mechanism for physically storing Let A be an attribute with a domain hA, t, 0i which is a those annotations. We would like to find similar mechanisms semilattice with union t and the lowest element 0. Consider an for efficiently storing default annotations on triples. A-default domain KA = hA? ∪ {⊥? , >? }, ⊕A , ⊗A , ⊥? , >? i, We are grateful to Marcelo Arenas, Boris Motik, Floris where A? = (A×{T rue, F alse}) and ⊥? , >? are new special Geerts and Alex Simpson for helpful discussions. This work symbols. Note, that the meanings of ⊥? and an annotation was supported by a grant from the UK Engineering and with 0 are different: The first one says that a triple is not in Physical Sciences Research Council. the support of a graph and the second says that the value of R EFERENCES the attribute is minimal according to t. The second boolean [1] D. Brickley and R. V. Guha. RDF Vocabulary Description Language component of A? corresponds to ? in the taxonomy example 1.0: RDF Schema. W3C Recommendation. http://www.w3.org/TR/ above, i.e. it is T rue in an annotation for a triple iff the actual rdf-schema/, Feb. 2004. value of the attribute is the union of the first component of the [2] P. Buneman, S. Khanna, and W. Tan. Why and Where: A Characteriza- tion of Data Provenance. In ICDT 2001, volume 1973 of LNCS, pages annotation with values those can be derived from triples above, 316–330. Springer, 2001. and F alse overwise. The operations are defined as follows: [3] J. J. Carroll, C. Bizer, P. J. Hayes, and P. Stickler. Named graphs, provenance and trust. In WWW, pages 613–622, 2005. (a1 , b1 ) ⊕A (a2 , b2 ) =  (a1 t a2 , b1 ∨ b2 ), [4] L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: a post-it (a1 t a2 , b2 ) iff b1 = T rue, system for relational databases based on provenance. In SIGMOD ’05: (a1 , b1 ) ⊗A (a2 , b2 ) = Proceedings of the 2005 ACM SIGMOD international conference on (a1 , b1 ) iff b1 = F alse. Management of data, pages 942–944, New York, NY, USA, 2005. ACM. [5] Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a These operations extend to elements ⊥? and >? in a way to warehousing environment. ACM Trans. Database Syst., 25(2):179–227, keep the properties of ⊥ and > in the dioid. Next, we describe 2000. the meaning of this operations. The addition ⊕A is applied [6] R. D. Dowell, R. M. Jokerst, A. Day, S. R. Eddy, and L. Stein. The Distributed Annotation System. BMC Bioinformatics, 2:7, 2001. when we union annotations about the same triple. Hence, the [7] G. Flouris, I. Fundulaki, P. Pediaditis, Y. Theoharis, and values a1 and a2 are joined by the semilattice operation t V. Christophides. Coloring RDF Triples to Capture Provenance. and the possibility to derive values from a ρdf structure above In International Semantic Web Conference, pages 196–212, 2009. [8] T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. exists only if it exists in any of the considering annotations. In PODS ’07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT- The product ⊗A is applied when we infer triples by transitivity SIGART symposium on Principles of database systems, pages 31–40, of sc or similar inference rule from Tab. IV. If b1 = T rue we New York, NY, USA, 2007. ACM. [9] C. Gutierrez, C. A. Hurtado, and A. Vaisman. Introducing Time into union the current value a1 with the derived value a2 and the RDF. IEEE Trans. on Knowl. and Data Eng., 19(2):207–218, 2007. possibility of father deriving depends on b2 . If b1 = F alse, we [10] C. Gutiérrez, C. A. Hurtado, and A. A. Vaisman. Temporal RDF. In just keep the annotation (a1 , b1 ) which override any annotation ESWC, pages 93–107, 2005. [11] P. Hájek. Metamathematics of Fuzzy Logic (Trends in Logic). Springer, from a structure above. Finally, KA can be easily checked to 1 edition, November 2001. be ⊗-noncommutative >-dioid. [12] A. Hogan, G. Lukacsy, N. Lopes, A. Polleres, U. Straccia, A. Zimmer- To use the A-default dioid KA for storing values of an mann, and S. Decker. RDF Needs Annotations. In Proceedings of W3C Workshop — RDF Next Steps. W3C, 2010. attribute A for resources we assume (0, T rue)-annotation for [13] F. Manola, E. Miller, and B. McBride. RDF Primer, W3C Recommen- a triple by default and do not keep it in a memory. As soon as dation. http://www.w3.org/TR/REC-rdf-syntax/, Feb. 2004. we need the real value of the attribute for a resource a we infer [14] S. Muñoz, J. Pérez, and C. Gutiérrez. Minimal Deductive Systems for RDF. In ESWC, pages 53–67, 2007. a triple (a, type, r) and get the first component of annotation [15] U. Straccia. A Minimal Deductive System for General Fuzzy RDF. for it. (Here r is the root of ρdf subgraph of the considered In Proceedings of the 3rd International Conference on Web Reasoning graph; if it does not exist, we can always introduce it.) and Rule Systems (RR-09), number 5837 in Lecture Notes in Computer Science, pages 166–181. Springer-Verlag, 2009. [16] U. Straccia, N. Lopes, G. Lukacsy, and A. Polleres. A General VII. C ONCLUSIONS Framework for Representing and Reasoning with Annotated Semantic Although annotation algebras have been studied for database Web Data. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10). AAAI Press, 2010. query languages [8] and have also recently been investigated [17] Y. R. Wang and S. E. Madnick. A Polygen Model for Heterogeneous for RDF query languages [7], we have suggested an alternative Database Systems: The Source Tagging Perspective. In VLDB, pages and more natural algebra for the annotation of RDFS, and we 519–538, 1990.