=Paper=
{{Paper
|id=None
|storemode=property
|title=Annotation algebras for RDFS Data
|pdfUrl=https://ceur-ws.org/Vol-670/paper_4.pdf
|volume=Vol-670
|dblpUrl=https://dblp.org/rec/conf/semweb/BunemanK10
}}
==Annotation algebras for RDFS Data==
Annotation algebras for RDFS
Peter Buneman Egor V. Kostylev
University of Edinburgh University of Edinburgh
Email: opb@inf.ed.ac.uk Email: ekostyle@inf.ed.ac.uk
Abstract—Provenance and annotation are intimately con- temporal RDF [10] propose methods of adding annotations
nected. On the one hand provenance can be regarded as a to RDF triples to express belief, trust, or temporal properties.
form of annotation, on the other hand, in order to understand Can we simply follow the work for relational databases in
how to convey annotations from source data to derived data, we
need an account of how the data was derived – its provenance. developing a general model for such annotations? Here we
Following a successful line of database research in which elements have to start by looking not at query languages for RDF,
of a database are annotated with algebraic terms that describe but at the inference rules for the ontologies such as those
the provenance of those elements, we develop an algebra of of RDFS [1]. Given annotations on the base triples, what
annotations for RDFS that differs from that developed for should be the annotations on an inferred triple? Indeed in [15],
relational databases. We show how such an annotation algebra
can be used for computing annotations on inferred triples that [16], [12], with an initial goal applying fuzzy logic to RDFS
provide information on belief, trust and temporal aspects of data proposes an algebra similar in many respects to that of [8].
as well as providing a framework for default reasoning. In this paper we propose a somewhat more general – and
possibly simpler – algebra to serve this purpose. Our proposals
I. I NTRODUCTION differ from the semirings in [8] in two ways. First, there are
With the increasing interest in provenance in both databases situations in which we do not want commutativity and second,
and ontologies, there have been a number of proposals and while the number of triples inferred in an RDFS graph is
systems for annotation of the underlying data with information always finite, the derivations can be unbounded. We therefore
concerning time, belief, and various aspects of provenance. need an extra condition to prevent “infinite annotations”. This
The question that has been repeatedly posed in databases [17], condition precludes the possibility of bag semantics, which is
[5], [2] is what happens to these annotations when a query useful for relational algbra but appears to be inapplicable to
is applied? Are they ignored or are they somehow passed ontologies. Also in [8] there is a compact representation – a
through the query? In fact generic prototypes [4] and systems polynomial – of terms in the semiring algebra. In the algebra
that are specific to some domain [6] have been developed we develop, the compact representation is somewhat different.
for propagating annotations through queries. Suppose, for To introduce annotation algebras, we give two simple exam-
example, we take two tables S and T in which the individual ples of annotating an RDF graph shown in Fig. 1(a) (a dashed
tuples have been annotated. How should we annotate the tuples arrow represents a triple, which can be inferred from the rest
in the join of S and T ? An obvious answer is to put on of the graph).
any output tuple the union of the annotations on the two Example 1: A temporal extension of RDF was introduced
contributing tuples. This does not always make sense; for in [10], [9]. It consists of attaching to every RDF triple a set
example if the input tuples are annotated with the set of people intervals that represents the times at which the triple is valid.
who believe that this tuple is true or with the set of database An example of a temporal annotation of the graph is shown in
versions for which the tuple is actually in the database, it Fig. 1(b): Picasso worked as a cubist from 1908 to 1919, paints
might be more appropriate to annotate a join tuple with the are created by painters at least since engravings in Chauvet-
intersection of the relevant annotations. Pont-d’Arc cave from about 29,000BC, and so on. The point
This problem has resulted in a variety of proposals for here is that an annotation for the inferred triple was obtained
propagating annotations through queries. Notably, by using by an intuitive calculation ({1908 − 1919} ∩ {1906 − 1921}) ∪
a semiring model for annotations, in [8] a tuple is annotated ({1937} ∩ {−29, 000 − N ow}) = {1908 − 1919, 1937}.
with a term in a semiring algebra that describes the provenance Example 2: In [15] fuzzy annotations of RDF are used to
of the tuple – how it was formed by the operations of the describe degree of trust. To every triple a real in the range
relational query that constructed it. By suitable instatiatons [0, 1] is attached. Such an annotation is shown in Fig. 1(c). The
operations of the semiring, one can realize various extensions annotation for the inferred triple was calculated as max(0.8 ∗
to relational databases such as probabilistic databases, multi- 0.4, 0.3 ∗ 1) = 0.32.
set semantics and certain kinds of constraint databases. The There is an obvious similarity between these examples: the
semiring model also generalizes a number of other models for calculations are both of the general form (a ⊗ b) ⊕ (c ⊗ d),
provenance [5], [2]. and the calculations for the inferred triple are performed by
Turning to ontologies, proposals have also been suggested suitably instantiating the operators ⊕ and ⊗. In fact [16],
for the annotation of RDF [13]. Named graphs [3] and both of the annotation domains form so-called BL-algebras
sc sc:[1906-1921] sc: 0.4
Cubist Painter Cubist Painter Cubist Painter
type: [1908- type:
type:
type 1919, 1937] dom: type: 0.32
type dom [1908- dom: 1
[-29,000-Now] 0.8
1919]
Paints Paints :[1937] Paints : 0.3
Picasso Guernica Picasso Guernica Picasso Guernica
(a) (b) (c)
Fig. 1. Standard and annotated RDF graphs
[11], which in general preserve the properties of the deductive – ∆P is a set of property names (not necessarily disjoint
system [16]. from ∆R ),
Before describing these constructions in more detail we – ∆C ⊆ ∆R is a set of classes,
should briefly connect this work with the general issue of – P J·K : ∆P → 2∆R ×∆R is a property extension function,
provenance. Our first observation is that an algebraic anno- – CJ·K : ∆C → 2∆R is a class extension function,
tation or a triple of the form we have described is a synopsis – ·I : U → ∆R ∪ ∆P is an interpretation mapping.
of how that triple was derived – which is surely part of The interpretation I is a model for a graph G over ρdf, denoted
its provenance. The second is that exogenous provenance by I |= G, iff the conditions in Tab. I hold2 . A graph G entails
information such as who created a triple or when it was created G0 , denoted G |= G0 , if every model of G is a model of G0 .
can be added following the proposals of [3]. We would also A sound and complete deductive system for entailment [14]
like to compute such annotations for an inferred triple. Our is presented in Tab. II. An instantiation of an inference rule r
proposals provide a method for tranferring the provenance from the system is a replacement of the variables A, B, C, X ,
annotations of explict triples to those that are derived. and Y, occurring in the rule, by references from U. If there
is an instantiation RR0 of r such that R ⊆ G, then the graph
Outline. This work has two aims. The first is to determine
G0 = G ∪ R0 is the result of an application of r to G. A
what algebraic structure is necessary for an annotation domain
graph G0 is inferred from G, denoted by G ` G0 , iff G0 is
to keep the behavior of the deductive system the same as in
obtained from G by successively applying rules in Tab. II.
the standard case. The second is to find “the most general”
This entailment can be checked by computing the closure of
of such structures, which allows annotation of RDF graphs
G, which is the maximal graph which can be inferred from
by elements of the general structure, apply inference rules,
G. It can be done in quadratic time [14].
and then obtain annotations from specific domains on demand.
In the following sections we review RDFS, introduce a new III. A NNOTATED RDFS
annotation algebra and provide some evidence that this is the
appropriate algebra for RDFS. We give a freeness result that Definition 1: Given an algebra K with an elements set K
allows us to represent the terms of this annotation algebra and containing a distinguished element ⊥ a K-annotated RDF
give some examples of the use of the algebra. graph (or simply an a-graph, if K is clear) is a function
G : T → 2K such that for each t ∈ T holds ⊥ ∈ G(t)
II. P RELIMINARIES and the set Supp(G) = {t : v | v ∈ G(t), v 6= ⊥} is finite.
Given a set of RDF URI references U1 , let T be the set of If v ∈ G(t) we write t : v ∈ G and call it an a-triple.
RDF triples of the form (s, p, o) ∈ U × U × U. Here s, p and An a-graph G is schema-acyclic, if the subgraphs {(s, e, o) :
o are called subject, predicate and object correspondingly. An v | (s, e, o) : v ∈ Supp(G)}, e = sc, sp, do not contain
RDF graph (or simply graph) is a finite set of triples G ⊆ T . non-trivial loops. The semantics for a-graphs is given in the
The RDF specification[13] includes RDF Schema following definitions.
(RDFS) [1] which is a vocabulary of reserved words designed Definition 2: A K-annotated interpretation is a tuple I =
to describe relationships between resources and properties. (∆R , ∆P , ∆C , P J·K, CJ·K, ·I ) where ∆R , ∆P , ∆C and ·I are
In this work we use the ρdf = {sp, sc, type, dom, range} the same as for the standard interpretation and P J·K, CJ·K are
fragment of RDFS [14]. The elements of ρdf represent defined as follows:
sub-property, sub-class, domain, and range properties – for each p ∈ ∆P holds P JpK : ∆R × ∆R → K,
correspondingly. It is widely accepted that ρdf is a stable – for each c ∈ ∆C holds CJcK : ∆R → K.
core of RDFS. To define models for annotated RDFS we need more struc-
An interpretation of an RDF graph is a tuple I = ture on the algebra K. Let ⊕ and ⊗ be binary operations on
(∆R , ∆P , ∆C , P J·K, CJ·K, ·I ) where K and ⊥, > be distinct constants in it. For every a, b ∈ K
– ∆R is a nonempty set of resources,
2 The form of conditions in our definition of model is slightly different
1 For the sake of simplicity we do not consider blank nodes and literals. from that in [14], but they are equivalent per se. It is done to simplify the
Their inclusion does not change the results of this work. comparison with notion of model for annotated RDFS given in Sect. III.
(1) Simple interpretation: – if (p, q), (q, r) ∈ P JspI K then (p, r) ∈ P JspI K;
– for each (s, p, o) ∈ G holds pI ∈ ∆P and (sI , oI ) ∈ P JpI K. – if (x, y) ∈ P JpK and (p, q) ∈ P JspI K then (x, y) ∈ P JqK.
(2) Properties and classes: (4) Sub-class:
– for each e ∈ ρdf holds eI ∈ ∆P ; – (c, c) ∈ P JscI K;
– if (x, y) ∈ P JspI K then x, y ∈ ∆P ; – if (c, d), (d, e) ∈ P JscI K then (c, e) ∈ P JscI K;
– if (x, y) ∈ P JscI K then x, y ∈ ∆C ; – if x ∈ CJcK and (c, d) ∈ P JscI K then x ∈ CJdK.
– if (x, y) ∈ P JtypeI K then y ∈ ∆C ;
(5) Typing:
– if (x, y) ∈ P JdomI K then x ∈ ∆C and y ∈ ∆P ;
– (x, c) ∈ P JtypeI K iff x ∈ CJcK;
– if (x, y) ∈ P JrangeI K then x ∈ ∆C and y ∈ ∆P .
– if (x, y) ∈ P JpK and (c, p) ∈ P JdomI K then x ∈ CJcK;
(3) Sub-property: – if (x, y) ∈ P JpK and (c, p) ∈ P JrangeI K then y ∈ CJcK.
– (p, p) ∈ P JspI K;
TABLE I
RDFS SEMANTICS
(1) Sub-property: (2) Sub-class: (3) Typing: (4) Sub-class Reflexivity:
(A, sp, B) (B, sp, C) (A, sc, B) (B, sc, C) (X , A, Y) (A, dom, B) (A, sc, B)
(a) ; (a) ; (a) ; (a) ;
(A, sp, C) (A, sc, C) (X , type, B) (A, sc, A) (B, sc, B)
(X , A, Y) (A, sp, B) (X , type, A) (A, sc, B) (X , A, Y) (A, range, B) (X , e, A)
(b) . (b) . (b) . (b) for e ∈ {dom, range, type}.
(X , B, Y) (X , type, B) (Y, type, B) (A, sc, A)
(5) Sub-property Reflexivity:
(X , A, Y) (A, sp, B) (A, e, X )
(a) ; (b) for e ∈ ρdf ; (c) ; (d) for e ∈ {dom, range}.
(A, sp, A) (e, sp, e) (A, sp, A) (B, sp, B) (A, sp, A)
TABLE II
RDFS DEDUCTIVE SYSTEM
we write a b iff there exists c ∈ K such that a ⊕ c = b. (4) ⊗ is ⊥-annihilating, i.e. for each a holds ⊥ ⊗ a = ⊥ =
The addition operation ⊕ will be used to combine annotations a ⊗ ⊥.
for the same triple and the ⊥ constant represents the fact, that A dioid is >-dioid iff
there is no information about a triple. The product operation (5) ⊕ is >-annihilating, i.e. for each a holds > ⊕ a = >.
⊗ will be used to join annotations when applying inference
rules and > represents the maximal annotation. Note, that >-annihilation entails idempotence from (1).
Definition 3: Let K = hK, ⊕, ⊗, ⊥, >i be an algebra of An instantiation of a rule from the deductive system in
type (2, 2, 0, 0). The K-annotated interpretation I is a model Tab. IV is a replacement of variables A, B, C, X , and Y by
for an a-graph G, denoted I |= G, iff the conditions in Tab. III elements of U, and variables v, v1 , v2 , and v3 by elements of
hold. An a-graph G entails H, denoted G |= H, if for every K, such that all relations for annotations hold. An application
I |= G holds I |= H. of a rule to an a-graph G and a deduction of an a-graph G0
By these definitions, each (non-annotated) RDF graph G from G, denoted by G ` G0 , is defined exactly the same way
can be considered as an a-graph G0 = {t : > | t ∈ G} ∪ E, as for the standard case.
if K = {⊥, >} and E = {t : ⊥ | t ∈ T }. In this case, the Note, that the system in Tab. IV differs from the one in
definition of model for an a-graph coincides with the standard. Tab. II only by the presence of annotations and the generali-
sation rule (∗) which combines annotations for the same triple.
IV. A DEDUCTIVE SYSTEM FOR ANNOTATED RDFS AND Thus, that is natural to expect the new system to behave the
DIOIDS same as the standard one. Particularly, they should coincide if
K = {⊥, >}. To obtain it we need some properties of K.
Definition 4: An algebra K = hK, ⊕, ⊗, ⊥, >i is a dioid
iff it is an idempotent semi-ring, i.e. Definition 5: Let R be the set of all inference rules of the
form T1TT2 from Tab. IV and Ins(r) the set of all instantiations
(1) hK, ⊕, ⊥i is a semilattice, i.e. for each a, b, and c hold: of a rule r ∈ R.
(a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) (associativity), a ⊕ b = b ⊕ a, 0
(commutativity), a ⊕ ⊥ = a (neutral element), a ⊕ a = a (1) A set of rules R 0
⊆ R is associative, iff for every r, r0 ∈
τ τ
(idempotence);
τ τ
R0 , 1τ4 2 , τ 0 4 ∈ Ins(r), and τ4τ5τ3 , τ2τ 0τ3 ∈ Ins(r0 )
1
5 4
(2) hK, ⊗, >i is a monoid, i.e. for each a, b, and c hold: holds τ5 = τ50 .
(a⊗b)⊗c = a⊗(b⊗c) (associativity), a⊗> = a = >⊗a (2) A rule r ∈ R is commutative, iff for every τ1ττ2 ∈ Ins(r)
(neutral element); holds τ2ττ1 ∈ Ins(r).
(3) ⊗ is left and right distributive over ⊕, i.e. for each a, b, (3) A rule r ∈ R is idempotent, if τττ ∈ Ins(r) for every τ .
and c hold: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c), (b ⊕ c) ⊗ a = (4) A set of rules R0 ⊆ R is left distributive over r ∈ R
τ 0 τ 00
(b ⊗ a) ⊕ (c ⊗ a); if for every r0 ∈ R0 , τ2τ4τ3 , 4τ 0 4 ∈ Ins(r), and
5
(1) Simple interpretation: – P JspI K(p, q) ⊗ P JspI K(q, r) P JspI K(p, r);
– for each (s, p, o) : v ∈ G holds pI ∈ ∆P and v P JpI K(sI , oI ). – P JpK(x, y) ⊗ P JspI K(p, q) P JqK(x, y).
(2) Properties and classes: (4) Sub-class:
– for each e ∈ ρdf holds eI ∈ ∆P ; – P JscI K(c, c) = >,
– P JspI K(x, y) is defined only for x, y ∈ ∆P ; – P JscI K(c, d) ⊗ P JscI K(d, e) P JscI K(c, e);
– P JscI K(x, y) is defined only for x, y ∈ ∆C ; – CJcK(x) ⊗ P JscI K(c, d) CJdK(x).
– P JtypeI K(x, y) is defined only for y ∈ ∆C ;
(5) Typing:
– P JdomI K(x, y) is defined only for x ∈ ∆C and y ∈ ∆P ;
– P JtypeI K(x, c) = CJcK(x);
– P JrangeI K(x, y) is defined only for x ∈ ∆C and y ∈ ∆P .
– P JpK(x, y) ⊗ P JdomI K(c, p) CJcK(x);
(3) Sub-property: – P JpK(x, y) ⊗ P JrangeI K(c, p) CJcK(y).
– P JspI K(p, p) = >,
TABLE III
A NNOTATED RDFS S EMANTICS
(1) Sub-property: (2) Sub-class: (3) Typing:
(A, sp, B) : v1 (B, sp, C) : v2 (A, sc, B) : v1 (B, sc, C) : v2 (X , A, Y) : v1 (A, dom, B) : v2
(a) ; (a) ; (a) ;
(A, sp, C) : v1 ⊗ v2 (A, sc, C) : v1 ⊗ v2 (X , type, B) : v1 ⊗ v2
(X , A, Y) : v1 (A, sp, B) : v2 (X , type, A) : v1 (A, sc, B) : v2 (X , A, Y) : v1 (A, range, B) : v2
(b) . (b) . (b) .
(X , B, Y) : v1 ⊗ v2 (X , type, B) : v1 ⊗ v2 (Y, type, B) : v1 ⊗ v2
(*) Generalisation: (4) Sub-class Reflexivity:
(X , A, Y) : v1 (X , A, Y) : v2 (A, sc, B) : v (X , e, A)
. (a) ; (b) for e ∈ {dom, range, type}.
(X , A, Y) : v1 ⊕ v2 (A, sc, A) : v (B, sc, B) : v (A, sc, A)
(5) Sub-property Reflexivity:
(X , A, Y) : v (A, sp, B) : v (A, e, X ) : v
(a) ; (b) for e ∈ ρdf ; (c) ; (d) for e ∈ {dom, range}.
(A, sp, A) : v (e, sp, e) : > (A, sp, A) : v (B, sp, B) : v (A, sp, A) : v
TABLE IV
A NNOTATED RDFS DEDUCTIVE SYSTEM
τ1 τ4 τ1 τ 2 τ1 τ3 0 0
τ5 , τ40 , τ400 ∈ Ins(r ) holds τ5 = τ5 . As in the standard case, the important notion is the closure
0 cl(G) = {τ | G ` {τ }} of an a-graph G. To compute it we
(5) A set of rules R ⊆ R is right distributive over r ∈
τ 0 τ 00 need a representation of an a-graph G, which is a finite set of
R if for every r0 ∈ R0 , τ1τ4τ2 , 4τ 0 4 ∈ Ins(r), and
5
τ4 τ3 τ1 τ 3 τ2 τ3 0 0 a-triples RG such that RG ∪E = G, E = {t : ⊥ | t ∈ T }. The
τ5 , τ40 , τ400 ∈ Ins(r ) holds τ5 = τ5 .
set Supp(G) by the definition is a representation of G, but we
(6) A rule r ∈ R is v-neutral, v ∈ K, if for every t1 :v t2 :v2
t3 :v3 ∈
t1 :v1 t2 :v also want to have a possibility to work with representations
Ins(r) holds v2 = v3 and for every t3 :v3 ∈ Ins(r)
which contain an finite number of ⊥-annotated triples.
holds v1 = v3 .
Proposition 2: Let G be an a-graph and K a >-dioid. For
(7) A rule r ∈ R is v-annihilating, v ∈ K, if for every
t1 :v t2 :v2 every representation RG of G there exists an representation
t3 :v3 ∈ Ins(r) holds v3 = v and for every t1 :v 1 t2 :v
t3 :v3 ∈
Rcl(G) of cl(G) which can be computed in polynomial time in
Ins(r) holds v3 = v.
the size of RG if the complexities of ⊕ and ⊗ are polynomial
Proposition 1: The set of inference rules R0 =
bounded.
{(1a), (1b), (2a), (2b), (3a), (3b)} in Tab. IV is associative, the
Let K and K0 be two algebras. For any function h : K → K0
set of rules {(8)} is associative, the rule (8) is commutative,
and K-annotated graph G denote h(G) the set of K0 -annotated
idempotent and ⊥-neutral, the set R0 is left and right
triples formed from G by applying h to each annotation.
distributive over the rule (8), and each of the rules from R0
Proposition 3: Let h : K → K0 and K, K0 be >-dioids. For
are >-neutral and ⊥-annihilating iff K is a dioid.
Theorem 1 (Soundness and completeness): Given a dioid every K-annotated graph G the set h(G) is a K0 -annotated
K and a-graphs G and H hold. graph and holds cl(h(G)) = h(cl(G)) iff h is a dioid
homomorphism.
(1) If G ` H then G |= H.
(2) If G is schema-acyclic and G |= H then for every t : v ∈
V. S TRING DIOIDS FOR RDFS ANNOTATION
H there exists v 0 v such that G ` t : v 0 .
(3) If K is >-annihilating and G |= H then for every t : v ∈ Prop. 3 enables us to obtain an a-graph from another one
H there exists v 0 v such that G ` t : v 0 . without recomputing annotations for inferred triples. The next
Hence, the deductive system behaves the same as the step is to develop a “universal” annotation, i.e. an annotation
standard one iff K is a dioid for schema-acyclic a-graphs and from which we can obtain any other one by applying a
a >-dioid in general case. corresponding dioid homomorphism.
Given an alphabet Σ define an subsequence order on the set γ1 ⊗T γ2 = sup{γ | γ γi , i = 1, 2}, ⊥T = ∅ and >T = P.
of words Σ∗ : u ≤ u0 iff for some u1 , . . . , un , w1 , . . . wn−1 ∈ The domain KT forms an BL-algebra and hence a >-dioid.
Σ∗ holds u = u1 u2 . . . un and u0 = u1 w1 u2 w2 . . . wn−1 un . A Fuzzy RDF [15] treats the Ex. 2. The domain here is KF =
finite set m ⊆ Σ∗ is an antichain if for all u ≤ u0 , u, u0 ∈ m, h[0, 1], max, ⊗F , 0, 1i, where ⊗F is any t-norm of BL-algebra.
holds u = u0 . Let min(m) be the set of minimal elements If ⊗F is an ordinary multiplication as in Ex. 2, then the domain
(w.r.t. ≤) of m. On the set of antichains M [Σ] we define: becomes probabilistic. As KT domain, KF is a >-dioid.
Default RDF. In both of the previous domains the product
m1 + m2 = min(m1 ∪ m2 ),
operation in the dioid is commutative. Next we introduce an
m1 × m2 = min({w1 w2 | w1 ∈ m1 , w2 ∈ m2 }),
example of ⊗-noncommutative annotations.
Call M[Σ] = hM [Σ], +, ×, ∅, {}i, where is the empty Suppose we want to represent – in an RDF graph –
string, a string dioid over generators Σ. Note, that a string information about an attribute attached to resources modelled
dioid is a >-dioid. The following proposition says, that it is by the graph. The straightforward way is to introduce a new
“the most general” of all >-dioids. property to the vocabulary of the graph and handle it as usual.
Proposition 4: (1) Given a set of generators Σ the string Nevertheless, in some situations this way can be not optimal:
dioid M[Σ] is the free >-dioid on Σ, i.e. for any >-dioid If we have a broad system of classes and values of the attribute
Kf = hK, ⊕, ⊗, ⊥, >i and a valuation φ : Σ → K there for subclasses and elements of a class are usually the same,
exists a unique homomorphism Evalφ : M [Σ] → K such that then it is natural to keep the default attribute value for the
for each a ∈ Σ holds Evalφ (a) = φ(a). class and the value for a resource only if it differs from usual
(2) The operations of the string dioid + and × can be one of the class it belongs to.
computed in polynomial time. For a substantiation, consider an RDF graph denoted in
Hence, string dioids consititute an important and general Fig. 2 representing a (part of) botanical taxonomy3 . The
subclass of >-dioids. We now show how they can be applied triples of this graph have annotations which store values
to annotate RDF graphs. Let G be a K-annotated graph and of an attribute PublishedBy for every taxon in the graph.
X a set of triple ids of triples from Supp(G). We associate to The division Pinophyta was introduced by Carl Linnaeus,
G an “abstract” version which is a (X ∪ ∅)-annotated graph Ḡ so the triple (P lantae, sc, P inophyta) is annotated by
consisted of the same triples as G, but the annotation of each Linnaeus. The subsequent taxons down to family Arau-
of them is its id if it has one, and ∅ otherwise. cariaceae keep this attribute value, so we annotate all sc-
Theorem 2: For every K-annotated graph G holds cl(G) = triples between Pinophyta and Araucariaceae with a special
Evalφ ◦ cl(Ḡ), where φ : X → K is a valuation which value ? which means “derived from above”. The same value
associates the annotation of an a-triple in G to its id. keeps for genus Araucaria and species Araucaria Araucana.
This theorem gives rise to the following strategy for an- But genus Wollemia was introduced by David Noble as
notating RDF graphs. Given a graph G we are to annotate well as its only species Wollemia Nobilis, so the anno-
it with elements from several domains. However, we would tations for the triples (Araucariaceae, sc, W ollemia) and
like to infer triples and their annotations only once, without (W ollemia, sc, W.N obilis) are N oble and ? correspond-
recomputing the annotations for each annotation domain. In ingly. Thus, to find out from the graph who introduced a taxon
this case we construct an “abstract” version Ḡ of G by we need to go up from its node along the tree until we find an
annotating triples with their ids. Then we can apply inference edge with an annotation differs from ?. An annotation value
rules and obtain an abstract annotation from the string dioid for a triple here represents not the value of the attribute by
over the set of ids for each triple. Finally, as soon as we need itself, but its difference with the value of higher triple in the
to get annotations from a specific domain we need to make tree. Hence, it is natural to phisically keep such annotations in
sure that it is a >-dioid, define a homomorphism by attaching a memory only it they differ from ?. That is why if the value
specific annotations to triples in G and then apply it to abstract of an attribute for a resource in most cases is determined by a
annotations of previously inferred triples. class it belongs to and the number of attributes are large, the
advantage in memory can be sufficient.
VI. A PPLICATIONS TO SPECIFIC MODELS
Before we define the default domain formally, note, that the
In this section we introduce several annotation domains for situation in general case can be more complicated than in the
RDF graphs those are >-dioids. latter example, because a ρdf subgraph of a graph does not
Temporal RDF [16]. This model treats the Ex. 1. The nesessarily have a tree structure. Therefore, in certain cases we
temporal domain KT = hKT , ⊕T , ⊗T , ⊥T , >T i is defined as need to combine different annotations for a triple. To this effect
follows. Consider temporal intervals [α1 , α2 ] where α1,2 ∈ we require a set of attirbute values to have a union operation
P = Z ∪ {−∞, +∞}, α1 < α2 . Two intervals [α1 , α2 ] and the lowest element, i.e. to be a semilattice. In many cases
and [α3 , α4 ] are adjacent iff α2 + 1 = α3 . The set KT (as in the taxonomy example) this set has no structure, but it is
is the set of all pairwise disjoint and non-adjacent sets of possible to enrich it with the lowest element Unknown and the
intervals. On KT a partial order is defined: γ1 γ2 iff greatest element Error. The latter is justified by situations
for each I1 ∈ γ1 there exists I2 ∈ γ2 such that I1 ⊆ I2 .
For KT we have: γ1 ⊕T γ2 = inf{γ | γ γi , i = 1, 2}, 3 This is just an example and the actual information may be incorrect.
. . . sc: ? Plantae sc: Linnaeus Pinophyta sc: ? . . . sc: ? Araucariaceae sc: ? Araucaria sc: ? A. araucana
sc: N oble Wollemia W. nobilis
sc: ?
Fig. 2. The RDF graph representing botanical taxonomy
when from one source we get an attribute value for a resource, have given some examples of its use. There may be some
but from another source we get a different value for the same mileage to be gained by combining these two algebras. One of
resource which contradicts with the first one. The union of this the applications of the proposed algebra is a system for default
information is inconsistent and requires a manual intervention, reasoning about certain annotations on RDF resources, which
which can be flagged by Error annotation. may also prove to be a useful mechanism for physically storing
Let A be an attribute with a domain hA, t, 0i which is a those annotations. We would like to find similar mechanisms
semilattice with union t and the lowest element 0. Consider an for efficiently storing default annotations on triples.
A-default domain KA = hA? ∪ {⊥? , >? }, ⊕A , ⊗A , ⊥? , >? i, We are grateful to Marcelo Arenas, Boris Motik, Floris
where A? = (A×{T rue, F alse}) and ⊥? , >? are new special Geerts and Alex Simpson for helpful discussions. This work
symbols. Note, that the meanings of ⊥? and an annotation was supported by a grant from the UK Engineering and
with 0 are different: The first one says that a triple is not in Physical Sciences Research Council.
the support of a graph and the second says that the value of R EFERENCES
the attribute is minimal according to t. The second boolean
[1] D. Brickley and R. V. Guha. RDF Vocabulary Description Language
component of A? corresponds to ? in the taxonomy example 1.0: RDF Schema. W3C Recommendation. http://www.w3.org/TR/
above, i.e. it is T rue in an annotation for a triple iff the actual rdf-schema/, Feb. 2004.
value of the attribute is the union of the first component of the [2] P. Buneman, S. Khanna, and W. Tan. Why and Where: A Characteriza-
tion of Data Provenance. In ICDT 2001, volume 1973 of LNCS, pages
annotation with values those can be derived from triples above, 316–330. Springer, 2001.
and F alse overwise. The operations are defined as follows: [3] J. J. Carroll, C. Bizer, P. J. Hayes, and P. Stickler. Named graphs,
provenance and trust. In WWW, pages 613–622, 2005.
(a1 , b1 ) ⊕A (a2 , b2 ) =
(a1 t a2 , b1 ∨ b2 ), [4] L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: a post-it
(a1 t a2 , b2 ) iff b1 = T rue, system for relational databases based on provenance. In SIGMOD ’05:
(a1 , b1 ) ⊗A (a2 , b2 ) = Proceedings of the 2005 ACM SIGMOD international conference on
(a1 , b1 ) iff b1 = F alse. Management of data, pages 942–944, New York, NY, USA, 2005. ACM.
[5] Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a
These operations extend to elements ⊥? and >? in a way to warehousing environment. ACM Trans. Database Syst., 25(2):179–227,
keep the properties of ⊥ and > in the dioid. Next, we describe 2000.
the meaning of this operations. The addition ⊕A is applied [6] R. D. Dowell, R. M. Jokerst, A. Day, S. R. Eddy, and L. Stein. The
Distributed Annotation System. BMC Bioinformatics, 2:7, 2001.
when we union annotations about the same triple. Hence, the [7] G. Flouris, I. Fundulaki, P. Pediaditis, Y. Theoharis, and
values a1 and a2 are joined by the semilattice operation t V. Christophides. Coloring RDF Triples to Capture Provenance.
and the possibility to derive values from a ρdf structure above In International Semantic Web Conference, pages 196–212, 2009.
[8] T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings.
exists only if it exists in any of the considering annotations. In PODS ’07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-
The product ⊗A is applied when we infer triples by transitivity SIGART symposium on Principles of database systems, pages 31–40,
of sc or similar inference rule from Tab. IV. If b1 = T rue we New York, NY, USA, 2007. ACM.
[9] C. Gutierrez, C. A. Hurtado, and A. Vaisman. Introducing Time into
union the current value a1 with the derived value a2 and the RDF. IEEE Trans. on Knowl. and Data Eng., 19(2):207–218, 2007.
possibility of father deriving depends on b2 . If b1 = F alse, we [10] C. Gutiérrez, C. A. Hurtado, and A. A. Vaisman. Temporal RDF. In
just keep the annotation (a1 , b1 ) which override any annotation ESWC, pages 93–107, 2005.
[11] P. Hájek. Metamathematics of Fuzzy Logic (Trends in Logic). Springer,
from a structure above. Finally, KA can be easily checked to 1 edition, November 2001.
be ⊗-noncommutative >-dioid. [12] A. Hogan, G. Lukacsy, N. Lopes, A. Polleres, U. Straccia, A. Zimmer-
To use the A-default dioid KA for storing values of an mann, and S. Decker. RDF Needs Annotations. In Proceedings of W3C
Workshop — RDF Next Steps. W3C, 2010.
attribute A for resources we assume (0, T rue)-annotation for [13] F. Manola, E. Miller, and B. McBride. RDF Primer, W3C Recommen-
a triple by default and do not keep it in a memory. As soon as dation. http://www.w3.org/TR/REC-rdf-syntax/, Feb. 2004.
we need the real value of the attribute for a resource a we infer [14] S. Muñoz, J. Pérez, and C. Gutiérrez. Minimal Deductive Systems for
RDF. In ESWC, pages 53–67, 2007.
a triple (a, type, r) and get the first component of annotation [15] U. Straccia. A Minimal Deductive System for General Fuzzy RDF.
for it. (Here r is the root of ρdf subgraph of the considered In Proceedings of the 3rd International Conference on Web Reasoning
graph; if it does not exist, we can always introduce it.) and Rule Systems (RR-09), number 5837 in Lecture Notes in Computer
Science, pages 166–181. Springer-Verlag, 2009.
[16] U. Straccia, N. Lopes, G. Lukacsy, and A. Polleres. A General
VII. C ONCLUSIONS Framework for Representing and Reasoning with Annotated Semantic
Although annotation algebras have been studied for database Web Data. In Proceedings of the Twenty-Fourth AAAI Conference on
Artificial Intelligence (AAAI-10). AAAI Press, 2010.
query languages [8] and have also recently been investigated [17] Y. R. Wang and S. E. Madnick. A Polygen Model for Heterogeneous
for RDF query languages [7], we have suggested an alternative Database Systems: The Source Tagging Perspective. In VLDB, pages
and more natural algebra for the annotation of RDFS, and we 519–538, 1990.