=Paper=
{{Paper
|id=None
|storemode=property
|title=Annotation algebras for RDFS Data
|pdfUrl=https://ceur-ws.org/Vol-670/paper_4.pdf
|volume=Vol-670
|dblpUrl=https://dblp.org/rec/conf/semweb/BunemanK10
}}
==Annotation algebras for RDFS Data==
<pdf width="1500px">https://ceur-ws.org/Vol-670/paper_4.pdf</pdf>
<pre>
                            Annotation algebras for RDFS
                              Peter Buneman                                        Egor V. Kostylev
                         University of Edinburgh                               University of Edinburgh
                         Email: opb@inf.ed.ac.uk                             Email: ekostyle@inf.ed.ac.uk


   Abstract—Provenance and annotation are intimately con-            temporal RDF [10] propose methods of adding annotations
nected. On the one hand provenance can be regarded as a              to RDF triples to express belief, trust, or temporal properties.
form of annotation, on the other hand, in order to understand        Can we simply follow the work for relational databases in
how to convey annotations from source data to derived data, we
need an account of how the data was derived – its provenance.        developing a general model for such annotations? Here we
Following a successful line of database research in which elements   have to start by looking not at query languages for RDF,
of a database are annotated with algebraic terms that describe       but at the inference rules for the ontologies such as those
the provenance of those elements, we develop an algebra of           of RDFS [1]. Given annotations on the base triples, what
annotations for RDFS that differs from that developed for            should be the annotations on an inferred triple? Indeed in [15],
relational databases. We show how such an annotation algebra
can be used for computing annotations on inferred triples that       [16], [12], with an initial goal applying fuzzy logic to RDFS
provide information on belief, trust and temporal aspects of data    proposes an algebra similar in many respects to that of [8].
as well as providing a framework for default reasoning.              In this paper we propose a somewhat more general – and
                                                                     possibly simpler – algebra to serve this purpose. Our proposals
                      I. I NTRODUCTION                               differ from the semirings in [8] in two ways. First, there are
   With the increasing interest in provenance in both databases      situations in which we do not want commutativity and second,
and ontologies, there have been a number of proposals and            while the number of triples inferred in an RDFS graph is
systems for annotation of the underlying data with information       always finite, the derivations can be unbounded. We therefore
concerning time, belief, and various aspects of provenance.          need an extra condition to prevent “infinite annotations”. This
The question that has been repeatedly posed in databases [17],       condition precludes the possibility of bag semantics, which is
[5], [2] is what happens to these annotations when a query           useful for relational algbra but appears to be inapplicable to
is applied? Are they ignored or are they somehow passed              ontologies. Also in [8] there is a compact representation – a
through the query? In fact generic prototypes [4] and systems        polynomial – of terms in the semiring algebra. In the algebra
that are specific to some domain [6] have been developed             we develop, the compact representation is somewhat different.
for propagating annotations through queries. Suppose, for               To introduce annotation algebras, we give two simple exam-
example, we take two tables S and T in which the individual          ples of annotating an RDF graph shown in Fig. 1(a) (a dashed
tuples have been annotated. How should we annotate the tuples        arrow represents a triple, which can be inferred from the rest
in the join of S and T ? An obvious answer is to put on              of the graph).
any output tuple the union of the annotations on the two                Example 1: A temporal extension of RDF was introduced
contributing tuples. This does not always make sense; for            in [10], [9]. It consists of attaching to every RDF triple a set
example if the input tuples are annotated with the set of people     intervals that represents the times at which the triple is valid.
who believe that this tuple is true or with the set of database      An example of a temporal annotation of the graph is shown in
versions for which the tuple is actually in the database, it         Fig. 1(b): Picasso worked as a cubist from 1908 to 1919, paints
might be more appropriate to annotate a join tuple with the          are created by painters at least since engravings in Chauvet-
intersection of the relevant annotations.                            Pont-d’Arc cave from about 29,000BC, and so on. The point
   This problem has resulted in a variety of proposals for           here is that an annotation for the inferred triple was obtained
propagating annotations through queries. Notably, by using           by an intuitive calculation ({1908 − 1919} ∩ {1906 − 1921}) ∪
a semiring model for annotations, in [8] a tuple is annotated        ({1937} ∩ {−29, 000 − N ow}) = {1908 − 1919, 1937}.
with a term in a semiring algebra that describes the provenance         Example 2: In [15] fuzzy annotations of RDF are used to
of the tuple – how it was formed by the operations of the            describe degree of trust. To every triple a real in the range
relational query that constructed it. By suitable instatiatons       [0, 1] is attached. Such an annotation is shown in Fig. 1(c). The
operations of the semiring, one can realize various extensions       annotation for the inferred triple was calculated as max(0.8 ∗
to relational databases such as probabilistic databases, multi-      0.4, 0.3 ∗ 1) = 0.32.
set semantics and certain kinds of constraint databases. The            There is an obvious similarity between these examples: the
semiring model also generalizes a number of other models for         calculations are both of the general form (a ⊗ b) ⊕ (c ⊗ d),
provenance [5], [2].                                                 and the calculations for the inferred triple are performed by
   Turning to ontologies, proposals have also been suggested         suitably instantiating the operators ⊕ and ⊗. In fact [16],
for the annotation of RDF [13]. Named graphs [3] and                 both of the annotation domains form so-called BL-algebras
                 sc                                            sc:[1906-1921]                                                  sc: 0.4
    Cubist                   Painter                  Cubist                  Painter                            Cubist                       Painter
                                                          type: [1908-                                                    type:
                                                   type:
            type                                          1919, 1937]          dom:                         type:         0.32
type                         dom                   [1908-                                                                                 dom: 1
                                                                                 [-29,000-Now]              0.8
                                                   1919]
                   Paints                                           Paints :[1937]                                           Paints : 0.3
   Picasso                       Guernica              Picasso                         Guernica                 Picasso                        Guernica
                       (a)                                                 (b)                                                      (c)
                                                        Fig. 1.   Standard and annotated RDF graphs


[11], which in general preserve the properties of the deductive                     – ∆P is a set of property names (not necessarily disjoint
system [16].                                                                           from ∆R ),
   Before describing these constructions in more detail we                          – ∆C ⊆ ∆R is a set of classes,
should briefly connect this work with the general issue of                          – P J·K : ∆P → 2∆R ×∆R is a property extension function,
provenance. Our first observation is that an algebraic anno-                        – CJ·K : ∆C → 2∆R is a class extension function,
tation or a triple of the form we have described is a synopsis                      – ·I : U → ∆R ∪ ∆P is an interpretation mapping.
of how that triple was derived – which is surely part of                         The interpretation I is a model for a graph G over ρdf, denoted
its provenance. The second is that exogenous provenance                          by I |= G, iff the conditions in Tab. I hold2 . A graph G entails
information such as who created a triple or when it was created                  G0 , denoted G |= G0 , if every model of G is a model of G0 .
can be added following the proposals of [3]. We would also                          A sound and complete deductive system for entailment [14]
like to compute such annotations for an inferred triple. Our                     is presented in Tab. II. An instantiation of an inference rule r
proposals provide a method for tranferring the provenance                        from the system is a replacement of the variables A, B, C, X ,
annotations of explict triples to those that are derived.                        and Y, occurring in the rule, by references from U. If there
                                                                                 is an instantiation RR0 of r such that R ⊆ G, then the graph
   Outline. This work has two aims. The first is to determine
                                                                                 G0 = G ∪ R0 is the result of an application of r to G. A
what algebraic structure is necessary for an annotation domain
                                                                                 graph G0 is inferred from G, denoted by G ` G0 , iff G0 is
to keep the behavior of the deductive system the same as in
                                                                                 obtained from G by successively applying rules in Tab. II.
the standard case. The second is to find “the most general”
                                                                                 This entailment can be checked by computing the closure of
of such structures, which allows annotation of RDF graphs
                                                                                 G, which is the maximal graph which can be inferred from
by elements of the general structure, apply inference rules,
                                                                                 G. It can be done in quadratic time [14].
and then obtain annotations from specific domains on demand.
In the following sections we review RDFS, introduce a new                                              III. A NNOTATED RDFS
annotation algebra and provide some evidence that this is the
appropriate algebra for RDFS. We give a freeness result that                        Definition 1: Given an algebra K with an elements set K
allows us to represent the terms of this annotation algebra and                  containing a distinguished element ⊥ a K-annotated RDF
give some examples of the use of the algebra.                                    graph (or simply an a-graph, if K is clear) is a function
                                                                                 G : T → 2K such that for each t ∈ T holds ⊥ ∈ G(t)
                         II. P RELIMINARIES                                      and the set Supp(G) = {t : v | v ∈ G(t), v 6= ⊥} is finite.
   Given a set of RDF URI references U1 , let T be the set of                       If v ∈ G(t) we write t : v ∈ G and call it an a-triple.
RDF triples of the form (s, p, o) ∈ U × U × U. Here s, p and                     An a-graph G is schema-acyclic, if the subgraphs {(s, e, o) :
o are called subject, predicate and object correspondingly. An                   v | (s, e, o) : v ∈ Supp(G)}, e = sc, sp, do not contain
RDF graph (or simply graph) is a finite set of triples G ⊆ T .                   non-trivial loops. The semantics for a-graphs is given in the
   The RDF specification[13] includes RDF Schema                                 following definitions.
(RDFS) [1] which is a vocabulary of reserved words designed                         Definition 2: A K-annotated interpretation is a tuple I =
to describe relationships between resources and properties.                      (∆R , ∆P , ∆C , P J·K, CJ·K, ·I ) where ∆R , ∆P , ∆C and ·I are
In this work we use the ρdf = {sp, sc, type, dom, range}                         the same as for the standard interpretation and P J·K, CJ·K are
fragment of RDFS [14]. The elements of ρdf represent                             defined as follows:
sub-property, sub-class, domain, and range properties                              – for each p ∈ ∆P holds P JpK : ∆R × ∆R → K,
correspondingly. It is widely accepted that ρdf is a stable                        – for each c ∈ ∆C holds CJcK : ∆R → K.
core of RDFS.                                                                       To define models for annotated RDFS we need more struc-
   An interpretation of an RDF graph is a tuple I =                              ture on the algebra K. Let ⊕ and ⊗ be binary operations on
(∆R , ∆P , ∆C , P J·K, CJ·K, ·I ) where                                          K and ⊥, > be distinct constants in it. For every a, b ∈ K
   – ∆R is a nonempty set of resources,
                                                                                    2 The form of conditions in our definition of model is slightly different
  1 For the sake of simplicity we do not consider blank nodes and literals.      from that in [14], but they are equivalent per se. It is done to simplify the
Their inclusion does not change the results of this work.                        comparison with notion of model for annotated RDFS given in Sect. III.
  (1) Simple interpretation:                                                   –   if (p, q), (q, r) ∈ P JspI K then (p, r) ∈ P JspI K;
    – for each (s, p, o) ∈ G holds pI ∈ ∆P and (sI , oI ) ∈ P JpI K.           –   if (x, y) ∈ P JpK and (p, q) ∈ P JspI K then (x, y) ∈ P JqK.
  (2)    Properties and classes:                                             (4)   Sub-class:
    –    for each e ∈ ρdf holds eI ∈ ∆P ;                                      –   (c, c) ∈ P JscI K;
    –    if (x, y) ∈ P JspI K then x, y ∈ ∆P ;                                 –   if (c, d), (d, e) ∈ P JscI K then (c, e) ∈ P JscI K;
    –    if (x, y) ∈ P JscI K then x, y ∈ ∆C ;                                 –   if x ∈ CJcK and (c, d) ∈ P JscI K then x ∈ CJdK.
    –    if (x, y) ∈ P JtypeI K then y ∈ ∆C ;
                                                                             (5)   Typing:
    –    if (x, y) ∈ P JdomI K then x ∈ ∆C and y ∈ ∆P ;
                                                                               –   (x, c) ∈ P JtypeI K iff x ∈ CJcK;
    –    if (x, y) ∈ P JrangeI K then x ∈ ∆C and y ∈ ∆P .
                                                                               –   if (x, y) ∈ P JpK and (c, p) ∈ P JdomI K then x ∈ CJcK;
  (3)    Sub-property:                                                         –   if (x, y) ∈ P JpK and (c, p) ∈ P JrangeI K then y ∈ CJcK.
    –    (p, p) ∈ P JspI K;

                                                                         TABLE I
                                                                      RDFS SEMANTICS

      (1) Sub-property:                  (2) Sub-class:                       (3) Typing:                          (4) Sub-class Reflexivity:
     (A, sp, B) (B, sp, C)          (A, sc, B) (B, sc, C)               (X , A, Y) (A, dom, B)                    (A, sc, B)
 (a)                       ;    (a)                       ;         (a)                        ;       (a)                         ;
           (A, sp, C)                     (A, sc, C)                         (X , type, B)                 (A, sc, A) (B, sc, B)
     (X , A, Y) (A, sp, B)          (X , type, A) (A, sc, B)            (X , A, Y) (A, range, B)            (X , e, A)
 (b)                        .   (b)                          .      (b)                          .     (b)               for e ∈ {dom, range, type}.
           (X , B, Y)                      (X , type, B)                       (Y, type, B)                (A, sc, A)

                                                              (5) Sub-property Reflexivity:
        (X , A, Y)                                                            (A, sp, B)                      (A, e, X )
 (a)               ;            (b)                for e ∈ ρdf ;    (c)                       ;        (d)                 for e ∈ {dom, range}.
        (A, sp, A)                    (e, sp, e)                        (A, sp, A) (B, sp, B)                (A, sp, A)

                                                                         TABLE II
                                                                   RDFS DEDUCTIVE SYSTEM


we write a  b iff there exists c ∈ K such that a ⊕ c = b.                          (4) ⊗ is ⊥-annihilating, i.e. for each a holds ⊥ ⊗ a = ⊥ =
The addition operation ⊕ will be used to combine annotations                            a ⊗ ⊥.
for the same triple and the ⊥ constant represents the fact, that                   A dioid is >-dioid iff
there is no information about a triple. The product operation                       (5) ⊕ is >-annihilating, i.e. for each a holds > ⊕ a = >.
⊗ will be used to join annotations when applying inference
rules and > represents the maximal annotation.                                        Note, that >-annihilation entails idempotence from (1).
   Definition 3: Let K = hK, ⊕, ⊗, ⊥, >i be an algebra of                             An instantiation of a rule from the deductive system in
type (2, 2, 0, 0). The K-annotated interpretation I is a model                     Tab. IV is a replacement of variables A, B, C, X , and Y by
for an a-graph G, denoted I |= G, iff the conditions in Tab. III                   elements of U, and variables v, v1 , v2 , and v3 by elements of
hold. An a-graph G entails H, denoted G |= H, if for every                         K, such that all relations for annotations hold. An application
I |= G holds I |= H.                                                               of a rule to an a-graph G and a deduction of an a-graph G0
   By these definitions, each (non-annotated) RDF graph G                          from G, denoted by G ` G0 , is defined exactly the same way
can be considered as an a-graph G0 = {t : > | t ∈ G} ∪ E,                          as for the standard case.
if K = {⊥, >} and E = {t : ⊥ | t ∈ T }. In this case, the                             Note, that the system in Tab. IV differs from the one in
definition of model for an a-graph coincides with the standard.                    Tab. II only by the presence of annotations and the generali-
                                                                                   sation rule (∗) which combines annotations for the same triple.
 IV. A DEDUCTIVE SYSTEM FOR ANNOTATED RDFS AND                                     Thus, that is natural to expect the new system to behave the
                                 DIOIDS                                            same as the standard one. Particularly, they should coincide if
                                                                                   K = {⊥, >}. To obtain it we need some properties of K.
   Definition 4: An algebra K = hK, ⊕, ⊗, ⊥, >i is a dioid
iff it is an idempotent semi-ring, i.e.                                               Definition 5: Let R be the set of all inference rules of the
                                                                                   form T1TT2 from Tab. IV and Ins(r) the set of all instantiations
 (1) hK, ⊕, ⊥i is a semilattice, i.e. for each a, b, and c hold:                   of a rule r ∈ R.
       (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) (associativity), a ⊕ b = b ⊕ a,                                            0
       (commutativity), a ⊕ ⊥ = a (neutral element), a ⊕ a = a                      (1) A set of rules R  0
                                                                                                              ⊆ R is associative, iff for every r, r0 ∈
                                                                                                    τ   τ
       (idempotence);
                                                                                              τ τ
                                                                                        R0 , 1τ4 2 , τ 0 4 ∈ Ins(r), and τ4τ5τ3 , τ2τ 0τ3 ∈ Ins(r0 )
                                                                                                      1
                                                                                                        5                              4

 (2) hK, ⊗, >i is a monoid, i.e. for each a, b, and c hold:                             holds τ5 = τ50 .
       (a⊗b)⊗c = a⊗(b⊗c) (associativity), a⊗> = a = >⊗a                             (2) A rule r ∈ R is commutative, iff for every τ1ττ2 ∈ Ins(r)
       (neutral element);                                                               holds τ2ττ1 ∈ Ins(r).
 (3) ⊗ is left and right distributive over ⊕, i.e. for each a, b,                   (3) A rule r ∈ R is idempotent, if τττ ∈ Ins(r) for every τ .
       and c hold: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c), (b ⊕ c) ⊗ a =                   (4) A set of rules R0 ⊆ R is left distributive        over r ∈ R
                                                                                                                             τ 0 τ 00
       (b ⊗ a) ⊕ (c ⊗ a);                                                               if for every r0 ∈ R0 , τ2τ4τ3 , 4τ 0 4 ∈ Ins(r), and
                                                                                                                                     5
   (1) Simple interpretation:                                                               –   P JspI K(p, q) ⊗ P JspI K(q, r)  P JspI K(p, r);
     – for each (s, p, o) : v ∈ G holds pI ∈ ∆P and v  P JpI K(sI , oI ).                  –   P JpK(x, y) ⊗ P JspI K(p, q)  P JqK(x, y).
   (2)    Properties and classes:                                                         (4)   Sub-class:
     –    for each e ∈ ρdf holds eI ∈ ∆P ;                                                  –   P JscI K(c, c) = >,
     –    P JspI K(x, y) is defined only for x, y ∈ ∆P ;                                    –   P JscI K(c, d) ⊗ P JscI K(d, e)  P JscI K(c, e);
     –    P JscI K(x, y) is defined only for x, y ∈ ∆C ;                                    –   CJcK(x) ⊗ P JscI K(c, d)  CJdK(x).
     –    P JtypeI K(x, y) is defined only for y ∈ ∆C ;
                                                                                          (5)   Typing:
     –    P JdomI K(x, y) is defined only for x ∈ ∆C and y ∈ ∆P ;
                                                                                            –   P JtypeI K(x, c) = CJcK(x);
     –    P JrangeI K(x, y) is defined only for x ∈ ∆C and y ∈ ∆P .
                                                                                            –   P JpK(x, y) ⊗ P JdomI K(c, p)  CJcK(x);
   (3)    Sub-property:                                                                     –   P JpK(x, y) ⊗ P JrangeI K(c, p)  CJcK(y).
     –    P JspI K(p, p) = >,


                                                                               TABLE III
                                                                      A NNOTATED RDFS S EMANTICS

          (1) Sub-property:                                             (2) Sub-class:                                             (3) Typing:
     (A, sp, B) : v1 (B, sp, C) : v2                         (A, sc, B) : v1 (B, sc, C) : v2                        (X , A, Y) : v1 (A, dom, B) : v2
 (a)                                 ;                   (a)                                 ;                  (a)                                  ;
          (A, sp, C) : v1 ⊗ v2                                    (A, sc, C) : v1 ⊗ v2                                   (X , type, B) : v1 ⊗ v2
     (X , A, Y) : v1 (A, sp, B) : v2                         (X , type, A) : v1 (A, sc, B) : v2                     (X , A, Y) : v1 (A, range, B) : v2
 (b)                                  .                  (b)                                    .               (b)                                    .
          (X , B, Y) : v1 ⊗ v2                                     (X , type, B) : v1 ⊗ v2                                (Y, type, B) : v1 ⊗ v2

          (*) Generalisation:                                                                     (4) Sub-class Reflexivity:
  (X , A, Y) : v1 (X , A, Y) : v2                                   (A, sc, B) : v                                    (X , e, A)
                                  .                      (a)                               ;                    (b)                     for e ∈ {dom, range, type}.
       (X , A, Y) : v1 ⊕ v2                                  (A, sc, A) : v (B, sc, B) : v                           (A, sc, A)

                                                                   (5) Sub-property Reflexivity:
         (X , A, Y) : v                                                           (A, sp, B) : v                       (A, e, X ) : v
 (a)                    ;   (b)                    for e ∈ ρdf ;       (c)                               ;    (d)                        for e ∈ {dom, range}.
         (A, sp, A) : v           (e, sp, e) : >                           (A, sp, A) : v (B, sp, B) : v              (A, sp, A) : v

                                                                           TABLE IV
                                                               A NNOTATED RDFS DEDUCTIVE SYSTEM


         τ1 τ4 τ1 τ 2 τ1 τ3      0              0
          τ5 , τ40 , τ400 ∈ Ins(r ) holds τ5 = τ5 .                                         As in the standard case, the important notion is the closure
                            0                                                            cl(G) = {τ | G ` {τ }} of an a-graph G. To compute it we
 (5) A set of rules R ⊆ R is right distributive               over r ∈
                                           τ 0 τ 00                                      need a representation of an a-graph G, which is a finite set of
      R if for every r0 ∈ R0 , τ1τ4τ2 , 4τ 0 4 ∈ Ins(r), and
                                               5
      τ4 τ3 τ1 τ 3 τ2 τ3           0                  0                                  a-triples RG such that RG ∪E = G, E = {t : ⊥ | t ∈ T }. The
        τ5 , τ40 , τ400 ∈ Ins(r ) holds τ5 = τ5 .
                                                                                         set Supp(G) by the definition is a representation of G, but we
 (6) A rule r ∈ R is v-neutral, v ∈ K, if for every t1 :v         t2 :v2
                                                               t3 :v3    ∈
                                                 t1 :v1 t2 :v                            also want to have a possibility to work with representations
      Ins(r) holds v2 = v3 and for every t3 :v3 ∈ Ins(r)
                                                                                         which contain an finite number of ⊥-annotated triples.
      holds v1 = v3 .
                                                                                            Proposition 2: Let G be an a-graph and K a >-dioid. For
 (7) A rule r ∈ R is v-annihilating, v ∈ K, if for every
      t1 :v t2 :v2                                                                       every representation RG of G there exists an representation
         t3 :v3    ∈ Ins(r) holds v3 = v and for every t1 :v     1 t2 :v
                                                               t3 :v3    ∈
                                                                                         Rcl(G) of cl(G) which can be computed in polynomial time in
      Ins(r) holds v3 = v.
                                                                                         the size of RG if the complexities of ⊕ and ⊗ are polynomial
   Proposition 1: The set of inference rules R0                          =
                                                                                         bounded.
{(1a), (1b), (2a), (2b), (3a), (3b)} in Tab. IV is associative, the
                                                                                            Let K and K0 be two algebras. For any function h : K → K0
set of rules {(8)} is associative, the rule (8) is commutative,
                                                                                         and K-annotated graph G denote h(G) the set of K0 -annotated
idempotent and ⊥-neutral, the set R0 is left and right
                                                                                         triples formed from G by applying h to each annotation.
distributive over the rule (8), and each of the rules from R0
                                                                                            Proposition 3: Let h : K → K0 and K, K0 be >-dioids. For
are >-neutral and ⊥-annihilating iff K is a dioid.
   Theorem 1 (Soundness and completeness): Given a dioid                                 every K-annotated graph G the set h(G) is a K0 -annotated
K and a-graphs G and H hold.                                                             graph and holds cl(h(G)) = h(cl(G)) iff h is a dioid
                                                                                         homomorphism.
 (1) If G ` H then G |= H.
 (2) If G is schema-acyclic and G |= H then for every t : v ∈
                                                                                                    V. S TRING DIOIDS FOR RDFS ANNOTATION
      H there exists v 0  v such that G ` t : v 0 .
 (3) If K is >-annihilating and G |= H then for every t : v ∈                               Prop. 3 enables us to obtain an a-graph from another one
      H there exists v 0  v such that G ` t : v 0 .                                     without recomputing annotations for inferred triples. The next
   Hence, the deductive system behaves the same as the                                   step is to develop a “universal” annotation, i.e. an annotation
standard one iff K is a dioid for schema-acyclic a-graphs and                            from which we can obtain any other one by applying a
a >-dioid in general case.                                                               corresponding dioid homomorphism.
   Given an alphabet Σ define an subsequence order on the set           γ1 ⊗T γ2 = sup{γ | γ  γi , i = 1, 2}, ⊥T = ∅ and >T = P.
of words Σ∗ : u ≤ u0 iff for some u1 , . . . , un , w1 , . . . wn−1 ∈   The domain KT forms an BL-algebra and hence a >-dioid.
Σ∗ holds u = u1 u2 . . . un and u0 = u1 w1 u2 w2 . . . wn−1 un . A         Fuzzy RDF [15] treats the Ex. 2. The domain here is KF =
finite set m ⊆ Σ∗ is an antichain if for all u ≤ u0 , u, u0 ∈ m,        h[0, 1], max, ⊗F , 0, 1i, where ⊗F is any t-norm of BL-algebra.
holds u = u0 . Let min(m) be the set of minimal elements                If ⊗F is an ordinary multiplication as in Ex. 2, then the domain
(w.r.t. ≤) of m. On the set of antichains M [Σ] we define:              becomes probabilistic. As KT domain, KF is a >-dioid.
                                                                           Default RDF. In both of the previous domains the product
      m1 + m2 = min(m1 ∪ m2 ),
                                                                        operation in the dioid is commutative. Next we introduce an
      m1 × m2 = min({w1 w2 | w1 ∈ m1 , w2 ∈ m2 }),
                                                                        example of ⊗-noncommutative annotations.
   Call M[Σ] = hM [Σ], +, ×, ∅, {}i, where  is the empty                 Suppose we want to represent – in an RDF graph –
string, a string dioid over generators Σ. Note, that a string           information about an attribute attached to resources modelled
dioid is a >-dioid. The following proposition says, that it is          by the graph. The straightforward way is to introduce a new
“the most general” of all >-dioids.                                     property to the vocabulary of the graph and handle it as usual.
   Proposition 4: (1) Given a set of generators Σ the string            Nevertheless, in some situations this way can be not optimal:
dioid M[Σ] is the free >-dioid on Σ, i.e. for any >-dioid               If we have a broad system of classes and values of the attribute
Kf = hK, ⊕, ⊗, ⊥, >i and a valuation φ : Σ → K there                    for subclasses and elements of a class are usually the same,
exists a unique homomorphism Evalφ : M [Σ] → K such that                then it is natural to keep the default attribute value for the
for each a ∈ Σ holds Evalφ (a) = φ(a).                                  class and the value for a resource only if it differs from usual
(2) The operations of the string dioid + and × can be                   one of the class it belongs to.
computed in polynomial time.                                               For a substantiation, consider an RDF graph denoted in
   Hence, string dioids consititute an important and general            Fig. 2 representing a (part of) botanical taxonomy3 . The
subclass of >-dioids. We now show how they can be applied               triples of this graph have annotations which store values
to annotate RDF graphs. Let G be a K-annotated graph and                of an attribute PublishedBy for every taxon in the graph.
X a set of triple ids of triples from Supp(G). We associate to          The division Pinophyta was introduced by Carl Linnaeus,
G an “abstract” version which is a (X ∪ ∅)-annotated graph Ḡ           so the triple (P lantae, sc, P inophyta) is annotated by
consisted of the same triples as G, but the annotation of each          Linnaeus. The subsequent taxons down to family Arau-
of them is its id if it has one, and ∅ otherwise.                       cariaceae keep this attribute value, so we annotate all sc-
   Theorem 2: For every K-annotated graph G holds cl(G) =               triples between Pinophyta and Araucariaceae with a special
Evalφ ◦ cl(Ḡ), where φ : X → K is a valuation which                    value ? which means “derived from above”. The same value
associates the annotation of an a-triple in G to its id.                keeps for genus Araucaria and species Araucaria Araucana.
   This theorem gives rise to the following strategy for an-            But genus Wollemia was introduced by David Noble as
notating RDF graphs. Given a graph G we are to annotate                 well as its only species Wollemia Nobilis, so the anno-
it with elements from several domains. However, we would                tations for the triples (Araucariaceae, sc, W ollemia) and
like to infer triples and their annotations only once, without          (W ollemia, sc, W.N obilis) are N oble and ? correspond-
recomputing the annotations for each annotation domain. In              ingly. Thus, to find out from the graph who introduced a taxon
this case we construct an “abstract” version Ḡ of G by                 we need to go up from its node along the tree until we find an
annotating triples with their ids. Then we can apply inference          edge with an annotation differs from ?. An annotation value
rules and obtain an abstract annotation from the string dioid           for a triple here represents not the value of the attribute by
over the set of ids for each triple. Finally, as soon as we need        itself, but its difference with the value of higher triple in the
to get annotations from a specific domain we need to make               tree. Hence, it is natural to phisically keep such annotations in
sure that it is a >-dioid, define a homomorphism by attaching           a memory only it they differ from ?. That is why if the value
specific annotations to triples in G and then apply it to abstract      of an attribute for a resource in most cases is determined by a
annotations of previously inferred triples.                             class it belongs to and the number of attributes are large, the
                                                                        advantage in memory can be sufficient.
          VI. A PPLICATIONS TO SPECIFIC MODELS
                                                                           Before we define the default domain formally, note, that the
   In this section we introduce several annotation domains for          situation in general case can be more complicated than in the
RDF graphs those are >-dioids.                                          latter example, because a ρdf subgraph of a graph does not
   Temporal RDF [16]. This model treats the Ex. 1. The                  nesessarily have a tree structure. Therefore, in certain cases we
temporal domain KT = hKT , ⊕T , ⊗T , ⊥T , >T i is defined as            need to combine different annotations for a triple. To this effect
follows. Consider temporal intervals [α1 , α2 ] where α1,2 ∈            we require a set of attirbute values to have a union operation
P = Z ∪ {−∞, +∞}, α1 < α2 . Two intervals [α1 , α2 ]                    and the lowest element, i.e. to be a semilattice. In many cases
and [α3 , α4 ] are adjacent iff α2 + 1 = α3 . The set KT                (as in the taxonomy example) this set has no structure, but it is
is the set of all pairwise disjoint and non-adjacent sets of            possible to enrich it with the lowest element Unknown and the
intervals. On KT a partial order is defined: γ1  γ2 iff                greatest element Error. The latter is justified by situations
for each I1 ∈ γ1 there exists I2 ∈ γ2 such that I1 ⊆ I2 .
For KT we have: γ1 ⊕T γ2 = inf{γ | γ  γi , i = 1, 2},                    3 This is just an example and the actual information may be incorrect.
. . . sc: ? Plantae sc: Linnaeus Pinophyta sc: ? . . . sc: ? Araucariaceae sc: ? Araucaria sc: ? A. araucana
                                                                   sc: N oble    Wollemia        W. nobilis
                                                                                           sc: ?
                                            Fig. 2.   The RDF graph representing botanical taxonomy


when from one source we get an attribute value for a resource,            have given some examples of its use. There may be some
but from another source we get a different value for the same             mileage to be gained by combining these two algebras. One of
resource which contradicts with the first one. The union of this          the applications of the proposed algebra is a system for default
information is inconsistent and requires a manual intervention,           reasoning about certain annotations on RDF resources, which
which can be flagged by Error annotation.                                 may also prove to be a useful mechanism for physically storing
   Let A be an attribute with a domain hA, t, 0i which is a               those annotations. We would like to find similar mechanisms
semilattice with union t and the lowest element 0. Consider an            for efficiently storing default annotations on triples.
A-default domain KA = hA? ∪ {⊥? , >? }, ⊕A , ⊗A , ⊥? , >? i,                 We are grateful to Marcelo Arenas, Boris Motik, Floris
where A? = (A×{T rue, F alse}) and ⊥? , >? are new special                Geerts and Alex Simpson for helpful discussions. This work
symbols. Note, that the meanings of ⊥? and an annotation                  was supported by a grant from the UK Engineering and
with 0 are different: The first one says that a triple is not in          Physical Sciences Research Council.
the support of a graph and the second says that the value of                                            R EFERENCES
the attribute is minimal according to t. The second boolean
                                                                           [1] D. Brickley and R. V. Guha. RDF Vocabulary Description Language
component of A? corresponds to ? in the taxonomy example                       1.0: RDF Schema. W3C Recommendation. http://www.w3.org/TR/
above, i.e. it is T rue in an annotation for a triple iff the actual           rdf-schema/, Feb. 2004.
value of the attribute is the union of the first component of the          [2] P. Buneman, S. Khanna, and W. Tan. Why and Where: A Characteriza-
                                                                               tion of Data Provenance. In ICDT 2001, volume 1973 of LNCS, pages
annotation with values those can be derived from triples above,                316–330. Springer, 2001.
and F alse overwise. The operations are defined as follows:                [3] J. J. Carroll, C. Bizer, P. J. Hayes, and P. Stickler. Named graphs,
                                                                               provenance and trust. In WWW, pages 613–622, 2005.
   (a1 , b1 ) ⊕A (a2 , b2 ) = 
                              (a1 t a2 , b1 ∨ b2 ),                        [4] L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: a post-it
                                 (a1 t a2 , b2 ) iff b1 = T rue,               system for relational databases based on provenance. In SIGMOD ’05:
   (a1 , b1 ) ⊗A (a2 , b2 ) =                                                  Proceedings of the 2005 ACM SIGMOD international conference on
                                 (a1 , b1 ) iff b1 = F alse.                   Management of data, pages 942–944, New York, NY, USA, 2005. ACM.
                                                                           [5] Y. Cui, J. Widom, and J. L. Wiener. Tracing the lineage of view data in a
These operations extend to elements ⊥? and >? in a way to                      warehousing environment. ACM Trans. Database Syst., 25(2):179–227,
keep the properties of ⊥ and > in the dioid. Next, we describe                 2000.
the meaning of this operations. The addition ⊕A is applied                 [6] R. D. Dowell, R. M. Jokerst, A. Day, S. R. Eddy, and L. Stein. The
                                                                               Distributed Annotation System. BMC Bioinformatics, 2:7, 2001.
when we union annotations about the same triple. Hence, the                [7] G. Flouris, I. Fundulaki, P. Pediaditis, Y. Theoharis, and
values a1 and a2 are joined by the semilattice operation t                     V. Christophides. Coloring RDF Triples to Capture Provenance.
and the possibility to derive values from a ρdf structure above                In International Semantic Web Conference, pages 196–212, 2009.
                                                                           [8] T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings.
exists only if it exists in any of the considering annotations.                In PODS ’07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-
The product ⊗A is applied when we infer triples by transitivity                SIGART symposium on Principles of database systems, pages 31–40,
of sc or similar inference rule from Tab. IV. If b1 = T rue we                 New York, NY, USA, 2007. ACM.
                                                                           [9] C. Gutierrez, C. A. Hurtado, and A. Vaisman. Introducing Time into
union the current value a1 with the derived value a2 and the                   RDF. IEEE Trans. on Knowl. and Data Eng., 19(2):207–218, 2007.
possibility of father deriving depends on b2 . If b1 = F alse, we         [10] C. Gutiérrez, C. A. Hurtado, and A. A. Vaisman. Temporal RDF. In
just keep the annotation (a1 , b1 ) which override any annotation              ESWC, pages 93–107, 2005.
                                                                          [11] P. Hájek. Metamathematics of Fuzzy Logic (Trends in Logic). Springer,
from a structure above. Finally, KA can be easily checked to                   1 edition, November 2001.
be ⊗-noncommutative >-dioid.                                              [12] A. Hogan, G. Lukacsy, N. Lopes, A. Polleres, U. Straccia, A. Zimmer-
   To use the A-default dioid KA for storing values of an                      mann, and S. Decker. RDF Needs Annotations. In Proceedings of W3C
                                                                               Workshop — RDF Next Steps. W3C, 2010.
attribute A for resources we assume (0, T rue)-annotation for             [13] F. Manola, E. Miller, and B. McBride. RDF Primer, W3C Recommen-
a triple by default and do not keep it in a memory. As soon as                 dation. http://www.w3.org/TR/REC-rdf-syntax/, Feb. 2004.
we need the real value of the attribute for a resource a we infer         [14] S. Muñoz, J. Pérez, and C. Gutiérrez. Minimal Deductive Systems for
                                                                               RDF. In ESWC, pages 53–67, 2007.
a triple (a, type, r) and get the first component of annotation           [15] U. Straccia. A Minimal Deductive System for General Fuzzy RDF.
for it. (Here r is the root of ρdf subgraph of the considered                  In Proceedings of the 3rd International Conference on Web Reasoning
graph; if it does not exist, we can always introduce it.)                      and Rule Systems (RR-09), number 5837 in Lecture Notes in Computer
                                                                               Science, pages 166–181. Springer-Verlag, 2009.
                                                                          [16] U. Straccia, N. Lopes, G. Lukacsy, and A. Polleres. A General
                      VII. C ONCLUSIONS                                        Framework for Representing and Reasoning with Annotated Semantic
   Although annotation algebras have been studied for database                 Web Data. In Proceedings of the Twenty-Fourth AAAI Conference on
                                                                               Artificial Intelligence (AAAI-10). AAAI Press, 2010.
query languages [8] and have also recently been investigated              [17] Y. R. Wang and S. E. Madnick. A Polygen Model for Heterogeneous
for RDF query languages [7], we have suggested an alternative                  Database Systems: The Source Tagging Perspective. In VLDB, pages
and more natural algebra for the annotation of RDFS, and we                    519–538, 1990.

</pre>