<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RDF Data Descriptions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Georg Lausen</string-name>
          <email>lausen@informatik.uni-freiburg.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Schmidt</string-name>
          <email>m.schmidt00@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kaiserstraße 86</institution>
          ,
          <addr-line>66133 Scheidt</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Freiburg, Institute for Computer Science</institution>
          ,
          <addr-line>79110 Freiburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linked Open Data (LOD) sources on the Web are becoming increasingly popular. RDF constraints can be used to characterize the RDF graphs provided by such sources. For applications that process data retrieved from several such RDF graphs, it becomes interesting to analyze the relationships between the different sets of constraints associated with the sources providing the RDF graphs. In this short paper we discuss how the constraints from different sources can be aggregated into a set of constraints characterizing the union of the RDF graphs under consideration. For expressing constraints we use Datalog+/-.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The recent RDF Validation Workshop [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] identified a gap between what
current standards offer and what industry needs for the validation of RDF data.
As a possible solution, in continuation of our previous work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we have
developed a constraint language RDD (RDF Data Descriptions) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which
captures a broad range of constraints, including keys, cardinalities, and
subclass and subproperty restrictions, making it easy to implement RDD
checkers and clearing the way for semantic query optimization.
      </p>
      <p>
        The intention of an RDD is similar to Stardog ICV [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where
constraints are stated using OWL and considered relative to a certain
inference machinery whose type may range from no inferencing over RDFS
inferencing to OWL inferencing. In contrast, RDD is a language using a compact
special-purpose syntax designed only for expressing constraints, independent of a
specific inference machinery. This makes RDD particularly applicable to
RDF under ground semantics, which is a common scenario in the Linked
Data context.
      </p>
      <p>
        While in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] we considered a restricted scenario where a single RDD
defines the constraints for a single RDF graph, in this short paper
we suggest broadening the view to a set of RDF graphs, each described
by its own RDD defining a set of associated constraints. The major
difficulties of such a scenario arise because the information represented by the
graphs may overlap, in the sense that certain resources may be described
in more than one graph. To handle such situations the notion of a
context has been coined [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. While that paper discusses RDF in different contexts
with respect to information aggregation, our concern is the
aggregation of constraints, which to the best of our knowledge
has not been studied before.
      </p>
      <p>Let us consider RDF graphs Ga and Gb and corresponding RDDs
RDDa and RDDb, respectively. Both graphs are assumed to be
consistent, i.e. all the constraints in the respective RDD are fulfilled. The main
question we are interested in is how the union of the graphs Ga ∪ Gb is
related to the union of the respective sets of constraints Σa and Σb. As Ga
and Gb may contain triples referring to subjects with the same URI, in
general Ga ∪ Gb will not fulfill all constraints in Σa ∪ Σb. For
example, whenever a certain predicate p is defined to be single-valued in
RDDa and RDDb, two corresponding triples (s, p, o1) and (s, p, o2)
may appear in the union Ga ∪ Gb of both graphs, so that the constraint
is not guaranteed to hold in Ga ∪ Gb. As a solution for such cases we
propose aggregation of constraints, which in our example means that the
single-valued constraint is replaced by a constraint restricting the number
of values to at most 2. In general, given RDDs Σa and Σb, we are
interested in constructing an RDD α(Σa, Σb) such that for any RDF graphs
Ga and Gb, where Ga ⊨ Σa and Gb ⊨ Σb, we have Ga ⊨ α(Σa, Σb),
Gb ⊨ α(Σa, Σb) and Ga ∪ Gb ⊨ α(Σa, Σb). The task of deriving α(Σa, Σb)
is called constraint aggregation.</p>
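      <p>For cardinality constraints, the aggregation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class Cardinality, the functions aggregate and satisfies, and the rule itself (the smaller lower bound, the sum of the upper bounds) are assumptions chosen so that the result holds in Ga, in Gb and in their union. Graphs are modelled as plain Python sets of (subject, predicate, object) tuples.</p>
      <preformat>
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cardinality:
    """Bounds on the number of values of a predicate per subject."""
    min_count: int = 0
    max_count: float = math.inf  # math.inf stands for "unrestricted"


def aggregate(a, b):
    """Assumed aggregation rule for two cardinality constraints."""
    # A subject may be described in only one graph, so only the
    # smaller lower bound is guaranteed in Ga, Gb and their union.
    lower = min(a.min_count, b.min_count)
    # In the union, the values contributed by both graphs can add up.
    upper = a.max_count + b.max_count
    return Cardinality(lower, upper)


def satisfies(graph, predicate, card, subjects):
    """Check the bounds of `card` for each subject in `subjects`."""
    for s in subjects:
        n = len({o for (s2, p, o) in graph if s2 == s and p == predicate})
        if not (n >= card.min_count and card.max_count >= n):
            return False
    return True


# Hypothetical data: p is single-valued in both RDDs.
single = Cardinality(0, 1)
ga = {("s", "p", "o1")}
gb = {("s", "p", "o2")}
agg = aggregate(single, single)  # Cardinality(0, 2)
```
</preformat>
      <p>With this data, the union of ga and gb violates the single-valued constraint but satisfies the aggregated one, which mirrors the single-valued example in the text.</p>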
      <p>
        Recently, Cortes-Calabuig and Paredaens [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have presented a
constraint language for RDF equipped with deductive rules for equality and
tuple generating dependencies. However, as can be seen from the following
example, their constraint language is not general enough to be used for an
RDD. For this reason we have chosen the framework of Datalog+/- [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which offers the needed expressiveness.
      </p>
      <p>In Figure 1 we exhibit two RDDs describing RDF graphs representing
employees and projects. To demonstrate constraint aggregation let us
consider the predicate reportsTo, which is defined via a path constraint in
RDDa. In RDDb it is defined that each employee must
report to exactly two objects. These constraints are written in
Datalog+/- as follows (predicate names are abbreviated):</p>
      <p>G($s, rT, $o) → ∃$o1 (G($s, wF, $o1), G($o1, aT, $o))</p>
      <p>G($s, rT, $o1), G($s, rT, $o2), G($s, rT, $o3) → $o1 = $o2 ∨ $o1 = $o3 ∨ $o2 = $o3</p>
      <p>G($s, type, E) → ∃$o1, $o2 (G($s, rT, $o1), G($s, rT, $o2), $o1 ≠ $o2)</p>
      <p>Figure 1: Two RDDs describing RDF graphs representing employees and projects.</p>
      <preformat>PREFIX ex: &lt;http://www.example.com#&gt;
CWA CLASS ex:Employee {
  KEY rdfs:label : LITERAL
  PARTIAL ex:employedBy : RESOURCE
  MAX(2) ex:prevEmployedBy : RESOURCE
  TOTAL ex:worksFor, RANGE (ex:Project)
  PATH(ex:worksFor/ex:assignedTo)
    ex:reportsTo, RANGE(ex:Consortium) }</preformat>
      <preformat>PREFIX ex: &lt;http://www.example.com#&gt;
CWA CLASS ex:Employee {
  KEY rdfs:label : LITERAL
  PARTIAL ex:employedBy : RESOURCE
  ex:prevEmployedBy : RESOURCE,
    SUBPROPERTY employedBy
  MIN(2), MAX(2) ex:reportsTo,
    RANGE(ex:Association) }</preformat>
      <preformat>CWA CLASS ex:Project {
  TOTAL ex:assignedTo,
    RANGE(ex:Consortium) }</preformat>
      <p>
        Moreover, RDDa defines worksFor and assignedTo as total functions.
Therefore it can be inferred that reportsTo is a partial function, even
though this is not stated in RDDa. Using this additional information,
by constraint aggregation we get a min(0)- and max(3)-constraint for
predicate reportsTo. Note that without the inferred constraint,
predicate reportsTo has to be considered unrestricted in RDDa, and therefore
only the trivial constraint max(∞) can be derived by constraint
aggregation. Formally, inferring constraints in the Datalog+/- framework can
be done based on the chase procedure [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We are currently investigating
termination and complexity of the corresponding constraint implication
problem. However, constraint aggregation, as proposed in this paper, is by
itself independent of the concrete constraint language considered.
For example, in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] termination and efficiency of the required chase procedure are
demonstrated for a wide range of ontology languages.
      </p>
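      <p>The inference used in the example, namely that a path constraint over two functional predicates forces reportsTo to be a partial function, can be illustrated on instance data. The sketch below is a simple check under ground semantics and not a chase implementation. The helper names objects, path_holds and is_partial_function as well as the sample triples are hypothetical; only the predicate names worksFor, assignedTo and reportsTo are taken from Figure 1.</p>
      <preformat>
```python
def objects(graph, s, p):
    """All o such that the triple (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def path_holds(graph):
    """Every (s, reportsTo, o) needs some o1 with
    (s, worksFor, o1) and (o1, assignedTo, o) in the graph."""
    return all(
        any(o in objects(graph, o1, "assignedTo")
            for o1 in objects(graph, s, "worksFor"))
        for (s, p, o) in graph if p == "reportsTo"
    )


def is_partial_function(graph, predicate):
    """True if no subject has more than one value for the predicate."""
    subjects = {s for (s, p, _) in graph if p == predicate}
    return all(len(objects(graph, s, predicate)) == 1 for s in subjects)


# Hypothetical instance data for the vocabulary of Figure 1.
ga = {
    ("alice", "worksFor", "proj1"),
    ("proj1", "assignedTo", "cons1"),
    ("alice", "reportsTo", "cons1"),
}
# A second reportsTo value cannot be reached via worksFor/assignedTo.
bad = ga | {("alice", "reportsTo", "cons2")}
```
</preformat>
      <p>Since worksFor and assignedTo each have exactly one value per subject in ga, any additional reportsTo value, as in bad, violates the path constraint. This is the reason why an upper bound of one value can be assumed for reportsTo in RDDa before aggregating.</p>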
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Stardog. http://Stardog.com/.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Calì</surname>
          </string-name>
          , Georg Gottlob, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          .
          <article-title>A general datalog-based framework for tractable query answering over ontologies</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>14</volume>
          :
          <fpage>57</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Calì</surname>
          </string-name>
          , Georg Gottlob, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Pieris</surname>
          </string-name>
          .
          <article-title>Towards more expressive ontology languages: The query answering problem</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>193</volume>
          :
          <fpage>87</fpage>
          -
          <lpage>128</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Alvaro</given-names>
            <surname>Cortes-Calabuig</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Paredaens</surname>
          </string-name>
          .
          <article-title>Semantics of constraints in RDFS</article-title>
          .
          <source>In AMW</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>90</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Alin</given-names>
            <surname>Deutsch</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alan</given-names>
            <surname>Nash</surname>
          </string-name>
          .
          <article-title>Chase</article-title>
          .
          <source>In Encyclopedia of Database Systems</source>
          , pages
          <fpage>323</fpage>
          -
          <lpage>327</lpage>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>R.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCool</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fikes</surname>
          </string-name>
          .
          <article-title>Contexts for the semantic web</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>32</fpage>
          -
          <lpage>46</lpage>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Georg</given-names>
            <surname>Lausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Meier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <article-title>SPARQLing Constraints for RDF</article-title>
          .
          <source>In EDBT</source>
          , pages
          <fpage>499</fpage>
          -
          <lpage>509</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Schmidt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Georg</given-names>
            <surname>Lausen</surname>
          </string-name>
          .
          <article-title>Pleasantly consuming linked data with RDF data descriptions</article-title>
          .
          <source>In COLD</source>
          , volume
          <volume>1034</volume>
          of CEUR Workshop Proc.,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. W3C.
          <article-title>RDF validation workshop, practical assurances for quality RDF data</article-title>
          . http://www.w3.org/2012/12/rdf-val/,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>