<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Directed Hypergraph Model for RDF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amad´ıs Antonio Mart´ınez Morales</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mar´ıa Esther Vidal Serodio</string-name>
          <email>mvidal@ldc.usb.ve</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Sim ́on Bol ́ıvar</institution>
          ,
          <country country="VE">Venezuela</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Carabobo</institution>
          ,
          <country country="VE">Venezuela</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>RDF is a proposal of the W3C to express metadata about resources in the Web. The RDF data model allows several representations, each one with its own limitations at expressive power and support for the tasks of query answering and semantic reasoning. In this paper, we present a directed hypergraph model for RDF to represent RDF documents efficiently, overcoming those limitations. 1 Related Work An RDF document can be viewed as a graph: nodes are resources and arcs are properties. Formally [4], suppose there are three infinite sets: U (URIs), B = {bj : j ∈ N} (blank nodes), and L (literals). (s, p, o) ∈ (U ∪B)×U ×(U ∪B∪L) is an RDF triple, s is called the subject, p the predicate, and o the object. An RDF graph T is a set of RDF triples. The universe of T , univ(T ), is the set of elements of U ∪B∪L that occur in the triples of T . sub(T ) (resp. pred(T ), obj(T )) is the set of all elements in univ(T ) that appear as subjects (resp. predicates, objects) in T . RDF graphs allow several representations: labeled directed graphs (LDG) [4,7], undirected hypergraphs (UH) [5], and bipartite graphs (BG) [5,6]. In the LDG model, given an RDF graph T , nodes in V are elements of sub(T ) ∪ obj(T ), and arcs in E are elements of pred(T ) [4,7]. Each (s, p, o) ∈ T p is represented by a labeled arc, s −→ o. The number of nodes and arcs for LDG representation is |V | ≤ 2|T | and |E| = |T | [5]. This approach may violate some of the graph theory constraints. Thus, while LDG model is the most widely used representation, it can not be considered a formal model for RDF [1]. In the UH model, given an RDF graph T , each t = (s, p, o) ∈ T is a hyperedge in E and each element of t (subject s, predicate p, and object o) is a node in V . The number of nodes and hyperedges for UH representation is |V | = |univ(T )| and |E| = |T | [5]. However, UH represent RDF documents as a generalization of undirected graphs, losing the notion of direction in RDF graphs, which impacts the task of semantic reasoning. Besides, it can be hard to graphically represent large RDF graphs, like the museum example [2]. In the BG model, given an RDF graph T , there are two types of nodes in V : statement nodes St (one for each (s, p, o) ∈ T ) and value nodes V al (one for each x ∈ univ(T )). Arcs in E relate statement and value nodes as follows: Each t ∈ St has three outcoming arcs that point to the corresponding node for the subject, predicate, or object of the triple represented by t. The number of nodes and arcs for BG representation is |St| = |T |, |V al| = |univ(T )|, and |E| = 3|T | [5,6]. While BG satisfy the requirement of a formal model for RDF, issues such as reification, entailment and reasoning have not been addressed yet [1].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Definition 1. Let T be an RDF graph. The RDF directed hypergraph
representing T is a tuple H(T ) = (W, E, ρ) such that: W = {w : w ∈ univ(T )} is the set
of nodes, E = {ei : 1 ≤ i ≤ |T |} is the set of hyperarcs, and ρ : W ×E → {s, p, o}
is the role function of nodes w.r.t. hyperarcs. Let t ∈ T be an RDF triple,
e ∈ E an hyperarc, and w ∈ W a node such that w ∈ head(e) ∪ tail(e).
Then the following must hold: (i) (ρ(w, e) = s) ⇔ (w ∈ tail(e)) ∧ (w ∈ sub({t})),
(ii) (ρ(w, e) = p) ⇔ (w ∈ tail(e)) ∧ (w ∈ pred({t})), and (iii) (ρ(w, e) = o) ⇔
(w ∈ head(e)) ∧ (w ∈ obj({t}))</p>
      <p>The information is only stored in the nodes, while hyperarcs preserve the role
of nodes and the notion of direction in RDF graphs. Thus, the space complexity
of our approach must be smaller than the complexity of representations in section
1. The number of nodes and hyperarcs for DH representation is |W | = |univ(T )|
and |E| = |T |. Besides, concepts and algorithms of hypergraph theory could be
used to manipulate RDF graphs under this representation.
3 Preliminary Experimental Results
Labeled directed graph (LDG) and directed hypergraph (DH) representations
were studied over twenty synthetic simple RDF documents, randomly
generated using an uniform distribution. Around 33% of the resources simultaneously
played the role of subject, predicate, or object. Document sizes were increased,
ranging from 1000 to 100000 triples. We used two metrics in this study: (a) the
space in memory required to store information, measured as the number of nodes
and arcs and (b) the number of comparisons required to answer an elemental
query. In both cases, the LDG approach showed a trend of linear dependence on
the size of the document, while DH exhibited a more independent behavior.
4 Conclusions and Future Plan
We proposed a directed hypergraph model for RDF. Initial results make us
believe that our approach scales better than existing representations. In the future,
we propose to: (1) extend this representation for RDFS graphs, (2) develop query
evaluation algorithms for conjunctive and SPARQL queries, (3) study the
impact of this model on the tasks of query answering and semantic reasoning, and
(4) conduct empirical studies to analyze the goodness of our approach.
References
1. F. Dau, “RDF as Graph-Based, Diagrammatic Logic”, Proceedings of ISMIS’06,
pages 332–337, 2006.
2. FORTH Institute of Computer Science (2003), “RQL v2.1 User Manual”. [Online]
http://139.91.183.30:9090/RDF/RQL/Manual.html
3. G. Gallo, G. Longo, S. Pallottino, and S. Nguyen, “Directed Hypergraphs and</p>
      <p>Applications”, Discrete Applied Mathematics, Vol. 42, No. 2, pages 177–201, 1993.
4. C. Gutierrez, C. A. Hurtado, and A. O. Mendelzon, “Foundations of Semantic Web</p>
      <p>Databases”, Proceedings ACM Symposium on PODS, pages 95–106, 2004.
5. J. Hayes, A Graph Model for RDF, Diploma Thesis, Technische Universitt
Darmstadt / Universidad de Chile, 2004.
6. J. Hayes and C. Gutierrez, “Bipartite Graphs as Intermediate Model for RDF”,</p>
      <p>Proceedings of the ISWC’04, pages 47–61, 2004.
7. G. Klyne and J. J. Carroll (2004), “Resource Description Framework (RDF):
Concepts and Abstract Syntax”, W3C Recommendation. [Online] http://www.w3.
org/TR/2004/REC-rdf-concepts-20040210/</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>