A Nested Graph Model for Visualizing RDF Data Renzo Angles Department of Computer Science, Universidad de Chile Department of Computer Science, Universidad de Talca rangles@utalca.cl Abstract. This paper presents an abstract data model for visualizing RDF data based on the notion of nested graphs. Our study gives theo- retical results that shows directions to enhance the representation and visualization of RDF data. 1 Introduction The term data exploration refers to the process by which the user is able to visual- ize, browse and query the data. Data visualization involves selecting, transform- ing and representing abstract data in a form that facilitates human interaction and understanding. RDF data is typically very large, highly interconnected, and heterogeneous without following a fixed schema [5]. Any technique for exploring RDF data should therefore be scalable; should support graph-based navigation; should be generic, not depending on a fixed schema; and allow exploration of the RDF descriptions without a-priori knowledge of its structure [14]. Current applications for exploring RDF data use different interfaces, each one having its advantages and disadvantages. For example, circle-and-arrow di- agrams (e.g., IsaViz [15]) are intuitive and useful to show the graph structure of RDF, but they are not proper to visualize large datasets. In another hand, an object-based interface (e.g., Tabulator [1,8]) provides a view where the user is able to see the data as a nested structure of resources. The objective of this paper is to show that underlying object-based inter- faces, there is a powerful data model which needs to be studied formally to take advantage of its properties. In this sense, we analyze object-based interfaces. We define a general framework based on the notion of nested graphs, a concept that was introduced in the area of database modeling by the hypernode model [6]. A nested graph extends the plain structure of a graph to a nested structure, al- lowing a simple and flexible representation of nested complex objects. We study the information capacity of a nested graph (i.e., the set of objects that can be modeled by it) in terms of its plain representation called a graphset. The contribution of this paper is the formal definition of an abstract data model for visualizing RDF data based on nested graphs. We show that RDF graphs, nested graphs and graphsets are three equivalent representations for RDF data. We propose the nested graph model as an abstract representation for object-based interfaces. The paper is organized as follows. In Section 2 we present the RDF model and we study object-based interfaces. A framework for nested graphs and graphsets is defined in Section 3. The abstract model for visualizing RDF data is presented in Section 4. Finally, Section 5 presents some conclusions. 2 Preliminaries 2.1 The RDF Model [12,10] Assume there are pairwise disjoint infinite sets U, B, L such that, U is the set of RDF URI references, L is the set of RDF literals, and B is the set of Blank nodes. We denote by T the union U∪B∪L.A tuple (v1 , v2 , v3 ) ∈ (U∪B)×U×T is called an RDF triple, where v1 is the subject, v2 the predicate 1 , and v3 the object. A set of RDF triples is called an RDF Graph. 2.2 Exploring RDF data Nowadays, there are many interfaces for browsing RDF data, each one having its advantages and disadvantages. Among these interfaces we can mention the following types: – Keyword Search: It is suffices for simple information lookup, but not for higher search activities such as browsing and querying (e.g., Swoogle [17]). – Explicit queries: It consists in using a query language for querying the data. It has the advantage that the language can be powerful enough to express any query, however writing queries is difficult and requires schema knowledge. – Graph Visualization: It is the most basic visualization model and is based on circle-and-arrow diagrams (e.g. IsaViz [15]). It provides an intuitive and useful interface when trying to understand the structure of the data (a graph in the case of RDF). However it is not an appropriate way to look at data when a big number of nodes are present or for comparing objects of the same class. Moreover, graph visualization does not scale to large datasets. – Faceted Browsing: In this model the information is faceted, that is, com- posed of orthogonal sets of categories. Facets allow the user to restrict the information space to be visualized and to find information without an a- priori knowledge of its schema. As example of application using facets is the Flamenco Search Interface [13]. – Object-based: In this kind of interface, the user is able to see the descrip- tion of a resource (i.e., its outgoing and incoming properties) as being an object which encapsulates its information. The basic approach shows the data of a unique resource in each moment (e.g., Marbles [3,7], Gruff [11], BrownSauce [16] and DAML Viewer [9]). A more complex approach con- sists to show a nested structure of resources (e.g., The Tabulator [1,8] and Zitgist [2]). An example of the complex approach is presented in Figure 1. Fig. 1. The Tabulator: Generic data browser [1,8] We concentrate our effort in to study object-based interfaces. The following lists some characteristics of this kind of interface. – The resource’s description includes outgoing and (possibly) incoming prop- erties, which are presented as a list of property-value pairs (recall that an outgoing / incoming property comes from a statement where the resource occurs as the subject / object respectively). An incoming property is com- monly represented by an expression “is property of”. For example see prop- erty acquaintance in Figure 1. – The nesting of resources is not necessarily hierarchical and cyclic references are possible. For example in Figure 1, the node Tim Berners-Lee contains the property is acquaintance of which introduces a cyclic reference to the root resource David Li. – Encapsulation of information is allowed because a non-expanded resource hides its properties. – There is redundant data, in that whenever an object is expanded more than one level, for and outer property we will also found a “dual inner property” in the opposite direction. As example, see the properties acquaintance and is acquaintance of in Figure 1. – The exploration begins in a root node (a resource that acts as the container), and browsing is achieved by selecting a property-value which either shows a new description (in the basic approach) or expands a node (in the complex approach). – Queries can be defined by either writing a query expression or constructing graphically a graph pattern in a query-by-example style (e.g. The Tabula- tor [8]). From the above features, we consider that object-based interfaces are a good approach for visualizing RDF data. In fact, we will show that underlying these interfaces there is a powerful data model which deserves to be studied formally to take advantage of its properties and features. 1 The predicate is also known as the property of the triple. Peru continent value South America currency value "Nuevo Sol" capital value Lima nickname value "City of the Kings" province value Lima province country value Peru ... Fig. 2. An example of nested graph 3 Nested Graph Model In this section, we will define an abstract data model based on the notion of nested graphs, a concept that was introduced in the area of graph database modeling by the Hypernode Model [6]. 3.1 Nested Graphs Definition 1. (Nested Graph) Assume that Σ is an infinite set of labels. A nested graph is defined recursively as follows: (i) a triple (u, N, E) such that u ∈ Σ, N ⊂ Σ and E ⊆ N × N × N is a nested graph; (ii) let NG be a set of nested graphs, then any triple (u, N, E) for which u ∈ Σ, N ⊂ Σ ∪ NG and E ⊆ N × N × N is also a nested graph. Given a nested graph G = (u, N, E), u is called the name of G, N is the set of nodes of G, and E is the set of edges of G. Given a node n ∈ N , if n ∈ Σ then n is called a primitive node and name(n) = n; otherwise, if n = (u′ , N ′ , E ′ ) ∈ NG then n is called a complex node and name(n) = u′ . Figure 2 presents graphically a nested graph named Peru. It contains primi- tive nodes (e.g. continent and "Nuevo Sol"), complex nodes (e.g. Lima) and edges value (e.g. continent → South America). The complex node Lima is a nested graph which is nested inside the nested graph Peru. Definition 2. Let G = (u, N, E) be a nested graph and k > 1. The operator nodesk (G) is defined recursively as follows: (i) nodes1 (G) = N ; and S (ii) nodesk (G) = nodes1 (G) ∪ G′ ∈N ∩NG nodesk−1 (G′ ). Then, nodesk (G) extracts nodes from G by examining recursively each nested graph in G until reaching the k th levelSof nesting. Additionally, we define nodes∗ (G) = j≥1 nodesj (G). Peru Lima continent value South America nickname value "City of the Kings" currency value "Nuevo Sol" province value Lima province capital value Lima country value Peru Fig. 3. An example of graphset which contains two plain graphs called Peru and Lima respectively. A nested graph G is cyclic if G ∈ nodes∗ (G) or if there exists G′ ∈ nodes∗ (G) such that G′ is cyclic. If G is not cyclic then it is a hierarchical nested graph. A node n is encapsulated in G if n ∈ nodes∗ (G). For example, the nested graph of Figure 2 is cyclic because the nested graph Peru occurs as node in the nested graph Lima, i.e., the nested graph Peru is encapsulated in itself. 3.2 Graphsets In this section we present the notion of graphset, a plain (or unnested) represen- tation for a nested graph. The complexity of working with a nested structure is usually reduced by transforming it to a plain structure. In this sense, we define additional operators and properties for nested graphs in terms of graphsets. Definition 3. (Plain Graph) A Plain Graph is a nested graph (u, N, E) satis- fying that N ⊂ Σ, i.e., a plain graph has no nested graphs as nodes. Given two plain graphs G1 = (u1 , N1 , E1 ) and G2 = (u2 , N2 , E2 ), we say that G1 is a subgraph of G2 , denoted G1 ⊆ G2 , iff N1 ⊆ N2 and E1 ⊆ E2 . We say that G1 and G2 are isomorphic, denoted G1 ≈ G2 , if and only if G1 ⊆ G2 and G2 ⊆ G1 . Additionally, consider the following operations between G1 and G2 : Union: G1 ∪ G2 = (u3 , N1 ∪ N2 , E1 ∪ E2 ) Intersection: G1 ∩ G2 = (u3 , N1 ∩ N2 , E1 ∩ E2 ) Difference: G1 − G2 = (u3 , N1 \ N2 , {(n1 , n2 , n3 ) ∈ N1 | ni ∈ N1 \ N2 }) where u3 ∈ Σ is a fresh label. If name(G1 ) = name(G2 ) then u3 = name(G1 ). Definition 4. (Graphset) A Graphset S is a set of plain graphs satisfying that, for each two plain graphs G1 and G2 in S, name(G1 ) 6= name(G2 ) , i.e., a label u ∈ Σ identifies at most one plain graph in S. We denote by GS the set of graphsets. Figure 3 shows an example of a graphset. Given two graphsets S1 and S2 , we say that S1 is a sub-graphset of S2 , denoted S1 ⊆ S2 , if and only if for each G1 ∈ S1 there exists G2 ∈ S2 satisfying that name(G1 ) = name(G2 ) and G1 ⊆ G2 . Then, S1 and S2 are equal, denoted S1 = S2 , if and only if S1 ⊆ S2 and S2 ⊆ S1 . Additionally, define the maximal-intersection and union between two graph- sets S1 and S2 : S1 ⊓ S2 = {G1 ∪ G2 | G1 ∈ S1 , G2 ∈ S2 and name(G1 ) = name(G2 )} S1 ∪ S2 = (S1 ⊓ S2 ) ∪ {G1 ∈ S1 | for all G2 ∈ S2 , name(G1 ) 6= name(G2 )} ∪ {G2 ∈ S2 | for all G1 ∈ S1 , name(G2 ) 6= name(G1 )} 3.3 Information Capacity of Nested Graphs The information capacity of a representation is given by the set of objects mod- eled by such representation. Additionally, two complex object types are abso- lutely equivalent if and only if they can both be reduced to a normal form com- plex object types, which is based on some natural restructuring operators [4]. In this direction, the representation capacity of a nested graph will be defined in terms of graphsets. First, we introduce operations for transforming nested graphs into graphsets. Definition 5. (Flattening of Nested Graphs) Let G = (u, N, E) be a nested graph. We define the operators flat and flat∗ as follows: – flat(G) = (u, N ′ , E ′ ) where N ′ = {name(n) | n ∈ N } and E ′ = {(name(n1 ), name(n2 ), name(n3 )) | (n1 , n2 , n3 ) ∈ E} – flat∗ (G) = flat(G) ∪ {flat∗ (G′ ) | G′ ∈ N ∩ NG} Then, flat(G) transforms the nested graph G into a plain graph by flattening its first level of nesting. Additionally, flat∗ (G) flattens G and each nested graph G′ in G until a fixpoint is reached. For example, Figure 3 shows the graphset obtained by flattening the nested graph in Figure 2. In the opposite direction, a graphset S can be transformed into a set of nested graphs by expanding the structure of each plain graph in S, i.e, by replacing simple nodes by complex nodes. The following definition formalizes this notion: Definition 6. (Expansion of Graphsets) Given a plain graph G and a graphset S, we define function integrate(G, S) recursively as follows: for each primitive node n ∈ nodes1 (G), if there exists G′ ∈ S such that name(G′ ) = n, then replace n by integrate(G′ , S). This procedure isSapplied until a fixpoint is reached. Additionally, we define integrate(S) = G∈S integrate(G, S). Note that integrate(S) returns a set of nested graphs, i.e, one for each plain graph in S. For example, if we expand the graphset in Figure 3, we will obtain the nested graph in Figure 2 plus the nested graph in Figure 4. The following lemma defines the equivalence, in terms of representation, be- tween nested graphs and graphsets. Lemma 1. Let NG be the set of nested graphs and GS be the set of graphsets. Then: ∗ ∗ (i) for each nested graph G ∈ NG, G S ∈ integrate (flat ∗(G)); and (ii) for each graph set S ∈ GS, S = G∈integrate∗ (S) flat (G). Lima nickname value "City of the Kings" province value Lima province country value Peru continent value South America currency value "Nuevo Sol" capital value Lima ... Fig. 4. A nested graph obtained by changing the structure of the nested graph pre- sented in Figure 2 Consider a set of nested graphs M starting from a graphset S (i.e., M = integrate∗ (S)), S Then it holds that M models the same set of objects that S (because S = G∈integrate∗ (S) flat∗ (G)). However, in some cases it can occur that a minimal subset M ′ of M is enough for modeling all the data modeled by the graphset S (i.e., we can obtain S by flattening each nested node in M ′ ). This minimal subset will be called the core of a set of nested graphs. Definition 7. Let M be Sa set of nested graphs. S A Core of M is a minimal subset M ′ of M satisfying that G′ ∈M ′ flat∗ (G′ ) = G∈M flat∗ (G). For example, consider the nested graph presented in Figure 2 and call it G. If we flatten G, we have that S = flat∗ (G) is the graphset presented in Figure 3. Now, we have that integrate∗ (S) will contain the nested graphs of Figure 2 and Figure 4. Note that both nested graphs are a core of the graphset S, because any of them contains all the data modeled by S. As we see, the core is not necessarily unique. Now, if the core of a set of nested graphs M is a single nested graph G, then G is called the single-source of M . In the context of graphsets, the above property introduces the notion of a single-source graphset. Definition 8. (Single-source graphset) A graphset S is called a single-source graphset if the core of S is unique. The notion of a core is useful for visualizing nested graphs. Consider the problem of selecting a good start point for navigation. If we use the core as a minimal set of navigation, we can reduce the number of visible or active nested graphs without losing data, such that all the data could be accessed by navigating through the nested graphs of the core. Clearly this problem could have a direct solution if we have a single-source graphset, then all the data can be accessed from a single nested graph. Lemma 2. Determining if a graphset S is single-source can be computed in polynomial time. Proof. Let S be a graphset. To construct a digraph G = (N, E), where the set of nodes N is the set of names of the graphs in S, and there is an arc from u1 to u2 in E if the name u2 occurs as node in the graph named u1 . We say that S is single-source if G is connected and there is a unique spanning tree for G (i.e., there is a tree which connects all the nodes together). Finally, the information capacity of a nested graph G is defined by the graph- set flat∗ (G). Additionally, the equivalence of two nested graphs is given by the equivalence in their information capacities, i.e., they can be reduced to the same graphset. Definition 9. (Equivalence of nested graphs) Two nested graphs G1 and G2 are equivalent, denoted G1 ≈ G2 , if and only if flat∗ (G1 ) = flat∗ (G2 ). For example, the nested graphs presented in Figure 2 and Figure 4 are equiva- lent. Additionally, we say that G1 is a sub-nested graph of G2 , denoted G1 ⊆ G2 , if it holds that flat∗ (G1 ) ⊆ flat∗ (G2 ). 4 Abstract Data Model for RDF Data In this section, we define the abstract model for visualizing RDF data based on nested graphs. Assume that NG∗ ⊂ NG is the set of nested graphs whose set of labels is given by U ∪ L (RDF URIs and labels), and each nested graph (u, N, E) in NG∗ satisfies: (i) u ∈ U; (ii) N ⊂ U ∪ L ∪ NG∗ ; (iii) E ⊆ ((N \ L) × {+} × N ) ∪ ((N \ L) × {−} × (N \ L)); and (iv) if n ∈ N then n occurs in some edge of E. We denote by GS∗ the set of graphsets having only plain graphs from NG∗ . Given an RDF graph2 GRDF , the following algorithm returns a graphset S ∈ GS∗ which models the same data as GRDF : Let S be an empty graphset for each triple (v1 , v2 , v3 ) in GRDF do G1 = (v1 , N1 , E1 ) where N1 = {v2 , v3 } and E1 = {(v2 , +, v3 )} S ← S ∪ G1 if v3 ∈ U then G2 = (v3 , N2 , E2 ) where N1 = {v1 , v2 } and E2 = {(v2 , −, v1 )} S ← S ∪ G2 end if end for It means that for each resource u ∈ U occurring either as subject or object in some triple of GRDF , there exists a plain graph (u, N, E) where E contains: (i) an edge (n1 , +, n2 ) for each triple (u, n1 , n2 ) where u occurs as the subject; and (ii) an edge (n1 , −, n2 ) for each triple (n2 , n1 , u) where u occurs as the object. In this sense, a plain graph (u, N, E) in S interprets a resource identified u whose outgoing and incoming properties are encapsulated by edges labeled “+′′ and “−′′ respectively. 2 We restrict our study to RDF graphs having no blank nodes. David Li type + Person family_name + "Li" Given name + "David" acquaintance + Tim Berners−Lee Type + Person seeAlso + http://dbpedia.org/resource/Tim_Berners−Lee name + "Tim Berners−Lee" requested + http://dbpedia.org/resource/Tim_Berners−Lee acquaintance − David Li ... acquaintance + James Hollenbach personal mailbox + "david_li@mit.edu" Fig. 5. An abstract representation to the interface in Figure 1. From Lemma 1, we have that an RDF graph GRDF can also be represented by the set of nested graphs integrate∗ (S), where S is the graphset constructed from GRDF (as defined above). It allow us to present the following proposition. Proposition 1. The nested graph model is an abstract representation for Object- based interfaces. For example, consider the object-based interface presented in Figure 1. The expanded node David Li can be represented, from an abstract point of view, by the nested graph presented in Figure 5. If we compare both figures, we can see that an expanded node in the interface can be represented by a nested graph, and outgoing and incoming properties are represented by edges labeled “+” and “-” respectively. For example, the incoming link is acquaintance of → David Li of Figure 1 is represented by the edge acquaintance →- David Li in Figure 5. This abstract representation based on nested graphs enable us to study for- mally object-based interfaces. For example, the notion of single-source graphset can be used to select an object as the start point of navigation. A language for querying object-based interfaces could be based on the operators defined for nested graphs and graphsets. Exploration states can be stored, serialized or shared in the form of nested graphs or graphsets. 5 Conclusions In this paper, we studied interfaces for visualizing RDF data, in particular object- based interfaces. We present a formal framework for representing and querying of nested graphs. We showed that an RDF graph can be modeled in terms of nested graphs and graphsets (a plain representation for nested graphs). We show that studying formal properties of RDF graphs viewed as nested graphs (or graphsets) is possible to optimize some visualization parameters. Further work includes to analyze properties of the model and the definition of a query language. Acknowledgments. R. Angles was supported by Mecesup project No. UCH0109, and by FDI VAC650006, Faculty of Engineering, Universidad de Talca. The au- thor wish to thank the reviewers for their comments. References 1. The tabulator. http://www.w3.org/2005/ajar/tab. 2. Zitgist. http://zitgist.com/products/dataviewer/dataviewer.html, 2007. 3. Marbles linked data browser. http://beckr.org/marbles, 2008. 4. S. Abiteboul and R. Hull. Restructuring Hierarchical Database Objects. Theoret- ical Computer Science, 62(1-2):3–38, 1988. 5. R. Angles and C. Gutierrez. Querying RDF Data from a Graph Database Per- spective. In Proceedings of the 2nd European Semantic Web Conference (ESWC), number 3532 in LNCS, pages 346–360, 2005. 6. R. Angles and C. Gutierrez. Survey of graph database models. ACM Computing Surveys (CSUR), 40(1):1–39, 2008. 7. C. Becker and C. Bizer. Dbpedia mobile: A location-enabled linked data browser. In 1st Workshop about Linked Data on the Web (LDOW), April 2008. 8. T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach, A. Lerer, and D. Sheets. Tabulator: Exploring and Analyzing linked data on the Semantic Web. In Procedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06), 2006. 9. M. Dean and K. Barber. DAML Viewer. http://www.daml.org/viewer/, 2001. 10. P. Hayes. RDF Semantics. http://www.w3.org/TR/2004/REC-rdf-mt-20040210/, February 2004. 11. F. Inc. Gruff: A Grapher-Based Triple-Store Browser. http://agraph.franz.com/gruff/, June 2008. 12. G. Klyne and J. Carroll. Resource Description Framework (RDF) Concepts and Abstract Syntax. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/, February 2004. 13. The Flamenco Search Interface Project. http://flamenco.berkeley.edu/. 14. E. Oren, R. Delbru, and S. Decker. Extending faceted navigation for RDF data. In Proceedings of the 5th International Semantic Web Conference (ISWC), 2006. 15. E. Pietriga. IsaViz: A Visual Authoring Tool for RDF. http://www.w3.org/2001/11/IsaViz/. 16. D. Steer. Brownsauce RDF Browser. http://brownsauce.sourceforge.net/, 2002. 17. Swoogle. http://swoogle.umbc.edu/, 2004.