Building Linked Data Applications with Fusion: A Visual Interface for Exploration and Mapping Samur Araujo1, Geert-Jan Houben1, Daniel Schwabe2, Jan Hidders1 1 Delft University of Technology, PO Box 5031, 2600 GA Delft, the Netherlands 2 PUC-Rio, Rua Marques de Sao Vicente, 225, Rio de Janeiro, Brazil {s.f.cardosodearaujo, g.j.p.m.houben, a.j.h.hidders}@tudelft.nl dschwabe@inf.puc-rio.br Abstract. Building applications over Linked Data often requires a mapping between the application model and the ontology underlying the source dataset in the Linked Data cloud. Explicitly formulating these mappings demands a comprehensive understanding of the underlying schemas (RDF ontologies) of the source and target datasets. This task can be supported by integrating the process of schema exploration into the mapping process and help the application designer with finding the implicit relationships that she wants to map. This demo describes Fusion - a framework for closing the gap between the application model and the underlying ontologies in the Linked Data cloud. Fusion simplifies the definition of mappings by providing a visual user interface that integrates the exploratory process and the mapping process. Its architecture allows the creation of new applications through the extension of existing Linked Data sources with additional data. Keywords: semantic web, data interaction, data management, RDF mapping, Linked Data 1 Introduction Nowadays, the Linked Data1 cloud provides a new environment for building applications where many datasets are available for consumption. Although data in this cloud is ready to use, applications over the Linked Data cloud have currently an intrinsic characteristic: they consume RDF2 data “as is”, since designers do not have write permission over the data in the cloud which would enable them to change the data in any way. This fact raises an important issue concerning the development of applications over Linked Data: how to fill the gap between the ontology associated with the application model and the ontology used to represent the underlying data from the Linked Data cloud? The main benefit of mapping these two models is that 1 Linked Data - http://linkeddata.org/ 2 http://www.w3.org/TR/2004/REC-rdf-primer-20040210/ Linked Data can then be accessed through properties defined in the application model, which is more convenient for the designer, consequently simplifying considerably the development and maintenance of the application. This demo presents Fusion [1], a lightweight framework to support application designers in building applications over Linked Data. It supports designers in mapping the ontology of the used Linked Data sources to their application model by integrating the process of exploration of the target schema with the task of expressing a mapping rule itself. Fusion features a visual user interface that guides the designer in the process of specifying a mapping rule. It uses a standard RDF query language and allows Linked Data to be accessed using properties defined in the application model, consequently simplifying the use of Linked Data in a specific context. 2 Architecture Overview The main aim of Fusion is to help the designer in discovering relationships in RDF graphs that exist in the Linked Data cloud and specifying rules for the derivation of new properties for these relationships. Fusion’s architecture provides a complete environment to specify and execute a derivation rule. An overview of Fusion’s architecture is shown in Fig. 1. The Fusion server engine is responsible for executing the derivation rule itself. During the process of executing a rule, it queries a source endpoint in the Linked Data, processes the results, and produces a set of new triples that will be added to the Fusion repository. Any RDF data store can be used as a Fusion repository. Currently, Fusion implements adapters for Sesame6 and Virtuoso7 data stores, although other adapters can be easily added to its architecture. All derived triples in Fusion contain as subject a resource whose URI belongs to the queried dataset, so the derived data is intrinsically interlinked with the Linked Data cloud. For this reason, a query over a federation of endpoints that includes the Fusion repository endpoint will allow the designer to have a view over the Linked Data that also includes the properties defined in her application model. Fig. 1 – Fusion’s architecture overview. Fusion is implemented in Ruby on Rails8 as a web application. It uses the ActiveRDF9 API that allows an RDF graph to be accessed in the object-oriented 6 http://www.openrdf.org/ 7 http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/ 8 http://rubyonrails.org/ 9 http://www.activerdf.org/ paradigm. By using this API the properties of an RDF resource can be accessed as an attribute of its corresponding Ruby10 object. This architecture allows the designer to write complex functions for computing a new datatype property value using the full power of the Ruby language, which cannot be achieved simply by using the SPARQL language. Derivation rules could be executed on demand, by any rule engine associated to the databases in the federation. However, even if theoretically possible, executing inference rules, or even instantiating a virtual view, over the Linked Data is still an open problem, since it raises many performance issues. Indeed, querying data that is already materialized is always faster than querying data that needs to be processed at runtime. Fusion avoids this problem by materializing the result of the rules as new triples in the Fusion repository when the derivation rules are defined. 3 Example of Use This section presents a scenario that illustrates the use of Fusion to create an application by extending Linked Data sources with additional properties. Suppose that the designer wants to establish the relationship between US senators and the US state that they represent. Therefore she needs to construct a derivation rule that will find and define such a correspondence between politicians and states in the GovTrack.Us’s Linked Data. In the first step in the process, the designer provides an example of two resources in GovTrack.Us that she knows in advance that are actually related, for instance, the politician Christopher Bond and the state of Missouri. Also, she needs to declare the GovTrack.Us endpoint to be queried and the maximum depth of the path. As the result of this first step, Fusion shows all the paths that connect these two example resources satisfying the maximum path length. This result is shown in Fig. 2. In this example, the paths found have a maximum length of 3. Fig. 2 – Fusion’s interface showing the discovered paths. 10 http://www.ruby-lang.org/en/ In this view, the designer can now look for the path that has the intended semantics. Note that with this view the tool assists the designer in this discovery process, since she does not need to query the schema manually in order to find these paths. The first path shown in Fig. 2 indicates that the politician Christopher Bond has a role as senator representing the state Missouri, and in our example case the designer can now infer that this is an instance of the path that she is looking for. After this conclusion, the designer chooses that instance to be the template for the rule. Fig. 3 – Generalizing the path for the property isSenatorOf in GovTrack.Us. In the next step, shown in Fig. 3, the designer will define the derivation rule itself, which means that she visually formulates a query, which generalizes the selected path from the first step into a query that selects the elements to be connected through the property isSenatorOf. To complete this operation she also needs to define the graph where the derived triples will be stored and a specific URI to be used as the predicate of the new triples, which in this example will be the URI http://example.org/isSenatorOf. Note that in this example 3 nodes were generalized such that only paths between resources of the RDF type Politician and RDF type State that contain an intermediate node that is part of the United States Senate will be considered during the derivation process. Consequently, Fusion will derive the new property isSenatorOf for all instances of the class Politician that are connected to an instance of the class State through the designated path. The whole process ends with Fusion adding new triples to Fusion repository. References 1. Araujo S., Houben G., Schwabe D., Hidders J. Fusion – Visually Exploring and Eliciting Relationships in Linked Data. In Proceedings of the 9th International Semantic Web Conference (ISWC2010). Shanghai, China. Nov 07-11, 2010.