Explorator: a tool for exploring RDF data through direct manipulation

Samur F. C. de Araújo
Catholic University of Rio de Janeiro
R. M. S. Vicente 225, Gávea, Rio de Janeiro, RJ, Brazil
+55 21 3527-1500
saraujo@inf.puc-rio.br

Daniel Schwabe
Catholic University of Rio de Janeiro
R. M. S. Vicente 225, Gávea, Rio de Janeiro, RJ, Brazil
+55 21 8241-4313
dschwabe@inf.puc-rio.br

Copyright is held by the author/owner(s).
LDOW 2009, April 20-24, 2009, Madrid, Spain.

ABSTRACT
In this paper we introduce Explorator, a tool for exploring Semantic Web data by direct manipulation. Explorator implements a model of operations that is supported by a visual interface that enables the user, with minimal knowledge of the RDF model, to explore an RDF database without a-priori knowledge of the data domain. Consequently, it is well suited for tasks that involve information search, exploration and visualization.

Categories and Subject Descriptors
H.5.3 Web-based interaction; H.5.4 Hypertext/Hypermedia - Navigation; H.3.3 Information Search and Retrieval - search process, query formulation.

General Terms
Algorithms, Design, Experimentation, Human Factors, Languages, Theory, Verification.

Keywords
RDF, exploratory search, exploration, ontology, semantic web.

1. INTRODUCTION
As the volume of information on the Web increases considerably, we need better tools to help us discover and make sense of the available information, as well as to seek answers to specific questions we may have.

Currently, seeking information is a task that permeates most of our day-to-day activities. Depending on the type of activity we perform, we use different strategies and tactics to search for information. On the Web, these tactics are supported by computational tools such as keyword search, navigation and browsing [11]. But the process of seeking information is not simply finding it; we must keep in mind that the task of the user ranges from simply searching for a known item to activities such as knowledge acquisition, understanding of concepts, discovery, planning, transforming, etc. [11].

A more recent development has been the Semantic Web (SW), and the rapidly growing amount of semantically annotated data leads to the need to support not only searching, but also investigating and learning about a set of data without a-priori knowledge of its domain. This data is expressed in RDF¹, and is typically stored in very large interconnected databases, without a homogeneous schema. The exploration mechanisms currently available are not sufficient to accomplish the user tasks in the SW. Keyword search, e.g. Sindice², only addresses simple information lookup. Explicitly formulated queries, e.g. iSparql³, require schema and technical knowledge from the users. Semantic browsers, e.g. Tabulator [3], are not designed to explore huge datasets, and semantic faceted browsing, e.g. BrowseRDF [12], is inefficient for fact-finding or known-item retrieval and some more complex exploratory tasks.

¹ RDF: Resource Description Framework
² http://sindice.com
³ iSparql can be accessed at http://demo.openlinksw.com/isparql/

In this paper we describe a model for representing information processing by users in exploratory tasks, and the Explorator tool, which provides a browser interface supporting this model. Explorator is based on the metaphor of direct manipulation of information on the interface, with immediate feedback of user actions. The remainder of the paper is organized as follows. Section 2 defines more precisely the exploratory search itself; Section 3 presents the information processing model and describes the Explorator tool and its interface; Section 4 presents some details of its implementation; Section 5 presents some conclusions and directions for further work.

2. EXPLORATORY SEARCH
In the hypertext field, we call information exploration the process of seeking, learning about and investigating a (potentially large) collection of information items through search, browsing or navigation, but not excluding other forms, in order to discover something new.

Research in the area called exploratory search [11] has tried to develop solutions that support information exploration. Exploratory search is applicable in situations where the user's task and the search environment have complex elements that require constant user interpretation during the exploration process. For example, how to support the user's search task when she is not familiar with the search domain, or she does not have sufficient knowledge about the domain to make a query; how to support the navigation in vast information spaces, or when the navigation, searching and browsing are not enough. In other words, how to take into account all aspects [2, 7, 11] that influence the exploration process: the user's task, the user's context, the user's profile, the environment, the information provenance, etc.

Marchionini [11] made a distinction between exploratory search, lookup and search retrieval. According to him, exploratory search is based not only on lookup but also on investigation and learning. He argues that investigative search and learning search require more human iteration than a simple lookup, because these are exploratory processes that support tasks that require the cognitive and interpretative ability of the user. These kinds of tasks are commonly found in the exploration of RDF databases, where the users need to identify classes and properties from the schema, in order to understand concepts, acquire knowledge and learn about the domain.

Berners-Lee et al. [3] argue that once the information sought is found, it may be necessary to analyze it. According to their description, exploration and analysis are distinct processes that are inter-related during the user's task. In our point of view, the process of exploration involves both finding a piece of information and investigating or learning about its domain, because it is guided by the need to perform a task. The cognitive process of analysis permeates the entire exploratory task, since while browsing, the user creates an expectation of what she will obtain, she sees what has been achieved and uses this information to guide her in the next step.

In order to provide the user with an exploratory search tool that supports learning and investigative search on the SW, we focused on three fronts:

• Information search (how semantic data is found on the Semantic Web),
• Information usage (how semantic data is used on the Semantic Web),
• Information visualization (how semantic data is presented on the Semantic Web).

2.1 Information Search (in the SW)
Nowadays, we can access the SW data in three different manners: through a SPARQL Endpoint⁴, through a URI, or by processing semantically annotated HTML pages (e.g. Microformats⁵ or RDFa⁶). There are tools which can explore the SW directly, namely semantic web browsers such as Tabulator [3], Disco⁷, Zitgist data viewer⁸, Marbles⁹, ObjectViewer¹⁰ and Openlink RDF Browser¹¹.

⁴ http://www.w3.org/TR/rdf-sparql-protocol/
⁵ http://microformats.org/
⁶ http://www.w3.org/TR/xhtml-rdfa-primer/
⁷ http://www4.wiwiss.fu-berlin.de/bizer/ng4j/disco/
⁸ http://dataviewer.zitgist.com/
⁹ http://beckr.org/marbles
¹⁰ http://objectviewer.semwebcentral.org/
¹¹ http://demo.openlinksw.com/rdfbrowser/index.html

These tools all implement a similar exploration strategy, allowing the user to visualize an RDF sub-graph in a tabular fashion or in a more "visual" way (e.g., map views or timelines) when applicable. The sub-graph is obtained by dereferencing [4, 6] a URI, and each tool uses a distinct approach for this. Tabulator is able to extract semantic annotations from HTML pages obtained from URIs that cannot be dereferenced as an RDF file, using GRDDL. Although distinct dereferencing processes may retrieve different amounts of information, the process itself does not improve the nature of the tasks performed in these tools. In fact, the set of exploration tasks is limited to navigation between sub-graphs by clicking on the resources displayed in the interface and dereferencing the corresponding URIs.

Another way to access SW data is by querying a SPARQL Endpoint that receives a SPARQL¹² query and returns a set of RDF resources described in XML notation. There are a few tools that allow us to explore a SPARQL Endpoint. NITELIGHT [15] and iSPARQL¹³ are Visual Query Systems (VQS) [5] which allow visual construction of SPARQL queries, differing mainly in the visual notation employed. To use these tools the user must have a full comprehension of the underlying RDF schema and the query language syntax, which leads to a high cognitive load for newcomers and less experienced users. Tabulator also provides a way to query its data using SPARQL, through an interface in which the user can formulate a query based on the selection of elements of the RDF graph displayed on the interface. However, more complex queries need to be edited manually, exposing the user to some of the issues cited before.

¹² http://www.w3.org/TR/rdf-sparql-query/
¹³ iSparql can be accessed at http://demo.openlinksw.com/isparql/

Some tools address a different goal in the process of accessing SW data. Instead of focusing on access to RDF data, they focus on how to consume RDF data. Exhibit [9] is a lightweight structured data publishing tool that can be used to export small collections of RDF data. This tool plays an important role on the SW, by publishing content from different sources on the Web.

Taking all this into consideration, we can see there are no tools adequate to explore the semantic web as a whole. Currently, the browsers and SPARQL query builders address different goals, and were designed for different kinds of users. In order to provide a complete and integrated exploratory search mechanism to access the SW data, we are proposing Explorator.

2.2 Information Usage (in the SW)
The RDF model provides a format for data, information, and knowledge exchange. However, the repositories of data are scattered on the SW, which demands a unified mechanism to access them. Many information-intensive human tasks demand the manipulation of multiple pieces of information. In a SW exploration tool, at a low level, the objects manipulated are RDF data (resources, triples, literals, properties, etc.) and queries. These are the information items being manipulated when using an RDF browser.

Consider the SW user looking for all papers mentioning another paper; or all paper authors' phone numbers. The user may encounter different data architectures while performing such tasks. For example, the information sought may be stored in multiple RDF files or in a single large RDF repository, and expressed in distinct vocabularies. It is crucial that any exploratory tool be able to consolidate the information to be accessed in an integrated way. The user should be able to merge information described in different vocabularies, at least by directly manipulating each piece of information. For example, suppose she is looking for all email addresses by dereferencing four different URIs, each one returning triples expressed in a distinct vocabulary. Even if she could see all the data together, she would not be able to manipulate this set of information to obtain a unique final set of email addresses using only current RDF browsers' functionality.

Some of these browsers, like Openlink RDF Browser, cache all RDF data during the user's navigation. Therefore, the user can treat pieces of information from different sources as coming from a unique repository. However, the user cannot issue a query on the results, which limits the kinds of tasks supported. For example, it is very difficult to obtain the homepage address for all people known to someone, as reported in their FOAF profile, by using one of the RDF browsers mentioned earlier.

From the user's task point of view, exploring the SW involves asking questions and getting answers about the schema and instances. Obviously, understanding what is presented, and what can be manipulated and how, is essential for the user to be able to formulate her question. Thus, querying is an important way for the user to increase her knowledge about the schema and data contained in an RDF repository. Direct SPARQL query formulation, which is allowed in some browsers, still imposes a higher mental load on the user, even for the more advanced. In addition, the user often does not have enough knowledge about the domain to formulate a query. As seen in Catarci et al. [5], the raw use of query languages induces the user to make mistakes during writing, considerably increasing the time for query formulation and usually being far from the mental model that the user has of the reality.

Ding et al. [7] argue that the object of interest is not only the domain schema and instances, but also the source of data, which is an important piece of information in the exploratory process. In fact, when we are exploring several repositories, we may want to know where each piece of information comes from. Marbles and Disco are examples of RDF browsers that track the provenance of the information, helping the user in judging its credibility.

In summary, current tools allow the user to manipulate raw RDF data and do not provide a user-friendly way to ask questions. The user is limited to visualizing the result as aggregate data. Any processing is done manually, and the user has a limited way to rearrange, group or filter the data, and process it further. We will discuss later how Explorator can be a step forward in SW data manipulation.

2.3 Information Visualization (in the SW)
A SW browser navigates along relationships between concepts. At each step of navigation, in this unknown and semi-structured (in the sense of schema-less) space, a set of RDF triples is displayed in the interface.

Browsers such as Disco, Marbles, Zitgist data viewer and Openlink RDF Browser represent RDF data in a tabular fashion. In Disco's interface, each triple is a line in a two-column table, and navigation is done by clicking on the resources displayed in the interface. Marbles does the same, and groups the values of properties that occur more than once for the same resource. In addition to the tabular presentation, the user has a more refined view of the triples being displayed. As in Disco, for each navigation step, the whole content is replaced by a new set of triples retrieved from the dereferenced URI.

Tabulator's more general view represents the information in a tree structure. As the user selects a resource in the interface, a new node is added to the tree, thus recording the user's navigation process in the interface. The authors argue that it is comfortable for the user to see the information in a tree-oriented interface, due to familiarity with other sources of data that are also represented in a hierarchical structure. The authors also proposed a model of views to be applied when the domain is known. A view oriented towards a specific domain improves the understanding of the instances being explored. For example, it is better to see geographic coordinates on a map than in a table.

From the user's task point of view, the representation of information helps its assimilation, but it does not expand the kinds of tasks that can be done. What we have observed so far is that without a proper model of exploration, involving well-defined operations, the user's exploration reduces to navigating between the nodes of an RDF graph, sequentially.

3. EXPLORATOR
Explorator¹⁴ is an open-source exploratory search tool for RDF graphs, implemented in a direct manipulation interface metaphor. It implements a custom model of operations, and also provides a Query-by-example [18] interface. Additionally, it provides faceted navigation over any set obtained during the operations in the model that are exposed in the interface. It can be used to explore both a SPARQL endpoint and an RDF graph in the same way as "traditional" RDF browsers. Its general architecture is represented in the diagram below:

EXPLORATOR INTERFACE
EXPLORATOR MODEL
REPOSITORIES
SEMANTIC WEB DATA

Figure 1. Explorator's general architecture.

¹⁴ Explorator information, including a demo interface and the URL of the subversion repository, can be accessed at http://www.tecweb.inf.puc-rio.br/explorator

At the most elementary level, the user's task reduces to dereferencing a URI or formulating and executing a SPARQL query against a SPARQL Endpoint. In Explorator, every SPARQL Endpoint is a repository that can be enabled or disabled and can be manipulated individually or integrated into a single global source of RDF data. The triples obtained from dereferenced URIs are stored in a local SESAME¹⁵ repository, which can then be queried and manipulated as if it were a SPARQL Endpoint. In other words, the user always explores a federation of databases, containing SPARQL Endpoints and RDF triples obtained by dereferencing specific URIs.

¹⁵ http://www.openrdf.org/

The set of manipulation operations is limited to the operations defined in our information processing model, which we describe next.

3.1 The Information Processing Model
Exploring a set of information items in the SW is understood here as a process of transforming resources and triples by successive application of operations.

Our experience in Web application design methods [10, 16] has shown us that it is useful to characterize the user's information processing as a set of manipulation operations, in what has been called "set based navigation" [14]. This view is also supported by more recent proposals such as Parallax¹⁶. Basically, the user is always processing (browsing) information items within a set of interest; if necessary, this set is further manipulated to either remove uninteresting elements or to add additional elements of interest.

¹⁶ http://mqlx.com/~david/parallax/index.html

Explorator's model is composed of two elements: the manipulated items and the manipulation operations. The items are primitive elements in the RDF model: triples, resources, literals, URIs, etc. The operations are grouped in two sets: set operations and search operations.

We will show in the following sub-sections that this model can encompass classical browsing, set-based navigation as found in SHDM [10], and faceted browsing, as well as keyword search.

3.1.1 Sets
The model manipulates two kinds of sets: sets of RDF triples and sets of RDF resources. When dealing with sets of RDF resources, the usual set operations (union, intersection and difference) are available. Since RDF resources are treated as URIs, blank nodes will only be included if they are assigned URIs, as occurs in some data stores.

When operating on sets of triples, we interpret the set operations as applying to any of the triple components, namely, subjects (S), predicates (P) or objects (O). This is equivalent to projecting a set of triples along one of its three slots.

3.1.2 Search Operation
As previously stated, there are two ways to access the data in the SW: dereferencing a URI or querying a SPARQL Endpoint. We define in our model a general query operation, called SPO (S, P, O), to be applied to a SPARQL Endpoint. This operation allows the user to obtain a new set of interest, which can then be processed in the next step in the task.

The SPO operation has three parameters, all of which are sets: a set of subjects, a set of predicates, and a set of objects. This operation is a subset of general SPARQL queries, allowing the user to query an RDF database by providing an example pattern of the desired set of triples.

For example, the function SPO(∅,∅,∅) can be translated into the following SPARQL query:

    SELECT ?s ?p ?o WHERE { ?s ?p ?o }

For the following data:

    @prefix foaf: .
    _:a foaf:name "Johnny Lee Outlaw" .
    _:a foaf:mbox .
    _:b foaf:name "Peter Goodguy" .
    _:b foaf:mbox .
    _:c foaf:mbox .

The query above should return all triples. On the other hand, the function SPO(∅,{foaf:mbox},∅) can be translated to:

    SELECT ?s ?p ?o WHERE { ?s ?p ?o . FILTER (?p = foaf:mbox) }

This query returns all triples that have the property foaf:mbox.

Consider a more complex example of how this model could be used to solve the task "find all Russian lakes". Let S be a function that returns all subjects from a set of triples.

    SPO(
        S( SPO(∅,{rdf:type},{mondial:Lake}) ),
        {mondial:locatedIn},
        {mondial:Russia}
    )

The expression above returns all triples whose subject is an instance of mondial:Lake and that have the property mondial:locatedIn with value mondial:Russia.

It should be noted that, whereas these examples show single-valued parameters, in general the parameters for SPO are sets.

3.1.3 Set Operations
The model allows the user to manipulate items of information within the RDF domain. Once the user has obtained a set of triples and resources, she can manipulate them individually, formulate new queries, or create new sets. To do so, the model supports the following set operations.

Let A be the set of all triples.

Union:
Given two sets M and N, each containing triples, the union between M and N is the union of the triples of M and N.

Intersection:
The intersection set I between M and N is the set of the triples in A whose subjects appear in triples in both M and N.

Difference:
The difference set D between M and N contains the triples in A whose subjects appear in triples in M and do not appear in triples in N.

Note that, in this model, the result is always a set of triples, and the operations are always computed on the sets of subjects, predicates or objects of these triples.

3.2 Visualizing RDF data with Explorator
In existing RDF browsers, the data are expressed in one of the following metaphors: table, tree or graph. In our approach, the interface represents the elements of the underlying exploration model: resources, triples and sets.

Figure 2. A set of triples displayed in Explorator. The subject is "Niger", the properties and values are listed under it.

In a generic exploration mechanism over the RDF model, the concepts of triple, entity and resource are mixed. In Explorator's interface, the predicates and objects of the triples are nested and right-aligned under the subject, thus making evident the entity represented by the subject of the triple, as shown in Figure 2.

Explorator uses the following heuristic to render a resource (or URI) in the interface:

• If the resource has a label, name or title property, it renders its value.
• Otherwise the URI localname is rendered.

In this interface, each element can be manipulated individually. Sets of subjects, predicates and objects can be selected by the user and provided as parameters in the operations described in the model. Dereferencing a URI, or the result of an operation over the model, always results in a new set in the interface. In this sense, Explorator incorporates elements of the Direct Manipulation paradigm [17], since the output of an operation may be used as input of another, as they are expressed in the same notation. Direct Manipulation is a user-system interaction paradigm that allows users to point at visual representations of objects and actions to carry out tasks rapidly and observe the results immediately. Explorator's interface follows this paradigm.

The interface has two main elements, the toolbar and the result sets. The toolbar has a menu giving access to repository configuration and additional functionalities; a search box; and a group of buttons representing the operations of the model.

Figure 3. Explorator toolbar.

The operations menu is divided into two groups, as shown in Figure 4. The first area (Fig. 4 - 1) has the set operations. To operate, the user must select the first set among the sets displayed, then click on the operation (union, intersection or difference), then select (click on) another set, and then click on "=". Specifically for union, the user can also click on multiple resources in the interface (ctrl-click) and then click on the union operation to form the corresponding set. The second subdivision, marked as 2, includes the operands for the SPO operation. In this case, the user must select one set, and then click on one of S, P or O. She may also assign another set to one of the other operands (S, P, O). Clicking on "=" produces the result. Clicking on "clear" resets the operands previously selected.

Figure 4. Operations in Explorator toolbar.

The sets are represented as boxes, and stand for both sets of triples and sets of resources. Strictly speaking, all boxes represent sets of triples, which can be grouped by subject, property or object. Classes are shown in blue, and RDF properties are shown in green.

Figure 5. Sets of triples represented in Explorator's interface. On the left we have all triples with Budapest as subject. On the right we have some triples grouped by subject.

To select a triple the user simply clicks on the surrounding box, whose border becomes dashed to indicate the selection. If the user double-clicks on a triple, it is interpreted as a request for all triples with the same subject as the subject of the clicked triple.

3.3 Faceted Navigation
In addition to the operations already described, we have also defined a model for specifying tailor-made facets. This model can be specified using a custom-made vocabulary called FACETO, which we do not elaborate here for reasons of space.

While many tools implement faceted navigation (FacetMap¹⁷, Longwell¹⁸, BrowseRDF¹⁹, Flamenco²⁰, Exhibit²¹, /facet²² [8]), none allow the specification of facets using RDF.

¹⁷ http://www.facetmap.com/
¹⁸ http://simile.mit.edu/wiki/Longwell
¹⁹ http://browserdf.org/
²⁰ http://flamenco.berkeley.edu/
²¹ http://simile.mit.edu/exhibit/
²² http://slashfacet.semanticweb.org/

Figure 6. Explorator's faceted interface.

Using FACETO, the designer may:

1. Specify a facet based on a given RDF property;
2. Specify a facet based on computed values. For example, she may define a "dimension" facet based on the combination of values of the "width" and "height" properties.
3. Define synonyms among different resources that represent the same information.
4. Define a facet as an arbitrary enumeration of values, or as a range. For example, "inexpensive" and "expensive".
5. Specify a facet based on a hierarchical relation, such as "located in".

Note also that none of the existing tools can be applied directly to an arbitrary SPARQL Endpoint. Using Explorator, the user can facet any set of triples retrieved during her navigation.

As an added convenience, we have also implemented an algorithm, based on entropy measures, that, given a set of triples, determines the set of properties that is most discriminant for that set, and builds a set of facets based on these properties. Again, due to space limitations, we do not detail this algorithm here. This operator can be activated by clicking on the F* button in the interface of any set. Due to SPARQL language limitations (the lack of aggregation functions), applying this operation over a SPARQL endpoint may be very time consuming.

3.4 An Example
Let us now illustrate the usage of Explorator. Suppose the user needs to find all the lakes contained exclusively in Russia. There are several possible ways to achieve this task; one possible way would be as follows:

1. Find all the lakes in the database;
2. Find Russia, the country;
3. Find all the lakes in Russia, obtaining a set we will call LR;
4. Find the countries that share a boundary with Russia (Russia's neighbors);
5. Find all the lakes in Russia's neighbors, obtaining a set we will call LN; and
6. Build the set of the lakes contained exclusively in Russia by calculating the difference between the previous sets: LR-LN.

To find all the lakes in the database, the user first searches for "lake":

She locates the Lake class (in blue) in the resulting set, and gets the set of instances of the Lake class by clicking on it, to obtain all the lakes in the database:

Next, to find Russia, she searches for "Russia" and locates the resource Russia in the resulting set:

To make sure she has the right resource, she views the resource details:

Next, to find all lakes LR in Russia, she selects the set of all lakes and sets it as the subject of her query by clicking on the [S] toolbar button:

Continuing to build the query, she selects the resource Russia and sets it as the object of her query:

She executes the query to obtain the set of all lakes in Russia:

Next, to find the countries that share a boundary with Russia, she views the details of the Russia resource and locates the "neighbor" property for Russia, thereby finding its neighboring countries:

To find all the Russian lakes that are also in Russia's neighbors, she selects the set of lakes in Russia and sets it as the subject of her next query:

She selects the set of Russia's neighbors and sets it as the object of her query:

She then executes the query to find all lakes in Russia's neighboring countries:

Finally, to build the set of the lakes contained exclusively in Russia, she needs to calculate the difference between the set of lakes in Russia and the set of lakes in Russia's neighbors. To do this, she selects the first set and the difference operator:

Finally, she selects the second set (containing the lakes in Russia's neighbors) and executes the difference operation by clicking on the equal sign [=] toolbar button, thereby obtaining the desired result:

4. IMPLEMENTATION
In the following we outline our implementation architecture and some notable details. We decided to use a two-layer architecture which separates the upper presentation layer from the lower model layer.

4.1 Presentation Layer
For the implementation of the proposed interface we adopted the approach of adding semantic annotations in the HTML code to define the behavior of interface widgets. To that end, we used the Prototype²³ library, which allows us to easily navigate the DOM tree, select elements by their class attribute values (using CSS) and link operations to interface events such as onclick, onmouseover, onkeyup, etc. This technique enables us to create very dynamic interfaces for direct manipulation with continuous representation, incremental actions and feedback. Also, all user requests to the server are made using Ajax²⁴, allowing users to continue to explore data while their requests are being processed.

²³ http://www.prototypejs.org/
²⁴ http://ajaxpatterns.org/

4.2 Model Layer
The model layer can be summed up in the picture below:

EXPLORATOR MODEL
ACTIVERDF
RDF DATABASE

Figure 7. Explorator model architecture

We used the ActiveRDF [13] framework as a layer for translating the Explorator model to the RDF model. Basically, we used ActiveRDF to generate SPARQL queries from our model. The set operations are performed on Ruby objects because ActiveRDF and SPARQL do not support those operations natively. The query and cache mechanisms of ActiveRDF were modified to better support integration with Explorator's model.

The default dereferencing mechanism implemented is quite simple: it retrieves all triples from the URI and loads them into a SESAME repository. No inference or recursive dereferencing heuristic is applied. As a result of this approach, the user can explore the triples retrieved during direct URI navigation as a SPARQL Endpoint.

5. CONCLUSION
Exploratory search is a data exploration technique that supports complex user tasks involving lookup as well as learning and investigation. We have shown how this technique can be employed for arbitrary RDF databases. We have developed an information-processing model that supports tasks in the Semantic Web that consist not only of searching for a known item, but also of the acquisition and assimilation of knowledge and concepts in an RDF database. This model has been implemented in a tool called Explorator. We use the direct manipulation metaphor in the construction of the interface, which was very effective in the formulation of complex queries over an unknown domain.

Explorator also allows faceted navigation, and we developed an RDF vocabulary for facet specification and an algorithm for automatic extraction of all facets of a set of triples.

We have conducted a preliminary study [1] that has shown encouraging results. Users with only basic knowledge of RDF were able to elaborate nontrivial queries with Explorator. We realized that Explorator's performance (query execution time) had a negative impact on the user experience, especially when accessing remote endpoints. It may be the case that users explored less because of the time it took to compute the queries. In fact, most of the time is consumed by the SPARQL datastores, which are still in early stages, especially when compared to relational DBMSs. This issue is of the utmost importance and is being addressed for future versions.

Not surprisingly, the experiments showed us that Explorator is better suited to advanced users who have solid knowledge about RDF. Nevertheless, the experiments were brief, so we cannot yet draw any conclusions about Explorator's learning curve. Preliminary evidence indicates that once the initial difficulty is overcome, users can become quite proficient with the system.

The next step in our study will be to investigate the use of Explorator as an epistemic tool, for users to understand more about the represented data domain, as opposed to performing predefined tasks and answering specific questions. In particular, an open hypothesis is the adequacy of the RDF model to match the user's mental models: some of the collected evidence suggests that it might be too low level, which means suitable abstractions might have to be introduced. Exposing Explorator's operation model to naïve users is still a challenge, which is the subject of ongoing research.

Additional larger-scale experiments should be conducted to compare different user interface alternatives and interaction paradigms to better support both novice and expert users in exploring the semantic web. To do so, Explorator can be instrumented to remotely capture the users' actions at the user interface and on the underlying processing model.

As future work, we will extend the model to support the definition of parameterized sets, i.e., sets derived from parameterized operations. Following the QBE paradigm, the user will be able to select any set in the interface, and indicate which should be the parameters. Once this has been done, the user can then plug the output of a box as the input of another box (set), thus establishing a graph of inter-related operations, much like a spreadsheet. Such parameterized sets can be saved to libraries, to be later reused by any user.

Explorator needs some improvements related to the dereferencing heuristics. Also, we are working on some mechanisms to enable exporting RDF, and for enabling alternative views to allow the user to visualize the resources and triples in tables, timetables and maps, as well as in customized domain-dependent formats.

In summary, Explorator's contributions are:

• An information exploration model for RDF based on facet and set navigation;
• An exploration environment that allows query formulation by direct manipulation, allowing remote and local SPARQL endpoints exploration;
• Automatic facet generation for given sets of RDF triples;
• A facet specification vocabulary and corresponding implementation within the tool (not shown in this paper).

Explorator is an open source project and can be accessed at http://www.tecweb.inf.puc-rio.br/explorator.

ACKNOWLEDGMENT. Daniel Schwabe was partially supported by a grant from CNPq.

6. REFERENCES
[1] Araújo F. C. S.; Schwabe D.; Barbosa D. J. S. Experimenting with Explorator: a Direct Manipulation Generic RDF Browser and Querying Tool. Visual Interfaces to the Social and the Semantic Web. VISSW 2009. Sanibel Island, Florida, February 2009 (http://www.smart-ui.org/events/vissw2009/index.html)
[2] Baldonado M. Q. W., Winograd T. SenseMaker: An Information-Exploration Interface Supporting the Contextual Evolution of a User's Interests. 1996
[3] Berners-Lee T., Chen Y., Chilton L., Connolly D., Dhanaraj R., Hollenbach J., Lerer A., and Sheets D. Tabulator: Exploring and Analyzing linked data on the Semantic Web. Decentralized Information Group. Computer Science and Artificial Intelligence Laboratory.
[13] Oren E., Delbru R., Gerke S., Haller A., Decker S. ActiveRDF: ObjectOriented Semantic Web Programming. Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland. 2007
[14] Rossi, G.; Schwabe, D.; Lyardet, F. "Patterns for Designing Navigable Spaces", Proceedings of PLoP98 (Tech Report TR #WUCS-98-25, Washington University, St. Louis, MO, USA), Monticello, Illinois, USA, August 1998.
[15] Russell, A., Smart, P. R., Braines, D. and Shadbolt, N. R. (2008). NITELIGHT: A Graphical Tool for Semantic Query Construction. In: Semantic Web User Interaction Workshop (SWUI 2008), 5th April, Florence, Italy. 2008.
[16] Schwabe, D., Rossi, G.: An object-oriented approach to web-based application design. Theory and Practice of Object Systems (TAPOS), Special Issue on the Internet, v. 4#4, October, 1998, 207-225.
[17] Shneiderman, Ben, Direct manipulation: a step beyond programming languages. IEEE Computer 16,8 (August 1983), 57-69.
[18] Zloof, M. M., 1977. Query-by-example: a database language. IBM System Journal 16, 324-343, 1977.
Massachusetts Institute of Technology. Cambridge, MA, USA. 2006. [4] Best Practice Recipes for Publishing RDF Vocabularies. http://www.w3.org/TR/swbp-vocab-pub/ [5] Catarci, T., Costabile, M. F., Levialdi, S., Batini, C., 1997. Visual Query Systems for Databases: A Survey. Journal of Visual Languages and Computing, 8(2), 215-260, 1997. [6] Dereferencing a URI to RDF. http://esw.w3.org/topic/DereferenceURI [7] Ding L., Zhou L., Finin T., Joshi A. How the Semantic Web is Being Used: An Analysis of FOAF Documents. Proceedings of the 38th Hawaii International Conference on System Sciences – 2005 [8] Hildebrand M., Ossenbruggen J. v. and Hardman L. /facet: A Browser for Heterogeneous Semantic Web Repositories. The 5th International Semantic Web Conference (ISWC). Athens, GA, USA. 2005 [9] Huynh D. F., Karger D. R., Miller R. C.. Exhibit: lightweight structured data publishing. International World Wide Web Conference. Proceedings of the 16th international conference on World Wide Web (WWW). Banff, Alberta, Canada. 2007 [10] Lima, F.; Schwabe, D.: “Application Modeling for the Semantic Web”, Proceedings of LA-Web 2003, Santiago, Chile, Nov. 2003. IEEE Press, pp. 93-102, ISBN (available at http://www.la-web.org). [11] Marchionini G. Exploratory search: From finding to understanding. Comm. Of the ACM, 49(4), 2006. [12] OREN, E.; Delbru, R.; Decker, S. Extending faceted navigation for RDF data. 5th International Semantic Web Conference, Athens, GA, USA, LNCS 4273, p. 5-9. 2006
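
As a supplementary illustration of the model layer described in Section 4.2, the following minimal Ruby sketch shows the general idea of turning a triple pattern into a SPARQL query while performing the set operations on plain Ruby objects. It is not Explorator's code and does not use the ActiveRDF API: the names Triple, to_sparql, match, subjects and DATA, the example.org URIs, and the in-memory list standing in for an endpoint are all assumptions made for this example, and every term is treated as a URI for simplicity.

```ruby
require 'set'

# Hypothetical stand-ins for illustration only; none of these names come
# from Explorator or from the ActiveRDF API.
Triple = Struct.new(:s, :p, :o)

EX       = 'http://example.org/'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Render a triple pattern as a SPARQL SELECT query.
# Positions left as nil become variables (?s, ?p, ?o).
def to_sparql(s: nil, p: nil, o: nil)
  terms = { s: s, p: p, o: o }.map { |name, value| value ? "<#{value}>" : "?#{name}" }
  "SELECT * WHERE { #{terms.join(' ')} . }"
end

# Evaluate the same pattern against an in-memory list of triples
# (standing in for the endpoint) and return the matches as a Set.
def match(triples, s: nil, p: nil, o: nil)
  triples.select { |t|
    (s.nil? || t.s == s) && (p.nil? || t.p == p) && (o.nil? || t.o == o)
  }.to_set
end

# Project a collection of triples onto the set of its subjects.
def subjects(triples)
  triples.map(&:s).to_set
end

DATA = [
  Triple.new("#{EX}alice", RDF_TYPE, "#{EX}Person"),
  Triple.new("#{EX}bob", RDF_TYPE, "#{EX}Person"),
  Triple.new("#{EX}alice", "#{EX}worksAt", "#{EX}PUC")
].freeze

people    = subjects(match(DATA, p: RDF_TYPE, o: "#{EX}Person"))
employees = subjects(match(DATA, p: "#{EX}worksAt"))

# The set operations run on Ruby objects, outside of SPARQL.
puts to_sparql(p: RDF_TYPE, o: "#{EX}Person")
puts "union:        #{(people | employees).to_a.inspect}"
puts "intersection: #{(people & employees).to_a.inspect}"
puts "difference:   #{(people - employees).to_a.inspect}"
```

Keeping the results as Ruby Set objects means union, intersection and difference reduce to the |, & and - operators, which is one plausible reading of why such operations are done on the application side rather than in the query language.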