Experimenting with Explorator: a Direct Manipulation Generic RDF Browser and Querying Tool

Samur F. C. de Araújo, Daniel Schwabe, Simone D.J. Barbosa
Informatics Department, PUC-Rio
Rua Marques de Sao Vicente, 225
{saraujo, dschwabe, simone}@inf.puc-rio.br
+ 55 21 3527-1510

ABSTRACT
In this paper we present a preliminary study with Explorator, a tool for exploring RDF data by direct manipulation. Explorator’s visual user interface allows users to explore a semi-structured RDF database to both gain knowledge and answer specific questions about a domain, through browsing, search, and exploration mechanisms.

Author Keywords
Exploratory search, semantic browsing, user interface for semantic data exploration, semantic web.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
VISSW 2009, February 8–, 2009, Sanibel Island, FL, USA.
Copyright 2009 held by the authors

INTRODUCTION
As the volume of information on the Web increases considerably, we need better tools to help us discover and make sense of the available information, as well as to seek answers to specific questions we may have.

This paper presents a preliminary study with Explorator [5], a direct manipulation tool we have developed to support the exploration of semi-structured RDF databases. Our goal is to support the users in discovering and understanding a domain, as well as in answering specific questions about the domain through browsing, search, and exploration. The work reported here extends the results described in [1] by describing additional experiments and corresponding lessons learned. In particular, comparisons between Explorator and its model with other RDF browsers are presented in that reference.

In the next section, we argue for the importance of supporting exploratory search. The third section describes Explorator’s processing model. In the fourth section, we describe Explorator’s direct manipulation user interface, following the interaction paradigm we deemed more adequate for the kinds of manipulation we support. The fifth section describes the user testing studies we conducted, and the final section concludes the paper with a summary of the findings and directions for future work.

EXPLORATORY SEARCH
In the hypertext field, search, navigation and browsing are terms that describe distinct processes of information retrieval. Carmel et al. [2] conducted an extensive study of the cognitive process of browsing and searching, and based on it we draw the following distinctions.

• Search is the process of seeking a specific known piece of information.
• Browsing is the process of investigating a vast collection of information items in a superficial and not oriented way.
• Navigation is the oriented process to access, view or select a number of information items.

We call information exploration the process of seeking, learning about, and investigating a (potentially large) collection of information items through search, browsing or navigation, but not excluding other forms, in order to discover something new.

The research area called exploratory search [9] has tried to develop solutions that support information exploration. Exploratory search is applicable in situations where the user’s task and the search environment have complex elements that require constant user interpretation during the exploration process. Examples include how to support the user’s search task when she is not familiar with the search domain, or does not have sufficient knowledge about the domain to make a query; and how to support navigation in vast information spaces, or when navigation, searching and browsing are not enough.

Marchionini [9] made a distinction between exploratory search, lookup and search retrieval. According to him, exploratory search is based not only on lookup but also on investigation and learning. He argues that investigative search and learning search require more human iteration than a simple lookup, because these are exploratory processes that support tasks that require the cognitive and interpretative ability of the user. These kinds of tasks are commonly found in the exploration of RDF databases, where the users need to identify classes and properties from the schema, in order to understand concepts, acquire knowledge and learn about the domain. In order to provide the user with an exploratory search tool that supports learning and investigative search on the semantic web, we focused on three inter-related aspects:

• Information search (how semantic data is found),
• Information manipulation (how semantic data is used),
• Information visualization (how semantic data is presented).

Understanding Semantic Data
The typical challenge when accessing an RDF repository is: how do users make sense of the available data? At what level of abstraction do they think of that information?

Research in Cognitive Science has shown that people’s bodily experience and the way we use imaginative mechanisms are central to how we construct categories to make sense of experience [7]. Eleanor Rosch (apud Lakoff [7]) proposed that thought, in general, is organized in terms of prototypes and basic-level categories. We follow in their footsteps and hypothesize that people, when exploring an information space in the semantic web, focus not on sentences that describe the properties of the entities in the database, but on the entities that play the roles of subjects and objects in those sentences, especially entities that would be considered members of the basic-level categories implicit in the database schema. As such, our user interface privileges the visualization and manipulation of such entities, as will be seen in the section on Explorator’s direct manipulation user interface. In other words, entities would be equivalent to resources in RDF that denote “things” that people conceptualize in order to solve tasks.

An important caveat of our work at this point in our research is that we are first focusing on people who have some knowledge of the RDF data structure, and investigating whether they are able to explore the semantic space by means of the kinds of queries and operations allowed by the proposed model described next. With positive results at this step, we shall then proceed to provide a more adequate user interface for those unfamiliar with RDF as well.

EXPLORATOR’S PROCESSING MODEL
Our experience in Web application design methods [8, 11] has shown us that it is useful to characterize the user information processing as set manipulation operations, in what has been called “set-based navigation” [8]. This view is also supported by more recent working tools such as Parallax (http://mqlx.com/~david/parallax/index.html). Basically, the user is processing (browsing) information items within a set of interest; if necessary, this set is further manipulated to either remove uninteresting elements or to add additional elements of interest to the set.

We will show in the following subsections that this model can encompass classical browsing, set-based navigation as found in SHDM [8], and faceted browsing [10], as well as keyword search. The model has been more extensively described in an accompanying paper [1], and is only briefly presented here to facilitate the understanding of the studies we have conducted.

Sets
The model manipulates two kinds of sets: sets of RDF triples and sets of RDF resources. For sets of RDF resources, the usual set operations (union, intersection and difference) are available. Since RDF resources are treated as URIs, blank nodes will only be included if they are assigned to URIs, as occurs for some data stores.

When operating on sets of triples, we interpret the set operations as applying to any of the triple components, namely, subjects (S), predicates (P) or objects (O). This is equivalent to projecting a set of triples along one of its three positions, as illustrated in Figure 1. In the remainder of the paper, each position will also be called a role in a triple.

Figure 1. Triple (T), sets of resources (S, P, and O), and set of triples (A).

A triple is denoted by (s,p,o), where s, p, and o are resources. Let A be a set of triples. The set R of resources of A can be given as:

  R = S ∪ P ∪ O : ∀s,p,o (s,p,o) ∈ A and s ∈ S and p ∈ P and o ∈ O.

Given the triple set A, we also have the following functions:

  S = R_s(A) = {x ∈ S | ∃ p,o: (x,p,o) ∈ A and p,o ∈ R}
  P = R_p(A) = {x ∈ P | ∃ s,o: (s,x,o) ∈ A and s,o ∈ R}
  O = R_o(A) = {x ∈ O | ∃ s,p: (s,p,x) ∈ A and s,p ∈ R}

where S is the set of all subjects, P is the set of all predicates and O is the set of all objects in the triples of A.

Semantic Operations
Given a set of triples A, a set of resources R, and subsets S, P, and O of R (S ⊆ R, P ⊆ R, O ⊆ R), we can define the SPO function as follows:

• the set of all triples in A:
  SPO(∅,∅,∅) = {(s,p,o) ∈ A | s,p,o ∈ R}
• the set of only the triples in A whose subject is in S:
  SPO(S,∅,∅) = {(s,p,o) ∈ A | s ∈ S and p,o ∈ R}
• the set of only the triples in A whose predicate is in P:
  SPO(∅,P,∅) = {(s,p,o) ∈ A | p ∈ P and s,o ∈ R}
• the set of only the triples in A whose object is in O:
  SPO(∅,∅,O) = {(s,p,o) ∈ A | s,p ∈ R and o ∈ O}
• the set of only the triples in A whose subject is in S and predicate is in P:
  SPO(S,P,∅) = {(s,p,o) ∈ A | s ∈ S and p ∈ P and o ∈ R}
• the set of only the triples in A whose subject is in S and object is in O:
  SPO(S,∅,O) = {(s,p,o) ∈ A | s ∈ S and p ∈ R and o ∈ O}
• the set of only the triples in A whose predicate is in P and object is in O:
  SPO(∅,P,O) = {(s,p,o) ∈ A | s ∈ R and p ∈ P and o ∈ O}
• the set of only the triples in A whose subject is in S, predicate is in P, and object is in O:
  SPO(S,P,O) = {(s,p,o) ∈ A | s ∈ S and p ∈ P and o ∈ O}

The function SPO(∅,∅,∅) can be translated into the following SPARQL query:

  SELECT ?s ?p ?o WHERE { ?s ?p ?o }

For the following data:

  @prefix foaf: .
  _:a foaf:name "Johnny Lee Outlaw" .
  _:a foaf:mbox .
  _:b foaf:name "Peter Goodguy" .
  _:b foaf:mbox .
  _:c foaf:mbox .

The query above should return all triples. On the other hand, the function SPO(∅,{foaf:mbox},∅) can be translated into:

  SELECT ?s ?p ?o WHERE { ?s ?p ?o . FILTER (?p = foaf:mbox) }

This query returns all triples that have the property foaf:mbox.

It is important to note that, although in SPARQL we cannot pass arrays of resources to a query, our SPO function works with either single resources or sets of resources.

Set Operations
The model supports the following set operations:

  Let V = {s,p,o}; v, v’ ∈ V

  Let U_R = {x | x ∈ R_v(M) or x ∈ R_v’(N)}
  U = (M,v) ∪ (N,v’) ≡ SPO(U_R,∅,∅)

  Let I_R = {x | x ∈ R_v(M) and x ∈ R_v’(N)}
  I = (M,v) ∩ (N,v’) ≡ SPO(I_R,∅,∅)

  Let D_R = {x | x ∈ R_v(M) and x ∉ R_v’(N)}
  D = (M,v) – (N,v’) ≡ SPO(D_R,∅,∅)

The union, intersection and difference operations are calculated over sets of resources playing a certain role (v or v’) in a triple. For instance, (M,o) = R_o(M) represents all resources that play the role of object in the set of triples M. The operation is calculated over these sets of resources, and its result is the set of triples in which the resulting resources play the role of subject.

A simple example of how this model could be used to solve the task “find all Russian lakes” is as follows:

  SPO(R(SPO(∅,∅,{mondial:Lake}),s), ∅, R(SPO(∅,∅,{'Russia'}),s))

or

  SPO(R(SPO(∅,∅,{mondial:Lake}),s), ∅, {mondial:Russia})

The following section presents Explorator’s direct manipulation interface and shows how it keeps the users in control of their searching, browsing, navigating, and overall exploration of the RDF database.

EXPLORATOR’S DIRECT MANIPULATION USER INTERFACE
Direct manipulation is a user-system interaction paradigm that allows users to point at visual representations of objects and actions to carry out tasks rapidly and observe the results immediately [13]. The direct manipulation paradigm mainly consists of:

• visual presentation of the world of action: show users the available objects and actions;
• rapid, incremental, and reversible actions;
• selection by pointing, not typing; and
• continuous visual display of status.

In arguing for direct manipulation, Shneiderman [13] states that first-time users “are struggling to understand what they see on the display while keeping in mind their information needs. They would be distracted if they had to learn complex query languages or elaborate shape-coding rules” [13:511].

Shneiderman lists the following high-level tasks for open-ended browsing of known collections and exploration of the availability of information on a topic:

• specific fact finding (known-item search), e.g., Find the country named Russia;
• extended fact finding, e.g., What are the neighboring countries of Russia?
• open-ended browsing, e.g., Is there information about the past presidents of each country?
• exploration of availability, e.g., What geographic information is available for Brazil?

Empirical studies show that users perform better and have higher subjective satisfaction when they can view and control the search [9]. This was one of Explorator’s main goals: to put users in control of their queries, and provide immediate feedback to their actions. Figure 2 illustrates the Explorator user interface.

Figure 2. Snapshot of Explorator’s interface.

To empower users in their exploration tasks, Explorator supports the following operations at the user interface. (Additional operations are supported, such as faceted navigation, among others. We present here only the operations that are relevant to the described studies.)

• searching for all resources containing a given string (using the search box in the toolbar);
• selecting a resource (e.g. Russia), by clicking on it;
• detailing a resource, by double-clicking on it to reveal all its properties, by showing all the triples where the resource is the subject;
• selecting multiple resources, by ctrl+clicking on them;
• selecting a binary operation over two sets of resources (union, intersection, or difference), by clicking on the corresponding toolbar button;
• assigning a role (S, P or O) to a set of resources in an SPO query, by clicking on the corresponding toolbar button;
• calculating the operation result, by clicking on the [=] toolbar button; and
• changing the visualization of a set of resources, e.g. grouping them by one of the roles (S, P, O), expanding or collapsing all the triples in the set, and so on. These changes in visualization are made by clicking on toolbar buttons on the corresponding set pane.

Whereas the actual result of any of the above operations is a set of triples, the visual presentation is a set of resources. This is achieved by grouping these triples by one of the roles (S, P, O), and hiding the other triple elements until the user expands the corresponding interface widget (Figure 3).

Figure 3. Two views of the resource Country: collapsed on the left, expanded on the right.

Sample Scenario
Let us now illustrate the usage of Explorator. Suppose a geographer called David needs to find all the lakes contained exclusively in Russia (and not in any other country). There are several possible ways to achieve this task; one possible way would be as follows:

1. Find all the lakes in the database;
2. Find Russia, the country;
3. Find all the lakes in Russia, obtaining a set we will call LR;
4. Find the countries that share a boundary with Russia (Russia’s neighbors);
5. Find all the lakes in Russia’s neighbors, obtaining a set we will call LN; and
6. Build the set of the lakes contained exclusively in Russia by calculating the difference between the previous sets: LR – LN.
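The six steps of the scenario can also be read as a short sequence of SPO queries and set operations. The following is only a minimal sketch under stated assumptions, not Explorator’s implementation: the helper names spo and role, the toy triples, and the property names rdf:type, locatedIn and neighbor are all hypothetical and chosen for illustration. One deliberate difference from the formal notation is that the sketch uses None where the model writes ∅ for an unconstrained position, so that a genuinely empty set of resources yields no triples.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
ResourceSet = Optional[Set[str]]


def spo(triples: Iterable[Triple], s: ResourceSet = None,
        p: ResourceSet = None, o: ResourceSet = None) -> Set[Triple]:
    """Keep the triples whose components fall in the given sets.

    None means "unconstrained" (the role played by the empty set in the
    paper's SPO notation); this is an assumption of the sketch.
    """
    return {
        t for t in triples
        if (s is None or t[0] in s)
        and (p is None or t[1] in p)
        and (o is None or t[2] in o)
    }


def role(triples: Iterable[Triple], position: str) -> Set[str]:
    """Project a set of triples onto one role: 's', 'p' or 'o'."""
    index = {"s": 0, "p": 1, "o": 2}[position]
    return {t[index] for t in triples}


# Toy data with hypothetical property names, for illustration only.
A: Set[Triple] = {
    ("Baikal", "rdf:type", "mondial:Lake"),
    ("Ladoga", "rdf:type", "mondial:Lake"),
    ("Caspian", "rdf:type", "mondial:Lake"),
    ("Victoria", "rdf:type", "mondial:Lake"),
    ("Baikal", "locatedIn", "Russia"),
    ("Ladoga", "locatedIn", "Russia"),
    ("Caspian", "locatedIn", "Russia"),
    ("Caspian", "locatedIn", "Kazakhstan"),
    ("Victoria", "locatedIn", "Tanzania"),
    ("Russia", "neighbor", "Kazakhstan"),
    ("Russia", "neighbor", "Finland"),
}

# Step 1: all lakes (subjects of the triples whose object is the Lake class).
lakes = role(spo(A, p={"rdf:type"}, o={"mondial:Lake"}), "s")
# Step 2: Russia, the country.
russia = {"Russia"}
# Step 3: LR, lakes as subject and Russia as object.
lr = role(spo(A, s=lakes, o=russia), "s")
# Step 4: Russia's neighbors, read from the neighbor property of Russia.
neighbors = role(spo(A, s=russia, p={"neighbor"}), "o")
# Step 5: LN, lakes of LR as subject and the neighbors as object.
ln = role(spo(A, s=lr, o=neighbors), "s")
# Step 6: lakes contained exclusively in Russia.
exclusive = lr - ln

print(sorted(exclusive))  # ['Baikal', 'Ladoga']
```

In this sketch, step 5 starts from LR rather than from the set of all lakes, mirroring the walkthrough in which the set of lakes in Russia is reused as the subject of the second query.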
To find all the lakes in the database, David first searches for “lake”.

He locates the Lake class in the resulting set, and gets the set of instances of the Lake class by clicking on the Instances link, to obtain all the lakes in the database.

Next, to find Russia, he searches for “Russia” and locates the resource Russia in the resulting set.

To make sure he has the right resource, David views the resource details.

Next, to find all the lakes LR in Russia, he selects the set of all lakes and sets it as the subject of his query by clicking on the [S] toolbar button.

Continuing to build the query, he selects the resource Russia and sets it as the object of his query.

He executes the query to obtain the set of all lakes in Russia.

Next, to find the countries that share a boundary with Russia, he views the details of the Russia resource and locates the neighbor property in Russia, thereby finding its neighboring countries.

To find all the lakes in Russia’s neighbors, he selects the set of Lakes in Russia and sets it as the subject of his next query.

He selects the set of Russia’s neighbors and sets it as the object of his query.

He then executes the query to find all lakes in Russia’s neighboring countries.

Finally, to build the set of the lakes contained exclusively in Russia, he needs to calculate the difference between the set of lakes in Russia and the set of lakes in Russia’s neighbors. To do this, he selects the first set and the difference operator.

He then selects the second set (containing the lakes in Russia’s neighbors) and executes the difference operation by clicking on the equal sign [=] toolbar button, thereby obtaining the desired result.

USER TESTING
We have conducted a pilot study and a small-scale experiment with Explorator to better understand the role, benefits and challenges of such a general-purpose semantic data exploration tool.

Pilot study
Six users were recruited who knew some basic concepts of the semantic web and RDF, such as the representation in triples. They were provided with an instructions script containing a few examples illustrating the tool usage to perform simple queries.

After going through the script, users were asked to perform a set of tasks using Explorator. Tasks 1 and 2 were performed on a database of cell phone handsets, whereas tasks 3 and 4 were performed on a database of geopolitical data, similar to the “CIA World Factbook”.

• Task 1: form the set of all handsets made for Latin America that also have a WAP 2.0 browser, using the faceted navigation mechanism offered by Explorator.
• Task 2: same as task 1, but without the faceted navigation, i.e., using the query-building mechanisms.
• Task 3: form a set with the names of the capital cities of neighboring countries of Tanzania.
• Task 4: form a set with the names of all lakes which are entirely contained within Russia.

Having completed each task, they were asked to grade the following sentences on a 5-point Likert scale:

1. I have perfectly understood the task I had to perform.
2. I found it too easy to use this tool to perform this task.
3. This kind of system would be very useful in my day-to-day activities.
4. I perfectly understood how the system works.
5. I found the interface very easy to use.
6. I noticed I could have performed this task in several alternative ways in this system.
7. For each action I took in the system, I obtained exactly what I expected.

When tabulating the results, we grouped the 2 most positive answers as “agree”, and the 3 most negative answers as “disagree”, obtaining the averages depicted in the following table:

  Question   Agree     Disagree
  1          90.91%    9.09%
  2          36.36%    63.64%
  3          90.91%    9.09%
  4          50.00%    50.00%
  5          40.91%    59.09%
  6          86.36%    13.64%
  7          50.00%    50.00%

In parallel with the pilot study, we inspected Explorator’s user interface. As a result of this inspection, we decided to make some changes in the user interface, to make it more consistent and less cluttered. The resulting user interface is the one reported in this paper, and is also the version used in the experiment described next.

Regarding the study planning, the pilot study revealed that it was too early to collect opinions about the system as in the proposed Likert scale. Consequently, we revised the study methodology to adopt a more qualitative approach in which we are able to gain more insight into the underlying motives of the users’ actions, leaving a more quantitative study for later stages in the research.

Small-scale experiment
Due to the necessarily exploratory nature of the study at this stage, we have conducted a more in-depth qualitative study [1] with the revised user interface. We asked users to perform the same set of information exploration tasks using Explorator as in the pilot study. The users’ interaction with the system was recorded using screen capture software, and their oral remarks were recorded in audio.

We asked users to think aloud while carrying out the tasks, so as to give us insight into their thought processes [4]. At the end of the interactive session, we briefly interviewed users and posed the following questions:

• Which aspects of the user interface and interaction confused you or made you feel insecure about what you were doing and the results you were getting?
• What would you like to change in Explorator?
• What did you like the best in Explorator?

Four (4) users were recruited who knew some basic concepts of the semantic web and RDF, such as the representation in triples.

Results
During the experiment, we noticed that the participants faced two separate problems in carrying out tasks. The first problem was related to the domain exploration itself, or how to discover the domain properties. The second problem was related to the participants’ interaction with the user interface and with the new widgets proposed.

Regarding the first issue, we noticed that all users needed to find out the relations between classes and instances to be able to formulate their queries properly. In that process of domain exploration, all participants tried to retrieve the properties of the instances from their class. For example, some participants expanded the class Country expecting to obtain the properties of the instances of Country. However, the semantics of this operation in the tool is to display all the triples where the resource is the subject. This might work for some ontologies in which “domain” and “range” properties are declared, but this was not the case in the examples.

There was a recurring situation in which the participants made an intersection between a class and a set of instances, e.g.: Lake – intersection – {Baikal, Caspian, New York, Ness, London, Paris}. When asked about what they expected, the participants said that they hoped to obtain the lakes related to those instances.

During the process of learning about the domain, some participants formulated queries such as SPO(Russia, rdfs:property, ?). When asked about this query, the participants said that they hoped to obtain all the properties of Russia.

There was another recurring situation, in which the user thought in Portuguese and literally tried to translate what they had in mind into the SPO operation. A query that indicated this type of reasoning was SPO(Lake, locatedIn, Russia). Note, in this case, that the implemented semantics is different from the one desired by the user.

Most participants had difficulties in obtaining the properties to formulate their queries. We conclude that it is vital to have a shortcut in the user interface to obtain the list of class properties. Note, however, that there actually is a widget in the interface where the user can view all the properties of an instance. Nevertheless, this widget was not accessed, perhaps because this information was not conveyed to the user in the instructions script.

Regarding the second issue, we noticed that some visual elements were not intuitive to the participants. They tended to associate the most common interface operations, such as maximize and minimize, with icons that are used today in the Windows OS, as the following testimony shows: “It would be better if the icon were equal to that of Windows” (P1). Also note that we did not provide any instructions to the participants about these newly introduced icons.

Additional observations were as follows:

• All participants began task 1 by searching for a known term, e.g. “browser”, “wap 2.0”, “Latin America”, “Nokia”, etc. We have noticed that the user tends to use the search when looking for a known item.
• Some participants did not realize they could select the set as a whole.
• Users constantly referred to classes when intending to refer to their instances, as illustrated by the following query: SPO(Lake, locatedIn, Russia). By Lake here the users actually meant the set of lakes, and not the class itself.
• The participants expected to be able to scroll horizontally as new sets were created. However, the current scroll is vertical and this confused the participants.
• Despite the color coding of classes and properties, participants recurrently used a class instead of a property in SPO queries. However, by the end of the experiment, all participants acknowledged such differences and said they had made this mistake due to a lack of attention.
• The participants did not identify some clickable elements in the screen. One of them said, “I did not click here because the hand cursor for the mouse did not appear” (P2). We noticed that the mental model of all users reflected their familiarity with the Windows interface. Therefore, Explorator’s widgets need to be explained to users so they can use them correctly.
• The participants successfully understood the set metaphor at the user interface, i.e., they understood that each box at the interface represented a set of resources.

CONCLUDING REMARKS
The preliminary studies have shown encouraging results. Users with only basic knowledge of RDF were able to elaborate nontrivial queries with Explorator.

We detected that users confused the way classes and instances were handled at the user interface. From their comments, however, we have realized they had the right intention, but in this case the user interface got in the way. This problem led us to a redesign to make it explicit whether the selection of an element at the user interface refers to the instances of the class or the class itself, maintaining the reference to the instances as the default. However, new experiments must be conducted to verify the efficiency of this proposed solution.

We also realized that Explorator’s performance had a negative impact on the user experience. It may be the case that users explored less because of the time it took to compute the queries. This issue is of the utmost importance and is being addressed for future versions.

As expected, the experiments showed us that Explorator is better suited to advanced users who have solid knowledge about RDF. Nevertheless, the experiments were brief, so we cannot yet draw any conclusions about Explorator’s learning curve.

The next step in our study will be to investigate the use of Explorator as an epistemic tool, for users to understand more about the represented data domain, as opposed to performing predefined tasks and answering specific questions. In particular, an open hypothesis is the adequacy of the RDF model to match the user’s mental models; some of the collected evidence suggests that it might be too low level, which means suitable abstractions might have to be introduced.

Additional larger-scale experiments should be conducted to compare different user interface alternatives and interaction paradigms to better support both novice and expert users in exploring the semantic web. To do so, Explorator can be instrumented to remotely capture the users’ actions at the user interface and on the underlying processing model.

ACKNOWLEDGMENTS
Daniel Schwabe and Simone Barbosa were partially supported by grants from CNPq.

REFERENCES
1. Araujo, S.; Schwabe, D. “Explorator: a tool for exploring RDF data through direct manipulation.” Submitted to WWW 2009.
2. Carmel, E.; Crawford, S.; Chen, H. 1992. Browsing in Hypertext: A Cognitive Study. IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 5, Sep/Oct 1992.
3. Denzin, N.K.; Lincoln, Y.S. 2005. Introduction: The Discipline and Practice of Qualitative Research. In N.K. Denzin & Y.S. Lincoln (eds.) The SAGE Handbook of Qualitative Research, 3rd edition. Thousand Oaks: SAGE Publications.
4. Ericsson, K.; Simon, H. 1987. Verbal reports on thinking. In C. Faerch & G. Kasper (eds.) Introspection in Second Language Research. Clevedon, Avon: Multilingual Matters, 24–54.
5. Explorator tool: http://www.tecweb.inf.puc-rio.br/explorator/demo (this version may already have evolved from the one reported in this paper).
6. Koenemann, J.; Belkin, N.J. 1996. A Case For Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness. Proceedings of CHI 1996, pp. 205-212.
7. Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. The University of Chicago Press.
8. Lima, F.; Schwabe, D. 2003. Application Modeling for the Semantic Web. Proceedings of LA-Web 2003, Santiago, Chile, Nov. 2003. IEEE Press, pp. 93-102, ISBN (available at http://www.la-web.org).
9. Marchionini, G. 2006. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), 2006.
10. Oren, E.; Delbru, R.; Decker, S. 2006. Extending faceted navigation for RDF data. 5th International Semantic Web Conference, Athens, GA, USA, November 5-9, 2006, LNCS 4273.
11. Rossi, G.; Schwabe, D.; Lyardet, F. 1998. Patterns for Designing Navigable Spaces. Proceedings of PLoP98 (Tech Report TR #WUCS-98-25, Washington University, St. Louis, MO, USA), Monticello, Illinois, USA, August 1998.
12. Schwabe, D.; Rossi, G. 1998. An object-oriented approach to web-based application design. Theory and Practice of Object Systems (TAPOS), Special Issue on the Internet, v. 4#4, October 1998, 207-225.
13. Shneiderman, B. 1998. Designing the User Interface: Strategies for Effective Human-Computer Interaction, 3rd edition. Reading, MA: Addison-Wesley.