=Paper= {{Paper |id=Vol-538/paper-2 |storemode=property |title=Explorator: A tool for exploring RDF data through direct manipulation |pdfUrl=https://ceur-ws.org/Vol-538/ldow2009_paper2.pdf |volume=Vol-538 |dblpUrl=https://dblp.org/rec/conf/www/AraujoS09 }} ==Explorator: A tool for exploring RDF data through direct manipulation== https://ceur-ws.org/Vol-538/ldow2009_paper2.pdf
   Explorator: a tool for exploring RDF data through direct
                         manipulation.
      Samur F. C. de Araújo                                                                            Daniel Schwabe
 Catholic University of Rio de Janeiro                                                         Catholic University of Rio de Janeiro
         R. M. S. Vicente 225                                                                          R. M. S. Vicente 225
  Gávea, Rio de Janeiro, RJ, Brazil                                                             Gávea, Rio de Janeiro, RJ, Brazil
          +55 21 3527-1500                                                                              +55 21 8241-4313
       saraujo@inf.puc-rio.br                                                                    dschwabe@inf.puc-rio.br



ABSTRACT                                                              investigating and learning about a set of data without a-priori
                                                                      knowledge of its domain. This data is expressed in RDF1, and is
In this paper we introduce Explorator, a tool for exploring the       typically stored in very large interconnected databases, without a
Semantic Web data by direct manipulation. Explorator                  homogeneous schema. The exploration mechanisms currently
implements a model of operations that is supported by a visual        available are not sufficient to accomplish the user tasks in the SW.
interface that enables the user, with minimal knowledge of the RDF    Keyword search, e.g. Sindice2, only addresses simple information
model, to explore an RDF database without a-priori knowledge of the   lookup. Explicitly formulated queries, e.g. iSparql3, require
data domain. Consequently, it is well suited for tasks that involve   schema and technical knowledge from the users. Semantic
information search, exploration and visualization.                    browsers, e.g. Tabulator [3], are not designed to explore huge
                                                                      datasets and semantic faceted browsing, e.g. BrowseRDF [12], is
Categories and Subject Descriptors                                    inefficient for fact-finding or known-item retrieval and some more
H.5.3 Web-based interaction; H.5.4 Hypertext/Hypermedia -             complex exploratory tasks.
Navigation, H.3.3 Information Search and Retrieval – search
                                                                      In this paper we will describe a model for representing
process, query formulation.
                                                                      information processing by users in exploratory tasks, and
                                                                      the Explorator tool, which provides a browser interface supporting
General Terms                                                         this model. Explorator is based on the metaphor of direct
Algorithms, Design, Experimentation,            Human     Factors,    manipulation of information on the interface, with immediate
Languages, Theory, Verification.                                      feedback of user actions. The remainder of the paper is organized
                                                                      as follows. Section 2 defines more precisely the exploratory
Keywords                                                              search itself; Section 3 presents the information processing model
RDF, exploratory search, exploration, ontology, semantic web.         and describes the Explorator tool and its interface; Section 4
                                                                      presents some details of its implementation; Section 5 presents some
                                                                      conclusions and directions for further work.
1. INTRODUCTION
As the volume of information on the Web increases considerably,
                                                                      2. EXPLORATORY SEARCH
we need better tools to help us discover and make sense of the        In the hypertext field, we call information exploration the process
available information, as well as to seek answers to specific         of seeking, learning about and investigating a (potentially large)
questions we may have.                                                collection of information items through search, browsing or
Currently, seeking information is a task that permeates most          navigation, but not excluding other forms, in order to discover
activities we develop in our day-to-day. Depending on the type of     something new.
activity we perform, we use different strategies and tactics to       Research in the area called exploratory search [11] has tried to
search for information. In the web, these tactics are supported by    develop solutions that support information exploration.
computational tools such as keyword search, navigation and            Exploratory search is applicable in situations where the user’s task
browsing [11]. But the process of seeking information is not          and the search environment have complex elements that require
simply finding it; we must keep in mind that the task of the user     constant user interpretation during the exploration process. For
ranges from simply searching for a known item to activities such      example, how to support the user’s search task when she is not
as knowledge acquisition, understanding of concepts, discovery,       familiar with the search domain, or she does not have sufficient
planning, transforming, etc. [11]                                     knowledge about the domain to make a query; how to support the
A more recent development has been the Semantic Web (SW),             navigation in vast information spaces, or when the navigation,
and the rapidly growing amount of semantically annotated data         searching and browsing are not enough. In other words, how to
leads to the need to support not only searching, but also
                                                                      1
                                                                          RDF – Resource Description Framework
 Copyright is held by the author/owner(s).                            2
                                                                          http://sindice.com
 LDOW 2009, April 20-24, 2009, Madrid, Spain.                         3
                                                                          iSparql can be accessed at http://demo.openlinksw.com/isparql/
take into account all aspects [2, 7, 11] that influence the             able to extract semantic annotations from HTML pages obtained
exploration process: the user’s task, the user’s context, the user’s    from URIs that cannot be dereferenced as an RDF file, using
profile, the environment, the information provenance, etc.              GRDDL. In spite of distinct dereferencing processes being able to
Marchionini [11] made a distinction between exploratory search,         retrieve different amounts of information, the process itself does
lookup and search retrieval. According to him, exploratory search       not improve the nature of tasks performed in these tools. In fact,
is based not only on lookup but also on investigation and learning.     the set of exploration tasks is limited to navigation between
He argues that investigative search and learning search require         sub-graphs by clicking on the resources displayed in the interface and
more human iteration than a simple lookup, because these are            dereferencing the corresponding URIs.
exploratory processes that support tasks that require the cognitive     Another way to access SW data is by querying a SPARQL
and interpretative ability of the user. These kinds of tasks are        Endpoint that receives a SPARQL12 query and returns a set of
commonly found in the exploration of RDF databases, where the           RDF resources described in XML notation. There are a few tools
users need to identify classes and properties from the schema, in       that allow us to explore a SPARQL Endpoint. NITELIGHT [15]
order to understand concepts, acquire knowledge and learn about         and iSPARQL13 are Visual Query Systems (VQS) [5] which allow
the domain.                                                             visual construction of SPARQL queries, differing mainly in the
Berners-Lee et al. [3] argue that once the information sought is        visual notation employed. It is understood that to use these tools
found, it may be necessary to analyze it. According to their            the user must have a full comprehension of the underlying RDF
description, exploration and analysis are distinct processes that are   schema and the query language syntax, therefore leading to a high
inter-related during the user’s task. In our point of view, the         cognitive load for newcomers and less experienced users.
process of exploration involves both finding a piece of                 Tabulator also provides a way to query its data using SPARQL by
information and investigating or learning about its domain,             providing an interface in which the user can formulate a query
because it is guided by the need to perform a task. The cognitive       based on the selection of the elements of the RDF graph displayed
process of analysis permeates the entire exploratory task, since        on the interface. However, more complex queries need to be
while browsing, the user creates an expectation of what she will        edited manually, exposing the user to some of the issues cited
obtain, she sees what has been achieved and uses this information       before.
to guide her in the next step.                                          Some tools address a different goal in the process of accessing
In order to provide to the user an exploratory search tool that         SW data. Instead of focusing on access to RDF data, they focus on
supports learning and investigative search on SW, we focused on         how to consume RDF data. Exhibit [9] is a lightweight structured
three fronts:                                                           data publishing tool that can be used to export small collections of
                                                                        RDF data. This tool plays an important role on the SW,
       •    Information search (how semantic data is found on the       by publishing content from different sources on the Web.
            Semantic Web),
                                                                        Taking all this into consideration, we can see there are no tools
       •    Information usage (how semantic data is used on the         adequate to explore the semantic web as a whole. Currently, the
            Semantic Web),                                              browsers and SPARQL query builders are addressing different
       •    Information visualization (how semantic data is             goals, and were designed for different kinds of users. In order to
            presented on the Semantic Web).                             provide a complete and integrated exploratory search mechanism
                                                                        to access the SW data, we are proposing Explorator.
2.1 Information Search (in the SW)
                                                                        2.2 Information Usage (in the SW)
Nowadays, we can access the SW data in three different manners:
through a SPARQL Endpoint4, through a URI, or by processing             The RDF model provides a format for data, information, and
semantically annotated HTML pages (e.g. Microformats 5 or               knowledge exchange. However, the repositories of data are
RDFa6). There are tools that can explore the SW directly, namely        scattered on the SW, which demands a unified mechanism to
semantic web browsers such as Tabulator [3], Disco7, Zitgist            access them. Many information-intensive human tasks demand the
data viewer 8 , Marbles 9 , ObjectViewer 10 and Openlink RDF            manipulation of multiple pieces of information. In a SW
Browser11.                                                              exploration tool, at a low level, the objects manipulated are RDF
                                                                        data (resources, triples, literals, properties, etc) and queries. These
These tools all implement a similar exploration strategy, allowing
                                                                        are the information items being manipulated when using an RDF
the user to visualize an RDF sub-graph in a tabular fashion or in a
                                                                        browser.
more “visual” way (e.g., map views or timelines) when
applicable. The sub-graph is obtained by dereferencing [4, 6] a         Consider the SW user looking for all papers mentioning another
URI and each tool uses a distinct approach for this. Tabulator is       paper; or all paper authors’ phone numbers. The user may
                                                                        encounter different data architectures while performing such
4
                                                                        tasks. For example, the information sought may be stored in
    http://www.w3.org/TR/rdf-sparql-protocol/                           multiple RDF files or in a single large RDF repository, and
5
    http://microformats.org/                                            expressed in distinct vocabularies. It is crucial that any
6                                                                       exploratory tool be able to consolidate the information to be
    http://www.w3.org/TR/xhtml-rdfa-primer/                             accessed in an integrated way. The user should be able to merge
7
    http://www4.wiwiss.fu-berlin.de/bizer/ng4j/disco/                   information described in different vocabularies, at least by
8
    http://dataviewer.zitgist.com/                                      directly manipulating each piece of information. For example,
9
    http://beckr.org/marbles
10                                                                      12
     http://objectviewer.semwebcentral.org/                                  http://www.w3.org/TR/rdf-sparql-query/
11                                                                      13
     http://demo.openlinksw.com/rdfbrowser/index.html                        iSparql can be accessed at http://demo.openlinksw.com/isparql/
suppose she is looking for all email addresses by dereferencing         Tabulator’s more general view represents the information in a tree
four different URIs, each one returning triples expressed in a          structure. As the user selects a resource in the interface, a new
distinct vocabulary. Even if she could see all the data together, she   node is added to the tree, thus recording user’s navigation process
would not be able to manipulate this set of information to obtain a     in the interface. The authors argue that it is comfortable for the
unique final set of email addresses, only by using current RDF          user to see the information in a tree-oriented interface, due to
browsers’ functionality.                                                familiarity with other sources of data that are also represented in a
Some of these browsers, like Openlink RDF Browser, cache all            hierarchical structure. The authors also proposed a model of views
RDF data during the user’s navigation. Therefore, the user can          to be applied when the domain is known. A view oriented towards
treat pieces of information from different sources as coming from       a specific domain improves the understanding of the instances
a unique repository. However, the user cannot issue a query on the      being explored. For example, it is better to see geographic
results, which limits the kinds of tasks supported. For example, it     coordinates on a map than in a table.
is very difficult to obtain the homepage address for all people         From the user's task point of view, the representation of
known to someone, as reported in their FOAF profile, by using           information helps its assimilation, but it does not expand the kinds
one of the RDF Browsers mentioned earlier.                              of tasks that can be done. What we have observed so far is that
From the user’s task point of view, exploring the SW involves           without a proper model of exploration, involving well-defined
asking questions and getting answers about the schema and               operations, the user’s exploration reduces to navigating between
instances. Obviously, understanding what is presented, what and         the nodes of an RDF graph, sequentially.
how it can be manipulated is essential for the user to be able to
formulate her question. Thus, querying is an important way for the      3. EXPLORATOR
user to increase her knowledge about the schema and data
contained in an RDF repository. Direct SPARQL query                     Explorator14 is an open-source exploratory search tool for RDF
formulation, which is allowed in some browsers, still imposes a         graphs, implemented in a direct manipulation interface metaphor.
high mental load on the user, even for the more advanced. In            It implements a custom model of operations, and also provides a
addition, the user often does not have enough knowledge about           Query-by-example [18] interface. Additionally, it provides faceted
the domain to formulate a query. As seen in Catarci et al. [5], the     navigation over any set obtained during the operations in the
raw use of query languages induces the user to make mistakes            model that are exposed in the interface. It can be used to explore
during writing, considerably increasing the time for query              both a SPARQL endpoint as well as an RDF graph in the same
formulation and usually being far from the mental model that the        way as “traditional” RDF browsers. Its general architecture is
user has of the reality.                                                represented in the diagram below:

Ding et al. [7] argue that the object of interest is not only the
domain schema and instances, but also the source of data, which
                                                                                           EXPLORATOR INTERFACE
is an important piece of information in the exploratory process. In
fact, when we are exploring several repositories, we may want to
know from where each piece of information comes from. Marbles                                 EXPLORATOR MODEL
and Disco are examples of RDF browsers that track the
provenance of the information, helping the user in judging its
credibility.                                                                                      REPOSITORIES

In summary, current tools allow the user to manipulate raw RDF
data and do not provide a user-friendly way to ask questions. The                            SEMANTIC WEB DATA
user is limited to visualizing the result as aggregate data. Any
processing is done manually, and the user has a limited way to
rearrange, group or filter the data, and process it further. We will    Figure 1. Explorator’s general architecture.
discuss later how Explorator can be a step forward in SW data
manipulation.
                                                                        At the most elementary level, the user’s task reduces to
2.3 Information Visualization (in the SW)                               dereferencing a URI or formulating and executing a SPARQL
                                                                        query against a SPARQL Endpoint. In Explorator, every
A SW browser navigates along relationships between concepts. At         SPARQL Endpoint is a repository, that can be enabled or disabled
each step of navigation, in this unknown and semi-structured (in        and can be manipulated individually or integrated into a single
the sense of schema-less) space, a set of RDF triples is displayed      global source of RDF data. The dereferenced URIs are stored in a
in the interface.                                                       local SESAME 15 repository which can then be queried and
Browsers such as Disco, Marbles, Zitgist data viewer, Openlink          manipulated as if it were a SPARQL Endpoint. In other words, the
RDF Viewer, represent RDF data in a tabular fashion. In Disco’s         user always explores a federation of databases, containing
                                                                        SPARQL Endpoints and RDF triples obtained by dereferencing
interface, each triple is a line in a two-column table; the
                                                                        specific URIs.
navigation is done by clicking on the resources displayed in the
interface. Marbles does the same, and groups the values of
properties that occur more than once for the same resource. In
addition to the tabular presentation, the user has a more refined       14
view of the triples being displayed. As in Disco, for each                   Explorator information, including a demo interface and the
navigation step, the whole content is replaced by a new set of               URL of the subversion repository can be accessed at
triples retrieved from the dereferenced URI.                                 http://www.tecweb.inf.puc-rio.br/explorator
                                                                        15
                                                                             http://www.openrdf.org/
The set of manipulation operations is limited to the operations             _:a   foaf:name        "Johnny Lee Outlaw" .
defined in our information processing model, which we will                  _:a   foaf:mbox         .
                                                                            _:b   foaf:name        "Peter Goodguy" .
describe next.
                                                                            _:b   foaf:mbox         .
                                                                            _:c   foaf:mbox         .
3.1 The Information Processing Model
                                                                        The query above should return all triples. On the other hand, the
Exploring a set of information items in the SW is understood here       function SPO(∅,{foaf:mbox}, ∅) can be translated to:
as a process of transforming resources and triples by successive
application of operations.                                                  SELECT ?s ?p ?o WHERE { ?s ?p ?o . FILTER (?p =
Our experience in Web application design methods [10, 16] has                 foaf:mbox) } .
shown us that it is useful to characterize the user information
processing as a set of manipulation operations, in what has been        This query returns all triples that have the property foaf:mbox.
called “set based navigation” [14]. This view is also supported by
more recent proposals such as Parallax16. Basically, the user is        Consider a more complex example of how this model could be
always processing (browsing) information items within a set of          used to solve the task: “find all Russian lakes”:
interest; if necessary, this set is further manipulated to either       Let S be a function that returns all subjects from a set of triples.
remove uninteresting elements or to add additional elements of          SPO(
interest.
                                                                                  S( SPO(∅,{rdf:type},{mondial:Lake}) ),
Explorator’s model is composed of two elements: the manipulated
items and the manipulation operations. The items are primitive                    {mondial:locatedIn},
elements in the RDF model: triples, resources, literals, URIs, etc.               {mondial:Russia}
The operations are grouped in two sets: set operations and search
operations.                                                             )

We will show in the following sub-sections that this model can          The expression above returns all triples whose subject is a lake and
encompass classical browsing, set-based navigation as found in          that have the property mondial:locatedIn with value mondial:Russia.
SHDM [10], and faceted browsing, as well as keyword search.                It should be noted that, whereas these examples show single
                                                                        valued parameters, in general the parameters for SPO are sets.
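
The translation from SPO to SPARQL illustrated by the examples above can be sketched as follows. This is a minimal illustration in Python, not Explorator's actual implementation: the function name spo_to_sparql is hypothetical, terms are assumed to be already serialized as prefixed names or bracketed URIs, and an empty set (∅) is assumed to leave the corresponding position unconstrained. Set-valued parameters are expanded into a disjunction of equality tests inside a FILTER.

```python
from typing import Iterable


def spo_to_sparql(subjects: Iterable[str] = (),
                  predicates: Iterable[str] = (),
                  objects: Iterable[str] = ()) -> str:
    """Build a SPARQL SELECT string for SPO(S, P, O).

    Hypothetical helper: an empty parameter means "no constraint";
    a non-empty one restricts the variable to any of its members.
    """
    filters = []
    for var, terms in (("s", subjects), ("p", predicates), ("o", objects)):
        # One equality test per member of the set, joined by logical OR.
        tests = [f"?{var} = {term}" for term in sorted(terms)]
        if tests:
            filters.append("(" + " || ".join(tests) + ")")

    pattern = "?s ?p ?o ."
    if filters:
        # Constraints on different positions are combined by logical AND.
        pattern += " FILTER (" + " && ".join(filters) + ")"
    return "SELECT ?s ?p ?o WHERE { " + pattern + " }"


if __name__ == "__main__":
    print(spo_to_sparql())                          # SPO(∅, ∅, ∅)
    print(spo_to_sparql(predicates={"foaf:mbox"}))  # SPO(∅, {foaf:mbox}, ∅)
```

Under these assumptions, the second call produces a query equivalent to the foaf:mbox example, up to redundant parentheses. A nested expression such as the "Russian lakes" task would first evaluate the inner SPO, project its subjects, and pass that set as the first parameter of the outer call.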
3.1.1 Sets
The model manipulates two kinds of sets – sets of RDF triples and       3.1.3 Set Operations
sets of RDF resources. When dealing with sets of RDF resources,         The model allows the user to manipulate items of information
the usual set operations, union, intersection and difference are        within the RDF domain. Once the user has obtained a set of triples
available. Since RDF resources are treated as URIs, blank nodes         and resources, she can manipulate them individually, formulate
will only be included if they are assigned URIs, as occurs in some      new queries, or create new sets. To do so, the model supports the
data stores.                                                            following set operations:
When operating on sets of triples, we interpret the set operations      Let A be the set of all triples.
as applying to any of the triple components, namely, subjects (S),
predicates (P) or objects (O). This is equivalent to projecting a set   Union:
of triples along one of its three slots.                                Given two sets M and N, each containing triples, the union
                                                                        between M and N is the union of triples of M and N.
3.1.2 Search Operation                                                  Intersection:
As previously stated, there are two ways to access the data in SW:      The intersection set I between M and N is the set of triples
dereferencing a URI or querying a SPARQL Endpoint. We                   in A such that the subject of each triple in I appears in triples in
define in our model a general query operation, called SPO (S, P, O),    both M and N.
to be applied to a SPARQL Endpoint. This operation allows the           Difference:
user to obtain a new set of interest, which can then be processed in
the next step in the task.                                              The difference set D between M and N contains the triples in A
                                                                        such that their subjects appear in triples in M and do not appear in
The SPO operation has three parameters, all of which are sets: a        triples in N .
set of subjects, predicates, and objects. This operation is a subset
of general SPARQL queries, allowing the user to query an RDF            Note that, in this model, the result is always a set of triples, and
database by providing an example pattern of the desired set of          the operations are always computed on the sets of subjects,
triples.                                                                predicates or objects of these triples.
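
The set operations of Section 3.1.3 can be illustrated with a small Python sketch. It is an interpretation of the definitions, not the tool's code: triples are modeled as plain (subject, predicate, object) tuples, the function names are hypothetical, A is assumed to be the collection of all triples currently available, and the operations are computed on subjects only, although the text also allows predicates or objects.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def subjects(triples: Iterable[Triple]) -> Set[str]:
    """Project a set of triples onto its subject slot."""
    return {s for s, _, _ in triples}


def union(m: Iterable[Triple], n: Iterable[Triple]) -> Set[Triple]:
    """Union: every triple that occurs in M or in N."""
    return set(m) | set(n)


def intersection(a: Iterable[Triple], m: Iterable[Triple],
                 n: Iterable[Triple]) -> Set[Triple]:
    """Triples of A whose subject appears in both M and N."""
    keep = subjects(m) & subjects(n)
    return {t for t in a if t[0] in keep}


def difference(a: Iterable[Triple], m: Iterable[Triple],
               n: Iterable[Triple]) -> Set[Triple]:
    """Triples of A whose subject appears in M but not in N."""
    keep = subjects(m) - subjects(n)
    return {t for t in a if t[0] in keep}
```

In each case the result is again a set of triples, so it can be fed to a further operation, which matches the closing remark of the section.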
For example, the function SPO(∅,∅,∅) can be translated into
the following SPARQL query:                                             3.2 Visualizing RDF data with Explorator
     SELECT ?s ?p ?o WHERE        {?s ?p ?o} .                          In existing RDF browsers, the data are expressed in one of the
                                                                        following metaphors: table, tree or graph. In our approach, the
For the following data:                                                 interface represents the elements of the underlying exploration
     @prefix foaf:       .                  model: resources, triples and sets.

16
     http://mqlx.com/~david/parallax/index.html
                                                                         interface (ctrl-click) and then click on the union operation to form
                                                                         the corresponding set.
                                                                         The second subdivision, marked as 2, includes the operands for
                                                                         the SPO operation. In this case, the user must select one set, and
                                                                         then click on one of S, P or O. She may also assign another set to
                                                                         one of the other operands (S, P, O). Clicking on “=” produces the
                                                                         result. Clicking on “clear” resets the operands previously selected.

                                                                                    1                    2

                                                                         Figure 4. Operations in Explorator toolbar.
                                                                         The sets are represented as boxes, and stand for both sets of triples
                                                                         or sets of resources. Strictly speaking, all boxes represent sets of
                                                                         triples which can be grouped by subject, property or object.
                                                                         Classes are shown in blue, and RDF properties are shown in
                                                                         green.


                                                                         Figure 5. Sets of triples represented in Explorator’s interface.
Figure 2. A set of triples displayed in Explorator. The subject
                                                                         On the left we have all triples with Budapest as subject. On
is “Niger”, the properties and values are listed under it.
Considering a generic exploration mechanism over the RDF
model, the concepts of triple, entity and resource are mixed. In
Explorator’s interface, the predicates and objects of the triples are
nested and right-aligned under the subject, thus evidencing the
entity represented by the subject of the triple, as shown in
Figure 2.
Explorator uses the following heuristic to render a resource (or
URI) in the interface:
     •    If the resource has a label, name or title property, it
          renders its value.
     •    Otherwise the URI localname is rendered.
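
The rendering heuristic above can be sketched in Python as follows. The sketch rests on assumptions not stated in the text: properties are matched by the local name of their full URI (case-insensitively), the precedence is label, then name, then title, and the names render and local_name are hypothetical.

```python
import re
from typing import Dict

# Assumed precedence among the three property names mentioned in the text.
LABEL_PROPERTIES = ("label", "name", "title")


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("#/"))[-1]


def render(uri: str, properties: Dict[str, str]) -> str:
    """Choose the text shown for a resource.

    `properties` maps full property URIs to literal values for the resource.
    """
    by_local = {local_name(p).lower(): value for p, value in properties.items()}
    for key in LABEL_PROPERTIES:
        if key in by_local:
            return by_local[key]
    # No label-like property found: fall back to the URI's local name.
    return local_name(uri)
```

A resource described with rdfs:label would thus be shown by that label, while an undescribed resource such as a URI ending in #Niger would be shown as "Niger".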
In this interface, each element can be manipulated individually.
Sets of subjects, predicates and objects can be selected by the user
and provided as parameters in the operations described in the
model. Dereferencing a URI, or the result of an operation over
the model always results in a new set in the interface. In this
sense, Explorator incorporates elements of the Direct
Manipulation paradigm [17], since the output of an operation may
                                                                         the right we have some triples grouped by subject.
be used as input of another, as they are expressed in the same
notation. Direct Manipulation is a user-system interaction               To select a triple the user simply clicks on the surrounding box,
paradigm that allows users to point at visual representations of         whose border becomes dashed to indicate the selection. If the user
objects and actions to carry out tasks rapidly and observe the           double-clicks on a triple, it is interpreted as a request for all triples
results immediately. Explorator’s interface follows this paradigm.       with the same subject as the subject of the clicked triple.
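The rendering heuristic described above (use the value of a label, name or title property, otherwise fall back to the URI local name) might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the paper does not say which property URIs count as "label, name or title", so the sketch matches on the local name of the property, and the function names and sample URIs are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: a property counts as a display property when its local
# name is one of these; the paper does not list specific URIs.
DISPLAY_PROPERTIES = ("label", "name", "title")


def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    parts = urlsplit(uri)
    if parts.fragment:
        return parts.fragment
    path = parts.path.rstrip("/")
    return path.rsplit("/", 1)[-1] or uri


def render(resource, triples):
    """Pick a display string for a resource from (s, p, o) tuples."""
    for wanted in DISPLAY_PROPERTIES:
        for s, p, o in triples:
            if s == resource and local_name(p) == wanted:
                return o
    # No label, name or title found: fall back to the local name.
    return local_name(resource)


# Made-up sample data, for illustration only.
triples = [
    ("http://example.org/country/NE",
     "http://www.w3.org/2000/01/rdf-schema#label", "Niger"),
    ("http://example.org/country/NE",
     "http://example.org/prop/capital", "http://example.org/city/Niamey"),
]

labelled = render("http://example.org/country/NE", triples)
unlabelled = render("http://example.org/city/Niamey", triples)
```

Here the first resource is rendered through its label, while the second has no display property and is rendered by its local name.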
The interface has two main elements, the toolbar and the result
sets. The toolbar has a menu giving access to repository
configuration and additional functionalities; a search box; and a
group of buttons representing the operations of the model.

Figure 3. Explorator toolbar.

The operations menu is divided in two groups, as shown in Figure
4. The first area (Fig. 4 - 1) has the set operations. To operate, the
user must select the first set among the sets displayed, then click
on the operation (union, intersection or difference), then select
(click on) another set, and then click on ‘=’. Specifically for
union, the user can also click on multiple resources in the

3.3 Faceted Navigation
In addition to the operations already described, we have also
defined a model for specifying tailor-made facets. Facets are
specified in this model using a custom-made vocabulary called
FACETO, which we do not elaborate here for reasons of space.
While many tools implement faceted navigation (FacetMap17,
Longwell18, BrowseRDF19, Flamenco20, Exhibit21, /facet22 [8]),
none allow the specification of facets using RDF.

17
     http://www.facetmap.com/
18
     http://simile.mit.edu/wiki/Longwell
19
     http://browserdf.org/
20
     http://flamenco.berkeley.edu/
21
     http://simile.mit.edu/exhibit/
22
     http://slashfacet.semanticweb.org/

Figure 6: Explorator’s faceted interface.
Using FACETO, the designer may:
       1.    Specify a facet based on a given RDF property;
       2.    Specify a facet based on computed values. For example,
             she may define a “dimension” facet based on the
             combination of values of the “width” and “height”
             properties;
       3.    Define synonyms among different resources that
             represent the same information;
       4.    Define a facet as an arbitrary enumeration of values, or
             as a range. For example, “inexpensive” and
             “expensive”;
       5.    Specify a facet based on a hierarchical relation, such as
             “located in”.
Note also that none of the existing tools can be applied directly to
an arbitrary SPARQL endpoint. Using Explorator, the user can facet
any set of triples retrieved during her navigation.
As an added convenience, we have also implemented an
algorithm, based on entropy measures, that, given a set of triples,
determines the set of properties that is most discriminant for that
set, and builds a set of facets based on these properties. Again,
due to space limitations, we do not detail this algorithm here. This
operator can be activated by clicking on the F* button in the
interface of any set.
Due to limitations of the SPARQL language (the lack of aggregation
functions), applying this operation over a SPARQL endpoint may
be very time consuming.

3.4 An Example
Let us now illustrate the usage of Explorator. Suppose the user
needs to find all the lakes contained exclusively in Russia. There
are several possible ways to achieve this task; one possible way
would be as follows:
     1.   Find all the lakes in the database;
     2.   Find Russia, the country;
     3.   Find all the lakes in Russia, obtaining a set we will call
          LR;
     4.   Find the countries that share a boundary with Russia
          (Russia’s neighbors);
     5.   Find all the lakes in Russia’s neighbors, obtaining a set
          we will call LN; and
     6.   Build the set of the lakes contained exclusively in
          Russia by calculating the difference between the
          previous sets: LR-LN.
To find all the lakes in the database, the user first searches for
“lake”:

She locates the Lake class (in blue) in the resulting set, and gets
the set of instances of the Lake class by clicking on it, to obtain all
the lakes in the database:

Next, to find Russia, she searches for “Russia” and locates the
resource Russia in the resulting set:

To make sure she has the right resource, she views the resource
details:

Next, to find all lakes LR in Russia, she selects the set of all lakes
and sets it as the subject of her query by clicking on the [S]
toolbar button:

Continuing to build the query, she selects the resource Russia and
sets it as the object of her query:

She executes the query to obtain the set of all lakes in Russia:

Next, to find the countries that share a boundary with Russia, she
views the details of the Russia resource and locates the “neighbor”
property for Russia, thereby finding its neighboring countries:

To find all the Russian lakes that are also in Russia’s neighbors,
she selects the set of lakes in Russia and sets it as the subject of
her next query:

She selects the set of Russia’s neighbors and sets it as the object
of her query:

She then executes the query to find all lakes in Russia’s
neighboring countries:

To build the set of the lakes contained exclusively in
Russia, she needs to calculate the difference between the set of
lakes in Russia and the set of lakes in Russia’s neighbors. To do
this, she selects the first set and the difference operator:

Finally, she selects the second set (containing the lakes in
Russia’s neighbors) and executes the difference operation by
clicking on the equal sign [=] toolbar button, thereby obtaining the
desired result:
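The walkthrough above can be mirrored in a few lines of Python, which may help make the sequence of subject/object selections and the final difference concrete. This is only a sketch: the `Triple` tuple, the `query` helper, the property names and the data are all made up for illustration, and each result is reduced to its set of subjects before the difference is taken, which is an assumption of the sketch rather than a statement about how Explorator compares sets.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def query(triples, subjects=None, predicates=None, objects=None):
    """Keep triples whose parts fall in the given sets (None = any)."""
    return {
        t for t in triples
        if (subjects is None or t.subject in subjects)
        and (predicates is None or t.predicate in predicates)
        and (objects is None or t.object in objects)
    }


def subjects_of(triples):
    return {t.subject for t in triples}


def objects_of(triples):
    return {t.object for t in triples}


# Made-up sample data, for illustration only.
DATA = {
    Triple("ex:Baikal", "rdf:type", "ex:Lake"),
    Triple("ex:Ladoga", "rdf:type", "ex:Lake"),
    Triple("ex:Peipus", "rdf:type", "ex:Lake"),
    Triple("ex:Balaton", "rdf:type", "ex:Lake"),
    Triple("ex:Baikal", "ex:locatedIn", "ex:Russia"),
    Triple("ex:Ladoga", "ex:locatedIn", "ex:Russia"),
    Triple("ex:Peipus", "ex:locatedIn", "ex:Russia"),
    Triple("ex:Peipus", "ex:locatedIn", "ex:Estonia"),
    Triple("ex:Balaton", "ex:locatedIn", "ex:Hungary"),
    Triple("ex:Russia", "ex:neighbor", "ex:Estonia"),
    Triple("ex:Russia", "ex:neighbor", "ex:Finland"),
}

# All instances of the Lake class.
lakes = subjects_of(query(DATA, predicates={"rdf:type"}, objects={"ex:Lake"}))
# Subject = all lakes, object = Russia: the set LR.
lr = subjects_of(query(DATA, subjects=lakes, objects={"ex:Russia"}))
# Russia's neighbors, read from its "neighbor" property.
neighbors = objects_of(
    query(DATA, subjects={"ex:Russia"}, predicates={"ex:neighbor"}))
# Subject = lakes in Russia, object = neighbors: the shared lakes.
ln = subjects_of(query(DATA, subjects=lr, objects=neighbors))
# Difference: lakes contained exclusively in Russia.
only_russia = lr - ln
```

As in the walkthrough, the second query starts from the lakes already known to be in Russia, so the difference removes exactly the lakes shared with a neighbor.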




4. IMPLEMENTATION
In the following we outline our implementation architecture and
some notable details. We decided to use a two-layer architecture,
which separates the upper presentation layer from the lower
model layer.
4.1 Presentation Layer
For the implementation of the proposed interface we adopted the
approach of adding semantic annotations in the HTML code to
define interface widget behavior. To that end, we used the
Prototype23 library, which allows us to easily navigate the DOM
tree, select elements by their class attribute values (using CSS
selectors) and link operations to interface events such as onclick,
onmouseover, onkeyup, etc. This technique enables us to create
very dynamic interfaces for direct manipulation with continuous
representation, incremental actions and feedback. Also, all user
requests to the server are made using Ajax24, allowing users to
continue to explore data while their requests are being processed.

23
     http://www.prototypejs.org/
24
     http://ajaxpatterns.org/

4.2 Model Layer
The model layer can be summed up in the picture below:

                          EXPLORATOR MODEL
                             ACTIVERDF
                            RDF DATABASE

Figure 7. Explorator model architecture

We used the ActiveRDF [13] framework as a layer for translating
the Explorator model to the RDF model. Basically, we used
ActiveRDF to generate SPARQL queries from our model. The set
operations are performed on Ruby objects because ActiveRDF
and SPARQL do not support those operations natively. The query
and cache mechanisms of ActiveRDF were modified to better
support integration with Explorator’s model.
The default dereferencing mechanism implemented is quite
simple: it retrieves all triples available at the URI and loads them
into a SESAME repository. No inference or recursive
dereferencing heuristic is applied. As a result of this approach, the
user can explore the triples retrieved through direct URI
navigation as if they were a SPARQL endpoint.

5. CONCLUSION
Exploratory search is a data exploration technique that supports
complex user tasks involving lookup as well as learning and
investigation. We have shown how this technique can be
employed for arbitrary RDF databases. We have developed an
information-processing model that supports tasks in the
Semantic Web that consist not only of searching for a known
item, but also of the acquisition and assimilation of
knowledge and concepts in an RDF database. This model has
been implemented in a tool called Explorator. We use the direct
manipulation metaphor in the construction of the interface, which
was very effective in the formulation of complex queries over an
unknown domain.
Explorator also allows faceted navigation, and we developed an
RDF vocabulary for facet specification and an algorithm for
automatic extraction of all facets of a set of triples.
We have conducted a preliminary study [1] that has shown
encouraging results. Users with only basic knowledge of RDF
were able to elaborate nontrivial queries with Explorator. We
realized that Explorator’s performance (query execution time) had
a negative impact on the user experience, especially when
accessing remote endpoints. It may be the case that users explored
less because of the time it took to compute the queries. In fact,
most of the time is consumed by the SPARQL datastores, which
are still in early stages, especially when compared to relational
DBMSs. This issue is of the utmost importance and is being
addressed for future versions.
Not surprisingly, the experiments showed us that Explorator is
better suited to advanced users who have solid knowledge about
RDF. Nevertheless, the experiments were brief, so we cannot yet
draw any conclusions about Explorator’s learning curve.
Preliminary evidence indicates that once the initial difficulty is
overcome, users can become quite proficient with the system.
The next step in our study will be to investigate the use of
Explorator as an epistemic tool, for users to understand more
about the represented data domain, as opposed to performing
predefined tasks and answering specific questions. In particular,
an open hypothesis is the adequacy of the RDF model to match
the user’s mental models; some of the collected evidence
suggests that it might be too low level, which means suitable
abstractions might have to be introduced. Exposing Explorator’s
operation model to naïve users is still a challenge, which is the
subject of ongoing research.
Additional larger-scale experiments should be conducted to
compare different user interface alternatives and interaction
paradigms to better support both novice and expert users in
exploring the Semantic Web. To do so, Explorator can be
instrumented to remotely capture the users’ actions at the user
interface and on the underlying processing model.
As future work, we will extend the model to support the definition
of parameterized sets, i.e., sets derived from parameterized
operations. Following the QBE paradigm, the user will be able to
select any set in the interface and indicate which should be the
parameters. Once this has been done, the user can then plug the
output of a box as the input of another box (set), thus establishing
a graph of inter-related operations, much like a spreadsheet. Such
parameterized sets can be saved to libraries, to be later reused by
any user.
Explorator needs some improvements related to the dereferencing
heuristics. Also, we are working on mechanisms to enable
exporting RDF, and on alternative views that allow the
user to visualize the resources and triples in tables, timetables and
maps, as well as in customized domain-dependent formats.
In summary, Explorator’s contributions are:
     •    An information exploration model for RDF based on
          facet and set navigation;
     •    An exploration environment that allows query
          formulation by direct manipulation, allowing the
          exploration of remote and local SPARQL endpoints;
     •    Automatic facet generation for given sets of RDF
          triples;
     •    A facet specification vocabulary and corresponding
          implementation within the tool (not shown in this
          paper).
Explorator is an open source project and can be accessed at
http://www.tecweb.inf.puc-rio.br/explorator.

ACKNOWLEDGMENT. Daniel Schwabe was partially
supported by a grant from CNPq.

6. REFERENCES
[1] Araújo F. C. S.; Schwabe D.; Barbosa D. J. S. Experimenting
     with Explorator: a Direct Manipulation Generic RDF
     Browser and Querying Tool. Visual Interfaces to the Social
     and the Semantic Web. VISSW 2009. Sanibel Island, Florida,
     February 2009
     (http://www.smart-ui.org/events/vissw2009/index.html)
[2] Baldonado M. Q. W., Winograd T. SenseMaker: An
     Information-Exploration Interface Supporting the Contextual
     Evolution of a User’s Interests. 1996
[3] Berners-Lee T., Chen Y., Chilton L., Connolly D., Dhanaraj
     R., Hollenbach J., Lerer A., and Sheets D. Tabulator:
     Exploring and Analyzing linked data on the Semantic Web.
     Decentralized Information Group. Computer Science and
     Artificial Intelligence Laboratory. Massachusetts Institute of
     Technology. Cambridge, MA, USA. 2006.
[4] Best Practice Recipes for Publishing RDF Vocabularies.
     http://www.w3.org/TR/swbp-vocab-pub/
[5] Catarci, T., Costabile, M. F., Levialdi, S., Batini, C., 1997.
     Visual Query Systems for Databases: A Survey. Journal of
     Visual Languages and Computing, 8(2), 215-260, 1997.
[6] Dereferencing a URI to RDF.
     http://esw.w3.org/topic/DereferenceURI
[7] Ding L., Zhou L., Finin T., Joshi A. How the Semantic Web
     is Being Used: An Analysis of FOAF Documents.
     Proceedings of the 38th Hawaii International Conference on
     System Sciences – 2005
[8] Hildebrand M., Ossenbruggen J. v. and Hardman L. /facet: A
     Browser for Heterogeneous Semantic Web Repositories. The
     5th International Semantic Web Conference (ISWC). Athens,
     GA, USA. 2006
[9] Huynh D. F., Karger D. R., Miller R. C. Exhibit: lightweight
     structured data publishing. International World Wide Web
     Conference. Proceedings of the 16th international
     conference on World Wide Web (WWW). Banff, Alberta,
     Canada. 2007
[10] Lima, F.; Schwabe, D.: “Application Modeling for the
     Semantic Web”, Proceedings of LA-Web 2003, Santiago,
     Chile, Nov. 2003. IEEE Press, pp. 93-102, ISBN (available
     at http://www.la-web.org).
[11] Marchionini G. Exploratory search: From finding to
     understanding. Comm. of the ACM, 49(4), 2006.
[12] Oren, E.; Delbru, R.; Decker, S. Extending faceted
     navigation for RDF data. 5th International Semantic Web
     Conference, Athens, GA, USA, LNCS 4273, p. 5-9. 2006
[13] Oren E., Delbru R., Gerke S., Haller A., Decker S.
     ActiveRDF: Object-Oriented Semantic Web Programming.
     Digital Enterprise Research Institute, National University of
     Ireland, Galway. Galway, Ireland. 2007
[14] Rossi, G.; Schwabe, D.; Lyardet, F.; "Patterns for
     Designing Navigable Spaces", Proceedings of PLoP98 (Tech
     Report TR #WUCS-98-25, Washington University, St.
     Louis, MO, USA), Monticello, Illinois, USA, August 1998.
[15] Russell, A., Smart, P. R., Braines, D. and Shadbolt, N. R.
     (2008). NITELIGHT: A Graphical Tool for Semantic Query
     Construction. In: Semantic Web User Interaction Workshop
     (SWUI 2008), 5th April, Florence, Italy. 2008.
[16] Schwabe, D., Rossi, G.: An object-oriented approach to web-
     based application design. Theory and Practice of Object
     Systems (TAPOS), Special Issue on the Internet, v. 4#4,
     October, 1998, 207-225.
[17] Shneiderman, Ben, Direct manipulation: a step beyond
     programming languages. IEEE Computer 16,8 (August
     1983), 57-69.
[18] Zloof, M. M., 1977. Query-by-example: a database language.
     IBM Systems Journal 16, 324-343, 1977.