Konduit VQB: a Visual Query Builder for SPARQL on the Social Semantic Desktop

Oszkár Ambrus, Knud Möller, Siegfried Handschuh
oszkar.ambrus@deri.org, knud.moeller@deri.org, siegfried.handschuh@deri.org
Digital Enterprise Research Institute (DERI)
National University of Ireland, Galway (NUIG)

Workshop on Visual Interfaces to the Social and Semantic Web (VISSW2010), IUI2010, Feb 7, 2010, Hong Kong, China. Copyright is held by the author/owner(s).

ABSTRACT

With the adoption of Nepomuk as an organic part of KDE, the semantic desktop became a reality for a great number of users and is employed by a growing number of applications. Thus, the amount of semantic data on the desktop is constantly growing. Users therefore need a way to access this data outside of the limiting use cases of the applications employing Nepomuk-KDE.

We aim to assist users in building queries and running them to make use of RDF data that would otherwise be partially or completely hidden. In this paper, as an initial iteration of our efforts, we present four approaches to building SPARQL queries visually, based on two different categorizations: schema-based vs. instance-based and SELECT vs. CONSTRUCT queries. We present the interfaces, visual languages and query generation methods associated with each of the approaches, as well as the autocompletion techniques for the instance-based query builders.

Author Keywords

Visual Query Builder, SPARQL, Nepomuk

INTRODUCTION

The Social Semantic Desktop [4] is a paradigm transposing Semantic Web concepts onto the desktop. Ontologies thus conceptualize information, and semantic data is stored in RDF. It loosens the borders between applications and provides a unified environment. The Nepomuk project [6] outlines the requirements and functionalities of the Social Semantic Desktop and defines an architecture specification that fulfills these requirements. Nepomuk-KDE¹ is a reference implementation of Nepomuk. It provides a platform to create and handle all kinds of metadata. It uses RDF stores for metadata persistence and provides a middleware for applications to build upon, allowing them to store and access the semantic data on the desktop (or, alternatively, the Web).

¹ http://nepomuk.kde.org/

Konduit [10] is a tool for building visual workflows for RDF data within Nepomuk-KDE, allowing flexible access to the local RDF data as well as mashing it up with web-based data. It features a visual programming environment and allows for various manipulations (merging, filtering, mashing up, creating visual workflows, etc.) as well as executing different actions (executing scripts, automating emails, etc.) using the queried RDF data. A query builder is used to generate SPARQL queries for querying components, which act as data sources in the RDF workflows and produce data that is used further in the workflow.

We want to provide a way for users to build these queries intuitively, with little or no knowledge of the query language (i.e., SPARQL [11]). Although this does not mean a complete abstraction from the underlying details, we aim to assist users with limited technical knowledge as well as those who know SPARQL and RDF (with an emphasis on the latter, though) and provide an interface that suits the needs of both. We try to provide a tool with the necessary features that similar tools provide, that also supports RDF, provides search assistance on a whole repository (as opposed to a single ontology) and is integrated into Nepomuk-KDE (within Konduit and beyond), allowing for local data-driven querying as well as sharing queries online.

We explore several approaches because of the structured nature of RDF data and the difficulty of searching and querying it in an intuitive and transparent manner. We present, as a first attempt in our research on visual query builders, four interfaces aiming to achieve the above goal, with different approaches to query building and varying degrees of complexity. The first two are schema-based, allowing queries to be built using restrictions based on the ontology structure. The other two use a triple construction-based approach: the user constructs the restricting triples of the SPARQL query, assisted by suggestions that use both schema and instance information from the underlying repository.

RELATED WORK

There are a number of tools that aim to assist users in building queries for semantic data. Many of them provide novel and intuitive approaches and demonstrate useful features, such as NITELIGHT [12] or RDF-GL [7], which aim to represent SPARQL constructs through graphical metaphors. MashQL [8], GRQL [1] and GLOO [5] propose queries as trees starting from a given class and restricting it incrementally on the branches. SPARQLViz [2] provides a click-through wizard for composing queries, and SEWASIE [3] features a limited ontology-based query formulation.

Nevertheless, several of the tools only support querying based on a single ontology, and some of them do not support RDF and SPARQL. Some of them require extensive manual editing of the queries, or do not feature clear relationships between query parts. Moreover, except for SPARQLViz, all query builders are web-based (or only usable within their own system), not allowing for the integration of desktop data.

Our two schema-based interfaces are mostly built on the intuition of MashQL and GRQL, in exposing schema structures and possible restrictions branching from an initial class. The instance-based query builders resemble SPARQLViz in providing forms for the user to complete, but feature a single, simpler and less confusing interface, with clear connections between the query parts.

RUNNING EXAMPLE

Suppose we are searching for a contact from the local repository whose name contains the letter 'K' and who has written a publication on the semantic desktop.

QUERY BUILDING

The interface of the query builder application features a central part for the visual query builder and previewers for the query and its results, as well as menu actions and a status bar. The central query builder has four incarnations, presented in the following sections, which employ different approaches to building SPARQL queries.

The most common search interface, a search box for simple keyword searches, is highly ambiguous when used to retrieve semantic information and needs extensive research, so abstracting completely from the structure of semantic data is not yet our intention. Also, we cannot yet provide a fully featured, mature semantic querying application, as we aim to explore the most appropriate ways to do it and research the possibilities and limitations in accomplishing this task.

The two schema-based approaches use schema information from the ontologies in the Nepomuk system, allowing users to compose tree-based queries by restricting the properties of the resulting objects. This allows users to explore the local schema structures. These approaches feature a simplification of SPARQL, in that the initial selection can only be restricted by descending in a tree-like fashion.

The instance-based approaches are based on constructing triples without necessarily knowing RDF or SPARQL (similarly to the Wikipedia Visual Query Builder²). They use schema information as well as instance information to suggest possible completions for the subjects, predicates and objects of the constraining triples of the SPARQL query. This autocompletion allows users to explore the data stored in the local RDF repository. The instance-based query builders also allow for the construction of triples that do not exist in the repository, adding properties that have not been defined or converting an instance from one ontology to another, e.g., transforming ?v foaf:name "Smith" to ?v nco:fullName "Smith".

² http://dl-learner.org/Projects/dbpedia

Figure 1. Schema-based SELECT query builder.

SCHEMA-BASED SELECT QUERIES

The schema-based SELECT query builder (Figure 1) is the most user-friendly approach to building SPARQL queries, aimed at users having the least knowledge of semantic technologies. We have devised a simplification of the SPARQL language, which allows for this particular kind of builder. It allows for selecting a class (which will be the type of the queried variable) and restricting it in a tree-like manner through its properties.

Queries are built using schema information from Nepomuk. The classes and predicates that can be chosen are queried and presented to the user.

We start from the assumption that users most often want to find certain information belonging to an entity of a certain classification and/or having a number of known restricting characteristics. For example, one would want to search for a contact person (entity) with a given name (restricting characteristic), similarly to doing a free text search.

The interface therefore provides a way to build queries as trees, starting with the type of entity the user inquires for and progressing with restrictions on the branches of the tree.

The Visual Language

The visual language covers a subset of SPARQL. Listing 1 provides a formal description of the queries built through the visual facilities of the interface. Note that the missing terminal definitions ClassName and Predicate are IRI references, LiteralValue is a string literal and VariableName is a SPARQL variable (such as ?v or $x). Relation denotes a relation such as contains, equals, etc.
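To make the tree structure concrete, the following minimal Python sketch models the running example as a query tree and renders it as a SELECT query, following the three WHERE-clause cases that the Query Generation subsection gives for this builder. It is an illustration under stated assumptions, not the Konduit implementation: the class and function names are hypothetical, foaf:Document, dc:title and the variable ?v80 are placeholders chosen for the example, treating "equals" as an anchored regular expression is an assumption, and prefix declarations are assumed to be supplied elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class LiteralRestriction:
    predicate: str   # property with a literal range
    relation: str    # "contains" or "equals"
    value: str       # LiteralValue typed by the user
    variable: str    # VariableName of the LiteralNode


@dataclass
class RootNode:
    class_name: str
    variable: str
    restrictions: List[Union["ClassRestriction", LiteralRestriction]] = field(
        default_factory=list
    )


@dataclass
class ClassRestriction:
    predicate: str
    node: RootNode   # RootNode of the restricting subtree


def where_lines(root: RootNode) -> List[str]:
    """Walk the tree and emit triple patterns and filters in order."""
    lines = [f"{root.variable} a {root.class_name} ."]          # case (1)
    for r in root.restrictions:
        if isinstance(r, ClassRestriction):                     # case (2)
            lines.append(f"{root.variable} {r.predicate} {r.node.variable} .")
            lines.extend(where_lines(r.node))
        else:                                                   # case (3)
            lines.append(f"{root.variable} {r.predicate} {r.variable} .")
            # Assumption: "equals" becomes an anchored regex; the value is
            # inserted as-is, so regex metacharacters are not escaped.
            regex = r.value if r.relation == "contains" else f"^{r.value}$"
            lines.append(f"FILTER regex({r.variable}, '{regex}', 'i')")
    return lines


def build_select(outputs: List[str], trees: List[RootNode]) -> str:
    """Assemble SELECT from the output variables and WHERE from the trees."""
    body = [line for tree in trees for line in where_lines(tree)]
    inner = "\n".join("  " + line for line in body)
    return f"SELECT {' '.join(outputs)}\nWHERE {{\n{inner}\n}}"


# Running example: a person whose name contains 'K' with a publication
# whose (hypothetical) title property mentions the semantic desktop.
tree = RootNode("foaf:Person", "?v41", [
    LiteralRestriction("foaf:name", "contains", "K", "?v59"),
    ClassRestriction("foaf:publications", RootNode("foaf:Document", "?v77", [
        LiteralRestriction("dc:title", "contains", "semantic desktop", "?v80"),
    ])),
])
query = build_select(["?v41"], [tree])
print(query)
```

Because every restriction hangs off exactly one parent variable, a recursive walk of the tree is enough to produce the whole graph pattern; this is also why the approach cannot express queries that are not tree-shaped.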
    Query              ::= Outputs Conditions
    Outputs            ::= RootNode | LiteralNode
    Conditions         ::= GraphPattern
    GraphPattern       ::= QueryTree+
    QueryTree          ::= RootNode TreeNumber
    RootNode           ::= ClassName VariableName Restrictions
    Restrictions       ::= QueryNode*
    QueryNode          ::= ClassRestriction | LiteralRestriction
    ClassRestriction   ::= Predicate RootNode
    LiteralRestriction ::= Predicate Relation LiteralNode
    LiteralNode        ::= VariableName LiteralValue

Listing 1. EBNF description of the visual language (non-terminals)

Query Generation

Queries are generated from the visual description according to the defined language. The SELECT part of a query is the set of variables extracted from the components (class combo boxes or literal text boxes) selected as Outputs. It is made up of the variable names in the RootNodes and LiteralNodes, for class combo boxes and literal text boxes, respectively. This happens transparently, as variables are extracted automatically from the selected components.

The WHERE clause is a set of RDF triples generated from the query tree structures present in the query form. We have the following three cases:

(1) For the root RootNode the generated triple is VariableName a ClassName, and for every restriction a triple is generated starting with VariableName and continuing as presented in the following points (e.g. ?v41 a foaf:Person generated for the root node in Figure 1).

(2) A ClassRestriction completes the parent's triple with Predicate VariableName, where VariableName is the variable of the RootNode belonging to the new restriction (e.g. ?v41 foaf:publications ?v77 generated for the publication restriction in Figure 1).

(3) A LiteralNode completes the triple with Predicate VariableName, adding a regular expression filter string according to the chosen Relation and LiteralValue (e.g. FILTER regex(?v59, 'K', 'i') generated for the first restriction shown in Figure 1).

User Interface

The query builder form (Figure 1) allows for adding several query trees (starting from different classes) and restricting them by their properties. If a property has a literal range, the user can enter a value and restrict it with a relation, such as equals or contains. If its range is not a literal, the user can add further restrictions.

Outputs are selected by right-clicking on a combo box representing a class or a value. The corresponding variable is added to the output, and the combo box is highlighted.

SCHEMA-BASED CONSTRUCT QUERIES

The approach to building schema-based CONSTRUCT queries is very similar to the one shown in the previous section. The difference lies in the way in which the outputs are selected.

Figure 2. Schema-based CONSTRUCT query builder.

Visual Language Extension

The visual language is extended, as the output part of the queries is built using graph patterns constructed from triples.

    Outputs      ::= GraphPattern
    GraphPattern ::= Triple+
    Triple       ::= Subject Predicate Object
    Subject      ::= VariableName
    Predicate    ::= PredicateName | VariableName | ClassName
    Object       ::= PredicateName | VariableName | ClassName | LiteralValue

Listing 2. EBNF description of the output part (non-terminals)

Query Generation Extension

The way outputs are generated is different for this query builder: here the outputs are triples, namely the triples described in the output graph pattern.

User Interface Extension

The user interface is extended with a component for composing output triples, as shown in Figure 2. It lists all variables for the subject field; all variables, predicates and classes for the predicate field; and all variables, predicates, classes and literal values for the object field. Users can select the desired triples and add them to the output list.

INSTANCE-BASED SELECT QUERIES

Building queries with the instance-based SELECT builder relies on schema and instance information from the underlying RDF repository.

The Visual Language

Variable, class and predicate names are IRIs describing the corresponding entities, as described for the previous builders, where the meaning of Relation is also explained. Instance IRIs are taken from the repository and offered as autocompletion popups based on user input. The LiteralValue represents valid SPARQL literals, such as strings or integers (e.g. "Example"). See Listing 3 for the formal EBNF description.

    Query            ::= Outputs Conditions
    Outputs          ::= VariableName+
    Conditions       ::= GraphPattern
    GraphPattern     ::= Triple+
    Triple           ::= Subject Predicate Object
    Subject          ::= VariableName | ClassName | InstanceIRI
    Predicate        ::= VariableName | PredicateName
    Object           ::= VariableName | ClassName | InstanceIRI | Literal | FilterExpression
    Literal          ::= LiteralValue { DataType }?
    FilterExpression ::= Relation LiteralValue

Listing 3. Visual language of the instance-based SELECT query builder

Figure 3. Instance-based SELECT query builder. Resulting query is identical to the one shown in Fig. 1.

Query Generation

Queries are constructed by enumerating the variable names for the SELECT part and taking the list of triples for the WHERE part. Filter expressions are built by adding a regular expression filter string according to the chosen relation.

Autocompletion

The interface features incremental autocompletion for the WHERE part based on user input, as described in [9], because listing all the instances is infeasible. Whenever the user types something into any of the text boxes, the system pops up all the possible options: RDF entity names, classes, properties or instance identifiers and values. This helps the user explore the underlying data set.

Autocompletion is achieved by running an incremental query every time the user enters text. The system queries for all entity names (classes, predicates) as well as all instance identifiers or values that contain the user input and match the graph pattern constructed so far (so that the final query has results). The user is then presented with the list of possible options, which grows incrementally as more matches are found in the RDF repository.

User Interface

The user interface features a component for building conditional (WHERE) triples, as shown in Figure 3. The text fields provide autocompletion popups for user input, based on what the repository contains, against the triples that have already been added to the output list. The user can select a filter option using the desired relation or can specify the type of the object, the latter defining the way it will be formatted and/or suffixed in the output.

Outputs are selected from an output list that is populated with all the variables occurring in the conditional triples.

Figure 4. Instance-based CONSTRUCT query builder: output part.

Query Generation Extension

The query generation is extended by taking the triples from the Output graph pattern and enclosing them in the CONSTRUCT part of the query.

Autocompletion Extension

This interface also supports autocompletion for the output part of the query, similarly to its sibling interface. There is a slight modification, however: the values in the output triples do not need to satisfy any conditions, as they only take part in forming the output triples. They are therefore not required to comply with the rest of the graph pattern, so all existing entities that match the given input are suggested to the user.

User Interface Extension

The user interface is extended with an output construction part, shown in Fig. 4, which has an almost identical structure to the triple building component for the conditional (WHERE) part, excluding the filtering option.
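For the two instance-based builders, query generation and autocompletion both reduce to assembling strings from a list of triples. The Python sketch below illustrates this under stated assumptions: the function names, the placeholder variables (?candidate, ?anyS, ?anyP, ?anyO) and the result limit are hypothetical, and the shape of the autocompletion query is only one plausible reading of "contain the user input and match the graph pattern constructed so far"; the paper defers the actual technique to [9]. Prefix declarations are again assumed to be supplied elsewhere.

```python
from typing import Iterable, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def pattern(triples: Iterable[Triple], filters: Iterable[str] = ()) -> str:
    """Render triples, then filters, as indented lines of a graph pattern."""
    lines = [f"  {s} {p} {o} ." for s, p, o in triples]
    lines += [f"  {flt}" for flt in filters]
    return "\n".join(lines)


def build_select(outputs: Sequence[str], where, filters=()) -> str:
    """SELECT part: the chosen variables. WHERE part: the list of triples."""
    return f"SELECT {' '.join(outputs)}\nWHERE {{\n{pattern(where, filters)}\n}}"


def build_construct(output_triples, where, filters=()) -> str:
    """Enclose the output triples in CONSTRUCT, keeping the same WHERE part."""
    return (f"CONSTRUCT {{\n{pattern(output_triples)}\n}}\n"
            f"WHERE {{\n{pattern(where, filters)}\n}}")


def autocomplete_query(so_far, partial: Sequence[Optional[str]], slot: int,
                       typed: str, limit: int = 20) -> str:
    """Hypothetical suggestion query for the slot the user is typing in.

    `partial` is the triple under construction with None for empty fields;
    `slot` (0, 1 or 2) is the field being completed.
    """
    placeholders = ("?anyS", "?anyP", "?anyO")
    terms = [t if t is not None else placeholders[i]
             for i, t in enumerate(partial)]
    terms[slot] = "?candidate"
    body = pattern(list(so_far) + [tuple(terms)])
    # Escape backslashes and double quotes for the SPARQL string literal;
    # regex metacharacters in the input are not escaped in this sketch.
    needle = typed.replace("\\", "\\\\").replace('"', '\\"')
    return (f"SELECT DISTINCT ?candidate\nWHERE {{\n{body}\n"
            f'  FILTER regex(str(?candidate), "{needle}", "i")\n}}\n'
            f"LIMIT {limit}")


where = [("?person", "a", "foaf:Person"),
         ("?person", "foaf:name", "?name"),
         ("?person", "foaf:publications", "?pub")]
filters = ["FILTER regex(?name, 'K', 'i')"]

select_query = build_select(["?person"], where, filters)
construct_query = build_construct([("?person", "nco:fullName", "?name")],
                                  where, filters)
suggest_query = autocomplete_query(where[:1], ("?person", None, None), 1, "pub")
print(suggest_query)
```

Passing an empty `so_far` list gives the unconstrained behaviour described for the output part of the CONSTRUCT builder, where suggestions only need to match the typed text. The CONSTRUCT example mirrors the conversion from foaf:name to nco:fullName mentioned earlier in the paper.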
INSTANCE-BASED CONSTRUCT QUERIES

Visual Language Extension

The visual language is extended with triple patterns for the output part as well. They are identical to the GraphPattern nonterminal defined for instance-based SELECT queries, with one exception: FilterExpressions are not available, since filter expressions cannot appear in the output part of a SPARQL query.

DISCUSSION

The schema-based SELECT query builder allows for simple querying and for restricting the desired properties with user-defined input. Its advantage is that it is simple and intuitive, and covers the many occasions when the user wants to search for something based on certain properties. Selecting the outputs is also straightforward and clear. Its disadvantage is that it is limiting and inflexible: it only allows tree-shaped queries and is thus unsuitable for some cases a proficient user would meet.

The schema-based CONSTRUCT query builder allows triples to be constructed, which is required in many cases (in Konduit, all data are RDF triples). Its advantages and disadvantages are similar to those of its sibling approach; it adds the complication of selecting the correct variables and classes for the output triples, but also the flexibility of formatting the output.

The instance-based SELECT query builder has the advantages of conditioning the results in a flexible, data-driven manner (with autocompletion based on the data in the repository), using user-defined variables and types, and reusing variables. Its disadvantage is that it corresponds directly to the underlying RDF structure, making it more complicated than the schema-based interfaces.

The instance-based CONSTRUCT query builder is the most flexible, triple-based query assistant. It makes it possible to compose advanced queries, with the obvious disadvantage of being the least accessible to naïve users.

CONCLUSIONS

We have presented four techniques to assist users in building SPARQL queries to retrieve information from the ever-growing collection of semantic data. Aimed at beginners and proficient users alike, the interfaces feature a range of approaches that ease the composition of queries. The query builders are based on the local (or possibly remote) repository, facilitating the discovery of the RDF store.

The first two approaches rely solely on schema information, helping users to query for instances of existing classes and to restrict them with the available properties. One of these is intended for writing SELECT queries by selecting a set of outputs; the other is for CONSTRUCT queries, presenting the used variables, predicates and classes in three lists for selecting the subject, predicate and object.

The other two approaches use instance information as well, providing autocompletion popups based on user input to suggest possible options while taking into consideration the triples previously added. The SELECT query builder simply allows for selecting the variables used in the query for output, and the CONSTRUCT query builder allows for composing triples from the variables used and all existing entities in the repository.

We plan to perform a usability evaluation to determine which of the presented tools is most appropriate for building SPARQL queries within Konduit and Nepomuk-KDE. We will present them to users with no RDF/SPARQL background as well as to users with deep knowledge of semantic technologies, in order to decide which approach and features best suit a SPARQL query builder aimed at a fairly wide variety of users.

Acknowledgments

The work presented in this paper has been funded (in part) by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Líon-2) and (in part) by the European project NEPOMUK No. FP6-027705.

REFERENCES

1. N. Athanasis, V. Christophides, and D. Kotzinos. Generating on the fly queries for the semantic web: The ICS-FORTH graphical RQL interface (GRQL). Lecture Notes in Computer Science, pages 486–501, 2004.

2. J. Borsje, H. Embregts, and S. F. Frasincar. Graphical query composition and natural language processing in an RDF visualization interface, 2006.

3. T. Catarci, P. Dongilli, T. Di Mascio, E. Franconi, G. Santucci, and S. Tessaris. An ontology based visual tool for query formulation support. In ECAI, volume 16, page 308, 2004.

4. S. Decker and M. R. Frank. The networked semantic desktop. In WWW Workshop on Application Design, Development and Implementation Issues in the Semantic Web, 2004.

5. A. Fadhil and V. Haarslev. GLOO: A graphical query language for OWL ontologies. In B. C. Grau, P. Hitzler, C. Shankey, and E. Wallace, editors, OWLED, volume 216 of CEUR Workshop Proceedings. CEUR-WS.org, 2006.

6. T. Groza, S. Handschuh, K. Möller, G. Grimnes, L. Sauermann, E. Minack, C. Mesnage, M. Jazayeri, G. Reif, and R. Gudjónsdóttir. The NEPOMUK project - on the way to the social semantic desktop. In T. Pellegrini and S. Schaffert, editors, Proceedings of I-Semantics '07, pages 201–211. JUCS, 2007.

7. F. Hogenboom, V. Milea, F. Frasincar, and U. Kaymak. RDF-GL: A SPARQL-Based Graphical Query Language for RDF.

8. M. Jarrar and M. D. Dikaiakos. MashQL: a query-by-diagram topping SPARQL. In ONISW '08: Proceeding of the 2nd international workshop on Ontologies and information systems for the semantic web, pages 89–96, New York, NY, USA, 2008. ACM.

9. K. Möller. Lifecycle Support for Data on the Semantic Web. PhD thesis, National University of Ireland, Galway, 2009.

10. K. Möller, S. Handschuh, S. Trug, L. Josan, and S. Decker. Demo: Visual programming for the semantic desktop with Konduit. In 5th European Semantic Web Conference (ESWC2008), Tenerife, Spain, volume 5021 of LNCS, pages 849–553. Springer, June 2008.

11. E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. Recommendation, W3C, January 2008. http://www.w3.org/TR/rdf-sparql-query/.

12. P. R. Smart and Russell. A visual approach to semantic query design using a web-based graphical query designer. In EKAW '08: Proceedings of the 16th international conference on Knowledge Engineering, pages 275–291, Berlin, Heidelberg, 2008. Springer-Verlag.