SPARQL Faceter—Client-side Faceted Search Based on SPARQL Mikko Koho, Erkki Heino, and Eero Hyvönen Semantic Computing Research Group (SeCo), Aalto University, Department of Computer Science, Finland first.last@aalto.fi, http://seco.cs.aalto.fi/ Abstract. The faceted search paradigm is widely used in web applica- tions, and there are various tools available for implementing it on the server side. In contrast, this paper presents an HTML based compo- nent tool on the client side that can be plugged on virtually any public SPARQL endpoint on the web, using only SPARQL API for data re- trieval. To test and demonstrate the idea and the tool, application of the tool in a large in-use semantic portal is presented. 1 Introduction Faceted search [2, 12], known also as view-based search [9] and dynamic hierar- chies [10], is based on indexing data items along orthogonal category hierarchies, i.e., facets (e.g., places, times, document types etc.). In searching, the user selects in free order categories on facets, and the data items included in the selected categories are considered search results. After each selection, a count is com- puted for each category showing the number of results, if the user next makes that selection. In this way, search is guided by avoiding annoying ”no hits” re- sults. The idea of faceted search is especially useful on the Semantic Web where hierarhical ontologies used for data annotation provide a natural basis for facets, and reasoning can be used for mapping heterogeneous data to facets [4]. Faceted search can be implemented with popular server-side solutions, such as Solr1 , Sphinx2 , and ElasticSearch3 . However there is a lack of light-weight client-side faceted search tools or components that search data directly from a SPARQL endpoint. Such a tool would be very useful, because it could be used on virtually any open SPARQL endpoint on the web without any need for server side programming and access rights. This paper presents a web component for implementing faceted search applications in a browser, based only on a standard SPARQL API. 1 http://lucene.apache.org/solr/ 2 http://sphinxsearch.com/blog/2013/06/21/faceted-search-with-sphinx/ 3 https://www.elastic.co/ 2 The Tool: SPARQL Faceter Design Requirements The requirements for our tool, SPARQL Faceter, are based on our earlier experience on tediously implementing server side faceted search over and over again in different applications, as well as on the generic requirements of the search paradigm [12, 3]: there is a need for a lightweight browser-based faceted search engine tool that is easy to use and adapt for differ- ent RDF datasets published in SPARQL data services on the web. A particular demand of our own Linked Data Finland data service4 [5] is to support external users of the data in application development, and here the notion of the Rich Internet Application, where the data service is completely detached using stan- dard SPARQL API was deemed very useful. The tool should therefore fulfill the following requirements: 1. Data read from a SPARQL endpoint. It is a common practice to pub- lish RDF datasets as SPARQL services, and build applications that use the SPARQL endpoint to query for information. The data can reside anywhere on the web and be managed by anybody on the server side. No extra effort is required for managing the data from the application developer side. 2. Easily adaptable to different datasets. Different datasets have different data models and the application should be easy to configure for these. 3. Easy integration to web pages. A developer should be able to integrate a faceted search on some web page and modify the appearance of the faceted search freely. 4. Fluent user experience. When a user makes selections of the facets, the application should show results in a reasonable time. The UI should be easy to use and should support the exploratory analysis of datasets. 5. Support for hierarchical facets. The tool should be able to handle hierar- chical concepts in the facets, displaying the hierarchy in the facet selections. When a user selects a category upper in a hierarchy, the results should in- clude all results for its’ sub categories. 6. Easy to maintain. The tool is planned to be actively used, and thus should be easy to maintain and develop further. In short, the goal was to create a tool that would be easy to use out-of- the-box, and require minimal configuration, yet provide enough configuration options to be useful in different contexts. Design AngularJS5 was chosen as the implementation framework for our tool. SPARQL Faceter is developed as open source with source code available on GitHub. SPARQL Faceter consists of two distinct components. One is the main Se- mantic Faceted Search component6 , which contains the basic functionalities of 4 http://ldf.fi 5 https://angularjs.org/ 6 https://github.com/SemanticComputing/angular-semantic-faceted-search the tool, and is dependent on the second component: an AngularJS service for querying SPARQL endpoints with paging and object mapping7 . The Semantic Faceted Search component is comprised of different services, and most importantly, the facet selector directive. This directive is the main component of the tool, and provides the user interface for using the faceted search. The facet selector directive is independent of any results the facet selections might imply: it’s role is to keep track of what values are selected in what facets, and to display available selections to the user including the number of results each selection implies. The tool communicates the facet selections to whatever application is using it by calling a callback function whenever the selections change. Thus, SPARQL Faceter places no restrictions on what can be done with the user’s selections. The tool does, however, provide services for integrating the facet selections to SPARQL queries, handling SPARQL results, and updating the URL based on current selections. The Semantic Faceted Search component works by building a single SPARQL query for the facets it controls, and updating the query each time the user makes a selection. This has the advantage of keeping the states of the facets synchronized at all times. The downside is that with very many facets querying can become slow, and the user experience may suffer as the facets are unusable while their states are being queried. In order to make it feasible to have many facets available, facets can be enabled and disabled: only enabled facets are included in the query. Whether or not a facet is initially enabled is configurable. The tool shows a spinner on the components when querying for data, to convey to the user that the information in the elements is being updated. For enabled facets, the tool shows all possible values for the facets’ property, and the amount of instances that the selection results in. The amount of results is calculated based on all the current selections in all of the facets. Values that would lead to no results are omitted. In addition to the list of possible values, there is also a text input for searching the facet values. The tool also supports hierarchical facets, in which selecting a category upper in the hierarchy also shows results for all the categories that are below it in the hierarchy. The hierarchy specification is not limited to using a predefined property or vocabulary, but is based on configuring a property path that captures the hierarchy, and giving the top categories explicitly. The facet element displays a two-level hierarchy of the categories. For categories below the current category level, the hierarchy is flattened and all the sub categories are listed below the upper category. The category level is initially on the topmost categories, and selecting a value changes category level to the level of the selected value. The proposed way to use the facet selections, as returned by the tool, is to build another SPARQL query for retrieving the results according to the user’s selections. By using the tool’s services, the results can then be mapped into JavaScript objects with pagination and client-side caching of the received results. 7 https://github.com/SemanticComputing/angular-paging-sparql-service Installation and Configuration At its simplest, the only configuration op- tions needed by SPARQL Faceter are the URL of a SPARQL endpoint, a callback function to be called with the facet selections when they change, the RDF class URI of the resources which are the target of the search, and the facet configu- rations. A basic facet can be configured with just the URI of the property and a name for the facet to be displayed to the user. The callback function defines what happens when the facet selections are updated, which typically is to update a results display according to user selections. Other types of facets require some additional configuration options, but the amount of configuration needed for any facet type has been kept to a minimum. The other facet types include a keyword search facet, a time-span facet, and a hierarchical facet. More details about the configuration is given in the repository of the Semantic Faceted Search component. As for the results, the user of the tool is expected to build a query for retriev- ing results based on the facet selections. This makes the tool extremely flexible to different data models, but requires that the user of the tool is familiar with SPARQL syntax. 3 Applications We tested and evaluated SPARQL Faceter on the WarSampo portal [6]. The portal8 consists of different application perspectives built on top of interlinked Finnish Second World War data published on a SPARQL endpoint. The SPARQL endpoint is provided by an Apache Fuseki Server9 . The applications are tested to work with all major browsers (Chrome, Firefox and Internet Explorer). Use case 1: Death Records Perspective The WarSampo death records per- spective10 uses the application to provide an interface for searching and exploring the death records of Finnish WW2 casualties [7]. The dataset consists of about 95,000 death records, and almost 2.4 million RDF triples. Fig. 1 shows a screenshot of the faceted search of death records using SPARQL Faceter. The application interface contains 12 facets and a table-like view of the results. The facets include a keyword search facet for the persons’ names, a time- span facet for the time of death and a hierarchical military rank facet, and 9 basic facets of the annotated properties of the death records. The source code of the application is available online11 . A dataset this large provides a benchmark for the performance of the SPARQL Faceter, as SPARQL queries and also data handling on the client-side could take long enough to deteriorate the user experience. Quite much effort had to be put on optimizing the SPARQL queries for speed, while also making sure that no 8 http://www.sotasampo.fi/ 9 https://jena.apache.org/documentation/fuseki2/ 10 http://www.sotasampo.fi/en/casualties/ 11 https://github.com/SemanticComputing/WarSampo-death-records Fig. 1. The faceted search interface of death records [7] with a selected value in one facet. unnecessary processing of the results happens on the client-side. Normally when searching and browsing the dataset, results are displayed in a few seconds after the user makes a selection. With some selections that result in a large result set, and if additionally many facets are enabled, the user may have to wait more than ten seconds to see the results and updated facets. Most of the time goes to waiting for results from the SPARQL endpoint. Use case 2: Photographs Perspective The WarSampo photographs per- spective12 uses the application to create a faceted search interface for wartime photographs. The dataset consists of about 159,000 photographs and a total of about 1.6 million triples. The interface contains only 5 facets, which ensures fast response times. The facets include a time-span facet for the date the photograph was taken, a keyword search for descriptions, and basic facets for related people and places. Fig. 2 shows a screenshot of the photograph perspective. The perspective uses “infinite scrolling” to display the photographs: more results are retrieved as the user scrolls down for more photographs. Clicking on a photograph shows the user more information about the photograph, and provides hyperlinks to other WarSampo perspectives via linked entities. 4 Discussion In this paper we have presented the SPARQL Faceter for creating client-side faceted search applications. The configuration of the facets has been kept as 12 http://www.sotasampo.fi/en/photographs/ Fig. 2. The faceted search interface of wartime photographs with a selected value in one facet. simple as possible. SPARQL Faceter handles querying the SPARQL endpoint when needed, mapping the SPARQL results to JavaScript objects, and creating and updating the facet elements. To make use of the facet selection values, the developer using the tool has to retrieve results based on selected values, which usually includes writing SPARQL queries. Results Section 2 presented the design requirements for a Semantic Web tool that provides client-side faceted search of RDF datasets published as SPARQL services. Here is a detailed view of how the SPARQL Faceter manages to satisfy these requirements: 1. Data read from a SPARQL endpoint. All data is retrieved via SPARQL, from an endpoint defined by the user. 2. Easily adaptable to different datasets. Configuration of the tool is re- quired for every dataset. The configuration required for setting up facets is kept to a minimum. 3. Easy integration to web pages. A faceted search can be easily added to a web page by using the developed AngularJS components. The facet selections have a default look that can be customized using CSS. 4. Fluent user experience. For datasets that are distinctly smaller than the ones used in the use cases, the tool is almost certainly fast enough for good user experience. For the faceted search applications in the use cases, some user selections can be slow, but common usage gives results based on user selections in a few seconds. For larger datasets, the amount of facets may have to be restricted, depending on the processing capabilities of the SPARQL server. 5. Support for hierarchical facets. Two-level hierarchical facets are sup- ported by the tool, and demonstrated in the first use case. The dataset can contain more than two levels, but the hierarchical facet flattens the hierarchy to two levels for display in the select element. 6. Easy to maintain. The separation of concerns design principle is used in development to simplify code structure and maintenance. Automated tests are used to help maintain good code quality. Our application is able to satisfy the given requirements, and thus provides a usable tool for faceted search of RDF datasets. Related Work Jassa [11] is a JavaScript suite for SPARQL access, which among other things provides support for client-side faceted search. For creating simple faceted search applications Jassa is, however, a much larger and more complex solution than SPARQL Faceter. Sparklis [1] is an application for exploring and querying data from a SPARQL endpoint using a natural language interface13 , which also implements a faceted search. Sparklis focuses on data exploration and requires no prior knowledge of the data model of the target endpoint, whereas SPARQL Faceter aims to support development of applications tailored for a specific dataset. Oren et al. [8] have developed a prototype for faceted navigation of arbitrary RDF data with automatic facet selection, as a server-side solution. Future Work Plans for future development include: 1. Semi-automatic adaptation to new datasets. The tool could create a basic configuration for a new SPARQL endpoint, possibly based on only a class selection by the user. 2. Customizing facet appearance. To support customizing the appearance of the facets further , we are planning a configuration option for the user to define their own template for the facets. Acknowledgements Kasper Apajalahti and Jouni Tuominen contributed to an earlier version of our tool. This work was supported by the Linked Open Data Science project14 funded by the Ministry of Education and Culture in Finland. References 1. Ferré, S.: Expressive and scalable query-based faceted search over sparql endpoints. In: The Semantic Web–ISWC 2014, pp. 438–453. Springer (2014) 13 http://www.irisa.fr/LIS/ferre/sparklis/ 14 http:/seco.cs.aalto.fi/projects/lodsci/ 2. Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., Lee, K.P.: Finding the flow in web site search. CACM 45(9), 42–49 (2002) 3. Hearst, M.: Design recommendations for hierarchical faceted search interfaces. In: ACM SIGIR workshop on faceted search. pp. 1–5. Seattle, WA (2006) 4. Hyvönen, E., Saarela, S., Viljanen, K.: Application of ontology-based techniques to view-based semantic search and browsing. In: The semantic web: research and applications. First European Semantic Web Symposium (ESWS 2004). pp. 92–106. Springer-Verlag (2004) 5. Hyvönen, E., Tuominen, J., Alonen, M., Mäkelä, E.: Linked Data Finland: A 7-star model and platform for publishing and re-using linked datasets. In: The Semantic Web: ESWC 2014 Satellite Events, Revised Selected Papers. pp. 226–230. Springer- Verlag (May 2014) 6. Hyvönen, E., Heino, E., Leskinen, P., Ikkala, E., Koho, M., Tamper, M., Tuominen, J., Mäkelä, E.: WarSampo data service and semantic portal for publishing linked open data about the Second World War history. In: The Semantic Web—Latest Advances and New Domains (ESWC 2016). Springer-Verlag (May 2016) 7. Koho, M., Hyvönen, E., Heino, E., Tuominen, J., Leskinen, P., Mäkelä, E.: Linked death—representing, publishing, and using Second World War death records as linked open data. In: Proceedings of the 1st Workshop on Humanities in the Se- mantic Web - WHiSe (at ESWC 2016) (May 2016) 8. Oren, E., Delbru, R., Decker, S.: Extending faceted navigation for RDF data. In: The Semantic Web—ISWC 2006, pp. 559–572. Springer-Verlag (2006) 9. Pollitt, A.S.: The key role of classification and indexing in view- based searching. Tech. rep., University of Huddersfield, UK (1998), http://www.ifla.org/IV/ifla63/63polst.pdf 10. Sacco, G.M.: Dynamic taxonomies: guided interactive diagnostic assistance. In: Wickramasinghe, N. (ed.) Encyclopedia of Healthcare Information Systems. Idea Group (2005) 11. Stadler, C., Westphal, P., Lehmann, J.: Jassa: a JavaScript suite for SPARQL- based faceted search. In: Proceedings of the 2014 International Conference on Developers-Volume 1268. pp. 31–36. CEUR-WS. org (2014) 12. Tunkelang, D.: Faceted search, Synthesis lectures on information concepts, re- trieval, and services, vol. 1. Morgan & Claypool Publishers (2009)