A Faceted Search Index for OptiqueVQS Vidar N. Klungre and Martin Giese University of Oslo Abstract. We present an end-user oriented visual query system that combines faceted search with graph queries. Unlike previous systems, the architecture is designed to scale gracefully to both very large datasets and complex queries. This is achieved by setting up an indexing structure for facet values that can easily be scaled out arbitrarily. In return, it compromises some precision in computing sets of available facet values, but it does so in a highly configurable manner. Keywords: Faceted search, RDF, Index, Visual Query Interface 1 Introduction Faceted search [6] is a popular search and exploration paradigm which allows users to apply search filters to multiple orthogonal dimensions (facets) of the data. Filters can be added, removed or modified in any order, and every time this is done, the system immediately updates the list of results, giving the user instant feedback. To support this functionality, the system needs fast access to the underlying data. This is often provided by search engines like Lucene1 or Sphinx2 which provide better performance for the queries required by faceted search than RDB-based implementations. Faceted search has also been extended to semantics-based data, e.g. in Sem- Facet [1] and Rhizomer [2]. In these systems, datatype properties are treated as facets, but they also include some ability to query the graph structure via object properties. Combining faceted search with graph queries results in more expressive queries, but implementing faceted search also becomes computation- ally harder. The challenging task is to update the set of available values for the facets after each user interaction. The straightforward way of computing this set involves evaluating a query of similar complexity to the whole query built so far, and that needs to be done for each facet. For queries with large graph patterns, and large datasets, this can become too time consuming for an interactive sys- tem. The usual approach of using a search engine style index will not help, since these engines do not support graph queries. In this demo, we present a system that combines faceted search with graph queries, and that uses an indexing structure for facet values that can easily be scaled out arbitrarily. In return, it compromises some precision in computing sets of available facet values, in a highly configurable manner. 1 https://lucene.apache.org/ 2 http://sphinxsearch.com/ OptiqueVQS front end values concepts, relations, facet facet names final query Faceted Search Index controls s rol fil ls nt co Annotated Ontology Index config. Data (SPARQL endpoint) Fig. 1. Architecture of OptiqueVQS extended with the faceted search index. We implemented this new functionality as part of OptiqueVQS [5], an ontology- based visual query system, intended for users with little IT expertise. Op- tiqueVQS combines graph querying with filtering on datatype properties. How- ever, in previous versions of OptiqueVQS, the available values for each facet have been static, determined entirely by a suitably annotated ontology, and did not depend on the underlying data. Neither did they change in reaction to the filters on other facets. This new version adds a server side component that reads data from a SPARQL endpoint and stores information needed for efficient faceted search in a scalable index. This index, instead of the original SPARQL endpoint, is queried in reaction to user interactions to update the interface. The essence of the index structure is to use one table (that could be stored e.g. in a Lucene index) per concept available in the query interface. In addition to the facet values applicable to that concept, the index can also store information about the existence of object property links to other resources, facet values of such neighbouring resources, links from those to further resources, etc. How much information about neighbouring resources is to be stored in the index can be con- figured. Independently of the configuration, the index will provide an approxi- mation from above (a superset) of the possible remaining values for each facet. 2 Approximating Facet Value Computation Fig. 1 shows the architecture of OptiqueVQS with the new faceted search index and its configuration in the centre. This component was missing from previous versions, where the only contact between OptiqueVQS and the actual data was when the finished query was executed. Now the essential information for present- ing a reactive faceted search interface is pre-computed and stored in an index structure, which the front end communicates with. Given a partially constructed query Q and a focus variable ?x in Q, the index has to answer the question which values v for a given datatype property p on ?x are still possible. In other words: all values v such that extending Q with a tuple ?x p v would not make the query unsatisfiable. Answering this exactly will in general be as expensive as evaluating the whole partial query. We therefore approximate this computation, possibly computing a superset of the possible values, by considering only a part of Q, a neighbourhood of the focus variable ?x. In the simplest case, object properties are ignored, and classical faceted search on the data properties for the variable is performed, using only data filters on ?x. To do this effectively, an index table with one column per facet is created. Standard search engine technology can perform the required filtering and gathering of possible values in a scalable way. However, usability studies with OptiqueVQS have shown that users react to facet values that they consider to be obviously excluded by some of the object properties in their partial query. To accommodate an object property o, we can add a boolean-valued facet representing the existence of a triple ?x o ?y to some resource ?y in the data. The index data structure remains a simple table of facet values. As an example, consider the following data: :kona a :Drink; :kind "Coffee"; :suppliedBy :americanDrinks . :sencha a :Drink; :kind "Tea"; :suppliedBy :asianDrinks . :orangeJuice a :Drink; :kind "Juice" . :americanDrinks a :Supplier; :basedIn "America" . :asianDrinks a :Supplier; :basedIn "Asia" . The user builds a query for drinks supplied by companies based in America, see screenshot (a) in Fig. 2. In screenshot (b), the index is configured not to take the suppliedBy relation into account. Therefore, all three kinds of drinks are available as options for the Kind facet. In (c), the existence of a supplier is added to the index, and the choice Juice is disabled, since there is no product of that kind that has a supplier. This can be further refined by adding columns for datatype properties on the resources connected by object properties. These can then be taking into account when filtering for combinations of values permitted by Q. In the example, adding the based in facet of the supplier to the index table for drinks means that only Coffee will be available, as screenshot (d) shows. Which combinations of object and datatype properties should be added as columns to the search engine table, depending on the type of ?x, is controlled by the index configuration. The details of this configuration, how it controls index generation, and how the index is used to compute sets of possible facet values is explained in a technical report, together with a more elaborate example [3]. Independently of the configuration, the index has the shape of a single table that can be scaled out using standard methods to arbitrarily large datasets. If necessary, index storage and processing can be parallelized, which is considerably easier for single tables than for relational or graph databases. We will demonstrate our tool on an instance of the NPD benchmark [4]. An interesting remaining question concerns the relationship between per- ceived usability and index configuration. The larger the neighbourhood consid- ered by the index, the sharper the approximation of possible facet values, but the more space will also be required by the index. We surmise that users expec- tations of possible choices can be met by considering rather small configurations. To show this empirically in usability studies, and to get an idea of the trade-off between perceived usability and index size, are future work. (a) example query (b) without supplier (c) existence of supplier (d) location of supplier Fig. 2. Screenshots of the system suggesting values. Acknowledgements This work has been funded by the Norwegian Research Coun- cil through SIRIUS (NFR 237898). References 1. Marcelo Arenas et al. Faceted search over RDF-based knowledge graphs. J. Web Semantics, 37:55–74, 2016. 2. Josep Maria Brunetti, Roberto Garcı́a, and Sören Auer. From overview to facets and pivoting for interactive exploration of semantic web data. International Journal on Semantic Web and Information Systems (IJSWIS), 9(1):1–20, 2013. 3. Vidar N Klungre. A faceted search index for graph queries. Technical Report 469, Department of Informatics, University of Oslo, 2017. http://heim.ifi.uio.no/ martingi/pub/IndexReport.pdf. 4. Davide Lanti, Martin Rezk, Mindaugas Slusnys, Guohui Xiao, and Diego Calvanese. The npd benchmark for obda systems. In Proc. of the 10th Int. Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2014), volume 1261 of CEUR Workshop Proceedings, pages 3–18, 2014. 5. Ahmet Soylu et al. Experiencing OptiqueVQS: a multi-paradigm and ontology- based visual query system for end users. UAIS, 15(1):129–152, 2016. 6. Daniel Tunkelang. Faceted search. Synthesis lectures on information concepts, retrieval, and services, 1(1):1–80, 2009.