On Faceted Search over Knowledge Bases ? Bernardo Cuenca Grau1 , Evgeny Kharlamov1 , Dmitriy Zheleznyakov1 , Marcelo Arenas2 , and Šarūnas Marciuška1 1 University of Oxford; 2 Pontificia Universidad Católica de Chile Motivation An increasing number of applications rely on RDF [2] and SPARQL 1.1 [3] for storing, publishing, and querying semistructured data. The functionality of many such applications is enhanced with OWL 2 ontologies [1], which are used to provide a conceptual layer on top of data and enrich query answers with implicit information. Although the growing popularity of RDF, OWL 2, and SPARQL 1.1 has been ac- companied by the development of better and better query answering engines, writing SPARQL 1.1 queries is not well-suited for the majority of users. Thus, an important challenge is the development of simple yet powerful query interfaces that capture well- defined fragments of SPARQL 1.1. Faceted search is a prominent approach for accessing document collections that allows users to narrow down search results by incrementally applying filters, called facets, on the annotations associated to documents [18]. Faceted search has become a mainstream commercial technology, and it is ubiquitous in e-commerce websites. For example, hotel booking websites such as Booking.com allow users to refine search results by selecting suitable values in facets such as ‘Price’, ‘Star Rating’, or ‘Facilities’. Faceted search has been proposed as a suitable paradigm for querying document collections annotated with RDF, and several RDF-based faceted search systems have been developed [4, 6, 8, 13, 16, 9–11, 14, 12]. Existing approaches are, however, rather systems-oriented, and there is a lack of rigorous theoretical underpinnings. Further- more, existing works have focused mostly on RDF, thus essentially disregarding the role of OWL 2 ontologies. In particular, the following key questions have not been satisfactorily addressed in the literature: (Q1) What fragments of SPARQL 1.1 can be naturally captured using faceted search as a query paradigm? (Q2) What is the complexity of answering such queries? (Q3) What does it mean to generate and interactively update an interface according to a given ontology? Our goal is to provide such solid foundations. We have formalised faceted interfaces tailored towards graph-based data models, and identified a fragment of first-order logic capturing the underlying queries. We have studied the complexity of answering such queries for RDF and ontologies expressed in the OWL 2 profiles. Moreover, we have devised practical and generic algorithms for recomputing faceted interfaces in response to user actions. Finally, we have implemented and tested our faceted search algorithms ? Work supported by the Royal Society, the EPSRC projects Score!, Exoda, and MaSI3 , and the FP7 project OPTIQUE. in a prototype system [7], with encouraging results. In this extended abstract, we pro- vide a short overview of the main ideas underlying our approach. Technical Approach To illustrate our definitions and make our discussion concrete, we consider an excerpt of Yago [17] about US presidents, which is depicted in Figure 1. In this figure, dth and dbi stand for T. Roosevelt and B. Clin- country president person university ton, which are annotated as US dusa citizenOf presidents with ‘USpres’. More- dth dke dh founding hasChild gradFrom over, Theodore’s son Kermit dke date dbi dch ds 1783-09-03 dg and Bill’s daughter Chelsea dch are categorised as ‘person’. Fi- Fig. 1. Annotated entities nally, Stanford ds , Harvard dh , and Georgetown dg are anno- tated with ‘university’, and the USA dusa with ‘country’, which, in turn, is annotated with its ‘founding date’ 1783-09-03. Our goal is to find US presidents who graduated from either Harvard or Georgetown and have a child who graduated from Stanford. We model facets as pairs consisting of a predicate (or facet name) and a set of val- ues (typically documents represented by URIs or literals). Examples of facet names are the binary relations ‘gradFrom’ and ‘dateOfBirth’, and examples of values for these facets are specific documents such as ‘ds ’ (Stanford) and literals such as ‘1858-10-27’. Selection of multiple values within a facet can be interpreted either conjunctively or disjunctively, and hence we distinguish between conjunctive and disjunctive facets. Fur- thermore, we distinguish a special facet type, whose values are categories (i.e., unary predicates) rather than specific documents or literals. Finally, we also allow for a spe- cial value any which denotes the set of all values compatible with the facet predicate. We assume that C, UP and BP are pairwise disjoint infinite sets of constants, unary predicates and binary predicates. Definition 1. Let type and any be symbols not occurring in C ∪ UP ∪ BP. A facet is a pair (X, ◦Γ ), with ◦ ∈ {∧, ∨}, Γ a non-empty set, and either (i) X = type and Γ ⊆ UP, or (ii) X ∈ BP, any ∈ Γ and either Γ ⊆ C ∪ {any} or Γ ⊆ UP ∪ {any}. A facet of the form (X, ∧Γ ) is conjunctive, and a facet of the form (X, ∨Γ ) is disjunctive. In a facet F = (X, ◦Γ ), X is the facet name, denoted by F |1 , and Γ contains the facet values and it is denoted by F |2 . The following facets could be of use when searching over these documents. F1 = (type, ∨{USpres, president, person, country, university}), F2 = (hasChild, ∨{any, dke , dch }), F3 = (gradFrom, ∨{any, dh , ds , dg }), F4 = (citizenOf, ∧{dusa , duk }), F5 = (citizenOf, ∨{dusa , duk }). F1 can be exploited to restrict types of entities, F2 to narrow down search results to entities with children, where ‘any’ can be used to say that we are not looking for a specific child, F3 to select an alma mater, and F4 and F5 to select a citizenship. Faceted Interfaces. A faceted interface represents an arrangement of facets that can be displayed for users, and captures the choices of facet values made by them. Thus, it encodes both a query, whose answers constitute the current search results, and the possible choices of facet values available to users for further refinement. In contrast to traditional faceted search, our notion of interface allows the user to ‘navigate’ across interconnected sets of documents and establish independent filters to each of them. Definition 2. A basic faceted interface (BFI) is a pair (F, Σ), with F a facet and Σ ⊆ F |2 the set of selected values. The set of faceted interfaces (or interfaces, for short) is given by the following grammar, where I0 and I1 = (F, Σ) are BFIs and F |1 ∈ BP: I ::= path | (path ∧ path) | (path ∨ path), path ::= I0 | (I1 /I). A BFI encodes user choices for a specific facet, e.g., the BFI (F1 , {USpres}) selects the documents categorised as US presidents. BFIs are put together in paths: sequences of nested facets that capture navigation between sets of documents. With nesting ‘/’ we capture queries such as ‘people with a child who graduated from Stanford’ by using the interface (F2 , {any})/(F3 , {ds }) which first selects people having (any) children and then those children with a Stanford degree. Finally, two types of branching can be applied: (path1 ∧ path2 ) indicates that search results satisfy the conditions in both path1 and path2 , while (path1 ∨ path2 ) indicates that they satisfy those in path1 or path2 . The following interface Iex encodes the query ‘return US presidents who graduated from Harvard or Georgetown and who have a child who graduated from Stanford’.   (F1 , {USpres}) ∧ (F3 , {dh , dg }) ∧ (F2 , {any})/(F3 , {ds }) . Faceted Queries. Queries encoded in faceted interfaces can be captured by formulae in the positive existential fragment of first-order logic with one free variable, where in every disjunction ϕ1 ∨ ϕ2 , the subformulae ϕ1 and ϕ2 share at most one variable. Moreover, these queries involve predicates of arity at most two and are tree shaped. The output variable of a faceted query is the root variable in the query graph. We also consid- ered extensions of faceted interfaces that allow us to choose different output variables (a functionality typically referred to as refocusing [5]). The query encoded by Iex returns two presidents: Roosevelt and Clinton. We investigated combined complexity of faceted query evaluation for ontologies expressed in the OWL 2 profiles [15]. We showed that the problem is PT IME-complete for RL and EL ontologies and NP-complete for QL. We also investigated combined complexity under the active domain semantics (the de- fault in SPARQL query evaluation), and showed that it is tractable for all the profiles. Interface Generation and Update. Faceted navigation is an interactive process. Starting with an initial interface generated from a keyword search, users ‘tick’ or ‘untick’ facet values and the system reacts by updating both search results (query answers) and facets available for further navigation. We formally captured this interaction and proposed in- terface generation and update algorithms that are ‘guided’ by the (explicit and implicit) information in the ontology. Our algorithms are based on the same general principle: each element of the initial interface (resp. each change in an interface as a response of an action) must be ‘justified’ by a suitable entailment in O. In this way, by exploring the ontology, we can guide users in the formulation of meaningful queries. References 1. W3C: OWL 2 Web Ontology Language. http://www.w3.org/TR/owl2-overview/ 2. W3C: Resource Description Framework (RDF). http://www.w3.org/RDF/ 3. W3C: SPARQL 1.1 Query Language. www.w3.org/TR/sparql11-query/ 4. Berners-Lee, T., Hollenbach, J., Lu, K., Presbrey, J., Prudhommeaux, E., Schraefel, M.M.C.: Tabulator Redux: Browsing and Writing Linked Data. In: LDOW (2008) 5. Clarkson, E., Navathe, S.B., Foley, J.D.: Generalized Formal Models for Faceted User Inter- faces. In: JCDL. pp. 125–134 (2009) 6. Fafalios, P., Tzitzikas, Y.: X-ENS: Semantic Enrichment of Web Search Results at Real- Time. In: SIGIR (2013) 7. Grau, B.C., Kharlamov, E., Marciuska, S., Zheleznyakov, D., Arenas, M., Jimenez-Ruiz, E.: SemFacet: Semantic Faceted Search over Yago. In: WWW Demo (2014) 8. Hahn, R., Bizer, C., Sahnwaldt, C., Herta, C., Robinson, S., Bürgle, M., Düwiger, H., Scheel, U.: Faceted Wikipedia Search. In: BIS (2010) 9. Heim, P., Ziegler, J., Lohmann, S.: gFacet: A browser for the web of data. In: IMC-SSW. CEUR-WS, vol. 417 (2008) 10. Hildebrand, M., van Ossenbruggen, J., Hardman, L.: /facet: A Browser for Heterogeneous Semantic Web Repositories. In: ISWC (2006) 11. Huynh, D., Mazzocchi, S., Karger, D.R.: Piggy Bank: Experience the Semantic Web Inside Your Web Browser. J. Web Sem. 5(1) (2007) 12. Huynh, D.F., Karger, D.R.: Parallax and Companion: Set-based Browsing for the Data Web (2013) 13. Hyvönen, E., Saarela, S., Viljanen, K.: Ontogator: Combining View- and Ontology-Based Search with Semantic Browsing. In: XML Finland (2003) 14. Kobilarov, G., Dickinson, I.: Humboldt: Exploring Linked Data. In: LDOW (2008) 15. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: OWL 2 Web Ontol- ogy Language Profiles. W3C Recommendation (2009) 16. m.c. schraefel, Smith, D.A., Owens, A., Russell, A., Harris, C., Wilson, M.L.: The Evolving mSpace Platform: Leveraging the Semantic Web on the Trail of the Memex. In: Hypertext (2005) 17. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: A Core of Semantic Knowledge. In: WWW (2007) 18. Tunkelang, D.: Faceted Search. Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool Publishers (2009)