=Paper=
{{Paper
|id=Vol-1690/paper75
|storemode=property
|title=SemFacet: Faceted Search over Ontology Enhanced Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-1690/paper75.pdf
|volume=Vol-1690
|authors=Bernardo Cuenca Grau,Evgeny Kharlamov,Sarunas Marciuska,Dmitriy Zheleznyakov,Marcelo Arenas
|dblpUrl=https://dblp.org/rec/conf/semweb/GrauKMZA16
}}
==SemFacet: Faceted Search over Ontology Enhanced Knowledge Graphs==
B. Cuenca Grau (1), E. Kharlamov (1), Š. Marciuška (2), D. Zheleznyakov (1), M. Arenas (3)
(1) University of Oxford, (2) Microsoft Bing, (3) Pontificia Universidad Catolica de Chile
Abstract. In this demo we present the SemFacet system for faceted search over
ontology enhanced Knowledge Graphs (KGs) stored in RDF. SemFacet allows
users to query KGs with relatively complex SPARQL queries via an intuitive
Amazon-like interface. SemFacet can compute faceted interfaces over large scale
RDF datasets by relying on incremental algorithms and over large ontologies by
exploiting ontology projection techniques. SemFacet relies on an in-memory triple
store, and the current implementation bundles JRDFox, Sesame, Stardog, and PAGOdA.
During the demonstration, attendees can try SemFacet by exploring the Yago KG.
1 Introduction
Knowledge graphs (KGs) such as Yago and Freebase have become a powerful asset
for enhancing search and are being intensively used in both academia and industry.
Many existing KGs are either available as Linked Open Data, or they can be exported as
RDF datasets enhanced with background knowledge in the form of an OWL 2 ontology.
Formulating queries over data in this format is, however, a challenging task for end users,
as has been observed in, e.g., industry [9, 10, 12] and the life sciences [6].
Faceted search is the de facto approach for exploratory search in many online
applications such as Amazon and Booking.com. Faceted search allows for querying
collections of entities where users can narrow down the search results by progressively
applying filters, called facets [14]. A facet typically consists of a predicate (e.g., ‘gender’
or ‘occupation’ when querying entities about people) and a set of possible string values
(e.g., ‘female’ or ‘research’), and entities in the collection are annotated with predicate-
value pairs. During faceted search users iteratively select facet values and the entities
annotated according to the selection are returned as the search result.
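The selection mechanism described above can be illustrated with a minimal sketch. It assumes a toy in-memory collection. The class `FacetedFilter` and its methods are hypothetical names chosen for illustration and are not part of SemFacet. Values selected within one facet are combined disjunctively, and different facets are combined conjunctively.

```java
import java.util.*;
import java.util.stream.*;

/** Toy faceted search over entities annotated with predicate-value pairs (illustrative only). */
class FacetedFilter {
    /** entity id -> (predicate -> set of string values) */
    private final Map<String, Map<String, Set<String>>> annotations = new HashMap<>();

    /** Annotates an entity with one predicate-value pair. */
    void annotate(String entity, String predicate, String value) {
        annotations.computeIfAbsent(entity, e -> new HashMap<>())
                   .computeIfAbsent(predicate, p -> new HashSet<>())
                   .add(value);
    }

    /**
     * Returns the entities that, for every facet in the selection, carry at
     * least one of the selected values. An empty selection returns all entities.
     */
    SortedSet<String> search(Map<String, Set<String>> selection) {
        return annotations.entrySet().stream()
            .filter(e -> selection.entrySet().stream().allMatch(sel ->
                e.getValue().getOrDefault(sel.getKey(), Set.of()).stream()
                    .anyMatch(sel.getValue()::contains)))
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Each additional facet in the selection can only shrink the result, which is what lets users narrow down the search progressively.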
Faceted search in the context of RDF has received significant attention and a number
of systems have been developed (see [3] for an overview of these systems). In our previous
studies [2, 3, 7] we developed rigorous theoretical underpinnings for faceted
search in the context of RDF-based KGs enhanced with OWL 2 ontologies: we identified
well-defined fragments of SPARQL that can be naturally captured using faceted search
as a query paradigm, and established the computational complexity of answering such
queries; we also studied the problem of updating faceted interfaces (FIs), which is critical
for guiding users in the formulation of meaningful queries during exploratory search.
In this demonstration we present our faceted search system SemFacet that implements
these techniques and demonstrate it on the Yago KG [13].
Note: This demo accompanies our ISWC’16 presentation selected for the journal papers track [3].
The version of SemFacet presented here extends the one demonstrated earlier [1, 4, 5]: we have
extended the backend with new reasoners, implemented our own keyword search engine, improved
the front-end, and implemented incremental update of faceted interfaces. This research was
supported by the Royal Society, the EPSRC projects Score!, DBOnto, MaSI3, and ED3, and the
EU FP7 project Optique (n. 318338).
[Figure 1 (image not reproduced). Architecture panel: client components (Query Converter, Snippet Composer, Composer of Faceted Interfaces) and server components (Search Engine, Query Answering, Reasoners, Facet Generator, Snippet Generator) on top of a Triple Store holding the RDF data, ontology, materialisation rules, facet graph, and an inverted index on the RDF data; the GUI screenshot shows the keyword ‘politicians’ and a snippet for Bill Clinton. Workflow panel: start SemFacet search; user enters keywords; relevant object IDs are computed; facets of the initial FI and snippets are computed; initial FI and snippets are displayed; user (un)selects facet values; facets of the FI and snippets are updated; updated FI and snippets are displayed; user refocusses; end SemFacet search.]
Fig. 1: Left: SemFacet architecture and screenshot; Right: SemFacet workflow
2 SemFacet System
SemFacet is implemented in Java and available for download under an academic license.
The system can be obtained from our project website [11], where we also provide a
collection of test data and detailed installation and configuration instructions. SemFacet
is also available on GitHub [8].
System Architecture. SemFacet is based on a modular architecture, which is depicted in
Figure 1 (Left). On the client side, SemFacet implements an HTML5 GUI
consisting of three main parts: a free-text search box for keywords, a hierarchically
organised faceted interface, and a scrollable panel containing snippet-shaped answers.
In the upper part of Figure 1 (Left) we present a screenshot of the SemFacet GUI that
corresponds to the following search scenario: the user is looking for politicians who are
US presidents, graduated from Harvard or Georgetown, and whose children graduated
from Stanford. User keywords such as ‘politicians’ are sent by the client to the server
where they are processed by the search engine. For efficiency reasons, we implemented
our own simple engine based on an inverted index, and also allowed for the possibility of
delegating keyword search to Lucene.
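As a rough illustration of the kind of structure an inverted-index keyword engine relies on, the following sketch maps lower-cased tokens to the URIs whose textual annotations contain them. It is not SemFacet's actual engine. The class `InvertedIndex`, the tokenisation rule, and the example URIs are assumptions made for this example, and the sketch matches whole tokens only, without stemming.

```java
import java.util.*;

/** Minimal inverted index from lower-cased tokens to URIs (illustrative only). */
class InvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes one textual annotation (e.g., a label or description) of a URI. */
    void add(String uri, String text) {
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(uri);
            }
        }
    }

    /** Returns the URIs whose indexed text contains at least one of the keywords. */
    Set<String> lookup(Collection<String> keywords) {
        Set<String> result = new TreeSet<>();
        for (String keyword : keywords) {
            result.addAll(postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of()));
        }
        return result;
    }
}
```

A lookup costs one hash access per keyword rather than a scan over all annotation strings, which is the usual efficiency argument for an index of this kind.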
User selections in the faceted interface are compiled into a SPARQL query using
the query converter and then sent to the back-end reasoner for evaluation. The snippet
and interface composers receive information about facets and answers that should be
displayed to the user and update the currently displayed interface and query answers. In
Figure 1 (Left) the answer is Bill Clinton, displayed in the form of a snippet with a
photograph and wiki description. The system updates the faceted interface incrementally:
only the parts of the interface that are affected by the user’s actions are updated, which
allows for a significantly faster response time. On the server side, the system relies
on an in-memory triple store to hold the inverted index, the input data and ontology, and
other important information. The current implementation bundles JRDFox, Sesame,
Stardog, PAGOdA, and HermiT. Any other in-memory triple store providing similar
functionality can be seamlessly integrated with SemFacet. The facet generator is the
back-end component responsible for constructing the interface in response to user actions,
while the query answering component of the back-end executes the SPARQL query
obtained from the query converter using the reasoning engine selected by the user.
SemFacet Workflow. SemFacet’s workflow is summarised in Figure 1 (Right). The
user initiates the search by entering a set of keywords (e.g., ‘politicians’), which are
then matched to textual information associated with URIs in the data (such as labels and
descriptions), resulting in an initial set of relevant URIs (see footnote 1). SemFacet then
computes the initial interface (with no value selections) based on these relevant URIs, which
constitutes the starting point for faceted navigation. The main tasks performed by SemFacet are
realised in the system as follows:
– Matching of keywords. SemFacet exploits the values of annotation properties to
determine whether a URI is relevant to a set of keywords. Intuitively, a URI u is
relevant to a keyword k w.r.t. an annotation property R if the input data has a triple
of the form (u, R, w), where w is a string containing k; and u is relevant to a set of
keywords if at least one of them occurs in w.
– Interface generation and update. In order to generate and update faceted interfaces
SemFacet relies on a so-called facet graph that consists of the input RDF data and
a projection of the input ontology on a graph structure (see [3] for details). The
part of this graph that corresponds to entailed RDF data is materialised offline
at loading time, while the part corresponding to the ontology is computed in the
online phase by querying the materialised graph. Faceted interfaces are updated
incrementally in response to user actions; moreover, SemFacet minimises faceted
interfaces by hiding facet values whose selection leads to empty answer sets or does
not affect the currently computed answer set. Finally, the current version of the
system can be customised so that facet values are hierarchically arranged according
to a user-specified predicate, which greatly facilitates navigation in the presence of a
large number of values per facet.
– Query generation and execution. SemFacet compiles user selections from the faceted
interface into SPARQL queries, which are then evaluated using a reasoner. Our
system currently bundles several reasoning engines with different capabilities, and
users can select the reasoner that is most appropriate for the application at
hand. Answers to SPARQL queries are typically returned by reasoners in the form
of URIs. This may not be very informative for end users; hence, SemFacet also
retrieves the annotations associated with the answer URIs and displays them in the
form of a snippet.
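The minimisation rule from the interface generation step (hide a facet value if selecting it would give an empty answer set or leave the current answer set unchanged) can be sketched as follows. This is a simplified illustration and not the algorithm of [3]: it assumes that selecting a value simply intersects the current answers with the entities annotated with that value. The class `FacetMinimiser` and its parameters are hypothetical.

```java
import java.util.*;

/** Decides which values of a single facet to display (illustrative only). */
class FacetMinimiser {
    /**
     * @param current entities in the currently computed answer set
     * @param byValue facet value -> entities annotated with that value
     * @return values whose selection yields a non-empty, strictly smaller answer set
     */
    static SortedSet<String> visibleValues(Set<String> current,
                                           Map<String, Set<String>> byValue) {
        SortedSet<String> visible = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : byValue.entrySet()) {
            // Answers that would remain after selecting this value.
            Set<String> refined = new HashSet<>(current);
            refined.retainAll(entry.getValue());
            boolean leadsToEmpty = refined.isEmpty();
            boolean hasNoEffect = refined.size() == current.size();
            if (!leadsToEmpty && !hasNoEffect) {
                visible.add(entry.getKey());
            }
        }
        return visible;
    }
}
```

Recomputing only the facets whose underlying entity sets changed after a user action is one way to keep such a check compatible with incremental updates.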
Configuring the System. SemFacet offers a range of options for system administrators
to deploy and configure the system. These include (i) the reasoning engine of choice
(JRDFox, PAGOdA, Sesame, Stardog, or HermiT); (ii) the annotation properties relevant
for keyword search and displaying of query answers; and (iii) the facet that is first
displayed to the user. By default, values within a facet are interpreted disjunctively;
however, SemFacet provides advanced configuration capabilities for specifying which
facets must be interpreted conjunctively. Additionally, the hierarchical display of facet
values can also be configured by specifying the property used to construct the hierarchy
(typically rdfs:subClassOf or a property capturing a partonomy relation).
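To make the difference between the disjunctive and conjunctive interpretation concrete, the sketch below compiles facet selections into a flat SPARQL query. It is only an approximation under strong assumptions: all selected values are IRIs, there is no nesting or refocussing, and the class `FacetQueryCompiler`, its nested `Facet` type, and the `http://example.org/` IRIs used with it are hypothetical. SemFacet's own query converter is described in [3] and may produce different queries.

```java
import java.util.*;

/** Compiles per-facet selections into a flat SPARQL SELECT query (illustrative only). */
class FacetQueryCompiler {
    /** One facet: a predicate IRI, the selected value IRIs, and how they are combined. */
    static final class Facet {
        final String predicate;
        final List<String> values;
        final boolean conjunctive;

        Facet(String predicate, List<String> values, boolean conjunctive) {
            this.predicate = predicate;
            this.values = values;
            this.conjunctive = conjunctive;
        }
    }

    static String compile(List<Facet> facets) {
        StringBuilder query = new StringBuilder("SELECT DISTINCT ?x WHERE {\n");
        int counter = 0;
        for (Facet facet : facets) {
            if (facet.values.isEmpty()) {
                continue; // nothing selected in this facet
            }
            if (facet.conjunctive) {
                // Conjunctive: the answer must be linked to every selected value.
                for (String value : facet.values) {
                    query.append("  ?x <").append(facet.predicate)
                         .append("> <").append(value).append("> .\n");
                }
            } else {
                // Disjunctive (the default): any one of the selected values suffices.
                String var = "?v" + counter++;
                query.append("  ?x <").append(facet.predicate).append("> ")
                     .append(var).append(" .\n");
                query.append("  VALUES ").append(var).append(" {");
                for (String value : facet.values) {
                    query.append(" <").append(value).append(">");
                }
                query.append(" }\n");
            }
        }
        return query.append("}").toString();
    }
}
```

The disjunctive case uses a SPARQL 1.1 `VALUES` block to restrict a fresh variable to the selected values, while the conjunctive case emits one triple pattern per value.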
3 Demonstration Scenarios
The demo attendees will explore Yago with SemFacet. They will be able to search
Yago either with their own search tasks or with tasks that we have prepared for them. We
now discuss the dataset that we have prepared for the demonstration.
(Footnote 1: If the given set of keywords is empty, the system considers all URIs in the data as relevant.)
Yago comes in several slices available for download, and since the current version
of SemFacet relies on main-memory triple stores, we took only some slices of Yago that
could fit in the main memory of our machine. In particular, we took the Taxonomy slice,
which consists of domain and range restrictions as well as subclass relations, and the Core
slice, which contains instances of object and annotation properties. The axioms from
Taxonomy constitute the ontology that we prepared for the demo. We refer to this ontology
together with the Taxonomy and Core data slices as FYago. In order to generate snippets
we included in both slices DBpedia abstracts, thumbnails, and links to Wikipedia articles.
FYago contains 97 million triples involving over 3 million URIs. Of these triples, 55%
relate entities via object properties, 16% relate entities to numbers, 2% relate
entities to dates, and 27% relate entities to other kinds of strings.
FYago has 89 predicate URIs, which gives an upper bound on the number of facets
that the user will see during faceted search. We analysed the popularity of these predicates,
that is, how often users will see them during faceted search, and found that 8
facet predicates have popularity exceeding 1 million entities; these include hasLongitude,
hasLatitude, and rdf:type, so a facet involving such a predicate will occur in most search
sessions. A further 12 facet predicates have popularity between 100,000 and 1 million, which
implies that they will occur rather often. The remaining 69 facet predicates have popularity
below 100,000, and thus they will occur rarely; e.g., a facet predicate with popularity
1,000 is relevant to 1,000 entities only, and hence only to 0.025% of all data triples.
4 References
[1] M. Arenas, B. Cuenca Grau, E. Kharlamov, Š. Marciuška, and D. Zheleznyakov. Enabling
Faceted Search over OWL 2 with SemFacet. In: OWLED. 2014.
[2] M. Arenas, B. Cuenca Grau, E. Kharlamov, Š. Marciuška, and D. Zheleznyakov. Faceted
Search over Ontology-Enhanced RDF Data. In: CIKM. 2014.
[3] M. Arenas, B. Cuenca Grau, E. Kharlamov, Š. Marciuška, and D. Zheleznyakov. Faceted
Search over RDF-based Knowledge Graphs. In: J. Web Sem. 37 (2016).
[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, Š. Marciuška, and D. Zheleznyakov. Towards
Semantic Faceted Search. In: WWW (Companion Volume). 2014.
[5] M. Arenas, B. Cuenca Grau, E. Kharlamov, Š. Marciuška, D. Zheleznyakov, and E. Jiménez-Ruiz.
SemFacet: Semantic Faceted Search over Yago. In: WWW (Companion Volume). 2014.
[6] B. Cuenca Grau, E. Kharlamov, Š. Marciuška, D. Zheleznyakov, and Y. Zhou. Querying
Life Science Ontologies with SemFacet. In: SWAT4LS. 2014.
[7] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov, M. Arenas, and Š. Marciuška. On Faceted
Search over Knowledge Bases. In: DL. 2014.
[8] GitHub of SemFacet. https://github.com/semfacet.
[9] E. Kharlamov, D. Hovland, E. Jiménez-Ruiz, D. Lanti, H. Lie, et al. Ontology Based Access
to Exploration Data at Statoil. In: ISWC. 2015.
[10] E. Kharlamov, N. Solomakhina, Ö. L. Özçep, D. Zheleznyakov, T. Hubauer, et al. How
Semantic Technologies Can Enhance Data Access at Siemens Energy. In: ISWC. 2014.
[11] SemFacet Project Page. http://www.cs.ox.ac.uk/isg/tools/SemFacet/.
[12] A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jiménez-Ruiz, M. Giese, and I. Horrocks.
Ontology-Based Visual Query Formulation: An Industry Experience. In: ISVC. 2015.
[13] F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: A Core of Semantic Knowledge. In:
WWW. 2007.
[14] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.