=Paper=
{{Paper
|id=Vol-1963/paper650
|storemode=property
|title=Ranking, Aggregation, and Reachability in Faceted Search with SemFacet
|pdfUrl=https://ceur-ws.org/Vol-1963/paper650.pdf
|volume=Vol-1963
|authors=Evgeny Kharlamov,Luca Giacomelli,Evgeny Sherkhonov,Bernardo Cuenca Grau,Egor Kostylev,Ian Horrocks
|dblpUrl=https://dblp.org/rec/conf/semweb/KharlamovGSGKH17
}}
==Ranking, Aggregation, and Reachability in Faceted Search with SemFacet==
Ranking, Aggregation, and Reachability in
Faceted Search with SemFacet?
Evgeny Kharlamov1 Luca Giacomelli2 Evgeny Sherkhonov1
Bernardo Cuenca Grau1 Egor V. Kostylev1 Ian Horrocks1
1 2
University of Oxford, UK Sapienza University of Rome, Italy
1 Introduction
Motivation. Faceted search is a prominent search paradigm that became the standard in
many Web applications including online shopping and real-estate portals, where users
can progressively narrow down the search results by applying filters, called facets [9]
which are organised in faceted interfaces. Faceted search has also been proposed in the
Semantic Web context for exploring and querying RDF graphs and a number of RDF-
based faceted search systems have been developed (see [3,7,4,5,2,6] for an overview).
One of the main challenges that hampers usability of faceted search systems is in-
formation overload [9]: when the size of the faceted interface becomes comparable to
the size of data over which the search is performed. The overload is already a challenge
in the case of the classical faceted search and it becomes even stronger when faceted
search is applied to RDF data. Indeed, observe that in both classical and RDF settings
the search is over annotated entities, at the same time, in the latter case the number of
annotations and possible values is typically much larger: any predicate occurring in an
RDF data set can give a facet during a search, and any entity or value that it points to in
the RDF data can become a value in this facet. At the same time, in the classical case
the list of possible facets is typically predefined and controlled.
Moreover, differently from the classical case, in the RDF settings entities are also
interconnected. Thus, in an RDF driven faceted interface one has to nest facets in order
to reflect this interconnections. In Figure 1, left, one can see a possible way to nest
facets which is implemented in our SemFacet system. This not only raises a non-trivial
problem of how to arrange nested facets within an interface in an intuitive and user
oriented way, but also leads to an unavoidable overload of the interface with various
paths of nested facets. Thus, faceted navigation over RDF requires from users to be
experts in the structure of the underlying RDF graph in order to be able to find the
required facet which can be deeply nested.
In order to see why it might be hard to find a required nested facet, consider in
Figure 2 an example RDF data. Assume that one is looking for a smartphone with
the price within £500–£900 and the processor that is manufactured by a company with
headquarters in Asia. One can see in Figure 2 that Samsung S8 is such a smartphone. At
the same time, finding this smartphone via a faceted interface requires one to traverse
the data via 4 nested facets (Figure 1, left).
SemFacet System. In our RDF-based faceted search system SemFacet we propose to
address the information overload by
– ranking facets and their values and thus offering to users top-k most prominent
facets and values;
?
This work was supported by the Royal Society under a University Research Fellowship and
the EPSRC under an IAA award and the projects DBOnto, MaSI3 , ED3 , and VADA.
phone keywords Search phone Search
type Samsung Galaxy S8 type Exycon 8 Processor
Galaxy S8 is a phone that offers an exceptional ANY Galaxy S8 is built around the new Exynos 8895,
SmartPhone experience for any user. The large screen is a real SmartPhone (8.270) featuring new custom CPU cores, new Mali GPU, and
Laptop turning point in flagship phone design and should usher Laptop with the whole thing fabricated via a cutting-edge 10nm
(5.850)
in the end of large bezels, and the camera and slick process intended to maximize performance and power
performance work brilliantly under the finger. efficiency.
hasPart ANY hasPart
Processor Processor (2.490) refocusing refocussed answer
producedBy producedBy Search Task:
ANY ANY Show me processors of smartphones
(1.800)
answer ▪ whose processor
withHQ withHQ ▪ is manufactured by a company
ANY ANY ▪ with headquarters in Asia
(630)
Reachable facets ▪ and whose max price among
inCountry San Diego (8)
providers is £500-£900
Suwon (5)
ANY
inContinent facet predicate enter facet inContinent
Search Task:
ANY price max aggregate facets Asia (1)
Find smartphones
Asia facet values ▪ whose processor N. America (3)
▪ is manufactured by a 0 500 900 1.600
North America
company
price ▪ with headquarters in Asia
▪ and whose price is within
£500-£900
0 500 900 1.600
Fig. 1. Left: SemFacet with deep nesting over RDF data; Right: SemFacet enhanced with rank-
ing, aggregation, and reachability
– minimising the number of values within a facet by first grouping them according to
the corresponding entities and then aggregating them with the standard aggregate
functions max, min, count, sum, avg;
– shortcutting paths of nested facets with the help of a reachability operator.
Observe in Figure 1, right, the SemFacet interface where all these three features
are incorporated. The facets and values are now ranked and only the top 10 of them
are displayed. The users can choose whether to look for a maximal price of entities by
activating aggregate functions within facets with numerical values. In the screenshot
the user selected max and thus the system is searching for smartphones whose maximal
price across providers is within the range £500–£900. Finally, the users can enter the
name of a desired predicate instead of choosing a facet value in order to ask the system
to find a shortcut (a reachable facet). In the screenshot the user entered inContinent
instead of selecting SanDiego or Suwon.
Demo Overview. The goal of the demo is to show the attendees how our novel features,
that is, ranking, aggregation, and shortcutting, make hard faceted search—over RDF
datasets that are highly interconnected and with many data values—easier. In order to
experience the impact of these features the attendees will be able to explore the demo
dataset using two versions of our SemFacet systems: with and without the features.
2 SemFacet System
We first give an overview of SemFacet with the focus on its novel components and then
present technical details behind ranking, aggregation, and shortcuts.
System Overview. SemFacet is implemented in Java and available under an academic
license [1]. In Figure 2 there is the architecture of SemFacet where the arrows denote
the data flow between the systems’ components.
On the client side, SemFacet has an HTML 5 based GUI consisting of three main
parts: a free text search box for keywords, a hierarchically organised faceted interface,
and a scrollable panel containing snippet-shaped answers. User keywords are sent by
the client to the server, evaluated and this gives the initial faceted interface. User se-
lections in the faceted interface are compiled into a SPARQL query using the SPARQL
query constructor and then sent to the server for evaluation. The snippet composer and
facet composer receive information about facets and answers that should be displayed
to the user and update the currently displayed interface and query answers. The sys-
tem updates the faceted interface incrementally: only the parts of the interface that are
affected by users’ actions are updated, which allows for a significantly faster response
2
4
800 900 290 270 type
SmartPhone (8.270)
Exycon 8 Processor
Galaxy S8 is built around the new Exynos 8895,
featuring new custom CPU cores, new Mali GPU, and
Laptop with the whole thing fabricated via a cutting-edge 10nm
(5.850) process intended to maximize performance and power
phone phone efficiency.
price price price price hasPart
Processor (2.490)
hasPart producedBy withHQ inCountry
:SamsungS8 :Exynos :Samsung :Suwon :S .Korea
Facet SPARQL Q. Snippet
Composer Constructor Composer
type type type inContinent Client
Search Snippet Server
Smartphone Processor Company :Asia :NorthAmerica Facet Generator
Engine Generator
type type type inContinent Aggregation Shortcuts
Handler Handler
hasPart producedBy withHQ inCountry
:HP Elite X3 :Snapdragon :Qualcomm :SanDiego :USA Ranking Range
Handler Handler
price price price
RDF Data, Rules
730 280 300 RDFox: Storage and Query Execution
Fig. 2. Example RDF graph about products
Fig. 2. Left: Example RDF graph about products; Right: Architecture of SemFacet
Let O = [F1 , . . . , Fn ] be a possible ordering of S. Our function f computes the
score of O time.
by combiningThe user is ablethree
the following to do refocusing.
characteristics of O:This feature of SemFacet can be observed in
Figure 1, right, where the user can click on a box in the hasPart facet to change the
(1) selectivity of facets in O,
answers
(2) diversity of facets from phones to their parts, that is, from Samsung S8 to its processor Exycon 8.
in O, and
(3) nesting depthOnof the
facetsserver
in O. side, the system relies on an in-memory triple store RDFox [8] to
store
In particular, theweinverted
for (1) index,
prefer those input
facets that are RDF data, query
more selective, answers, and all necessary auxiliary
i.e., ticking
values in the facet narrow such
information down the search result more rapidly
as materialisation than doing
rules which wesodiscuss later in this section. One of
in other facets. For (2), we prefer those facets that lead to results which are
not coveredthe server other
by selecting components is an
facets. Finally, inverted
for (3), we preferindex based
facets that allowfull-text search engine; in order to
ensure
deeper nesting a betterexplore
thus allowing integration
the graphbetween
structure offull-text anddata.
the underlying faceted search and thus achieve good
We now define
efficiency of SemFacet
the functions correspondingweeach of (1)-(3) andour
implemented thenowncombine
search engine. Another backend com-
them into the final scoring function f .
Let F denote a facet, A the set of current answers, r(F ) ✓ A the setentities
ponent is snippet composer that for given answer of retrieves their textual descrip-
tions, by
answers obtained images,
selectingand links.
any value The
in F , andnext
d > 0component is facet generator that constructs faceted
a predefined threshold
parameter interfaces
for nesting depth. To account to
in response for user
(1), weactions.
compute the
Thisselectivity
componentscore is backed by four handlers: ag-
of F given A as follows:
gregation, shortcuts, ranking, and ranges. The range handler computes and stores left
and right boundssel(F, A)for
= 1 the
logrange sliders for numerical facet values, like the price-slider
|A| |r(F )|.
in Figure 1, that correspond to the lower and upper bounds of possible values for this
In particular, the less search results is obtained by applying F , the higher the
selectivity property name
score of F is. in theforunderlying
To account (2), we consider RDF data. of the Set-
a variation
Cover ranking [5] to compute the overlap score of Fi 2 O w.r.t. O as follows:
Ranking facets. Whenever n
SemFacetj
updates the interface it should decide in which
1 X [
order toi ,present
overlap(F O) = relevant | facets.
r(Fi ) \ This r(F is mdone
)| in two steps. First, we compute the set
n ⇥ |r(Fi )| j=1
S of all relevant facets that should m=1,m6be
=i displayed in the same level of nesting. Assume
In particular, S has
thata high n facets.
overlap score of Then,
facet Fi for
means each
that of n! possible
selecting orderings of S we find the optimal
values in it
order
leads to results thatusing
are notour scoring
present function
in the first f and
highly ranked SemFacet
facets. displays the facets from S (or some
Thus, highly
positioned offacets (according
them) to the overlap
according to thisscore)
order.provide
Ourmore diversefresults.
function computes the score of a given ordering
To account for (3), we compute the depth score depth(F, d) of F given d as the
of facet by combining the following three characteristics of the order: (1) selectivity
of its facets, (2) diversity of its facets, and (3) nesting depth of its facets. In particular,
for (1) we prefer those facets that are more selective, i.e., ticking values in the facet
narrow down the search result more rapidly than doing so in other facets. For (2), we
prefer those facets that lead to results which are not covered by selecting other facets.
Finally, for (3), we prefer facets that allow deeper nesting thus allowing explore the
graph structure of the underlying data. We now define the functions corresponding each
of (1)-(3) and then combine them into the final scoring function f .
Ranking facet values. Besides ranking of facets, it is important to rank facet values
as well. We adopt the count-based ranking (see also [10]): facet values that lead to a
larger number of results are ranked higher. Although the computation of counts is con-
ceptually trivial, the main challenge is in their update. Indeed, the integers associated
to each facet value in the interface must be updated every time the user interacts with
the faceted interface without affecting the performance of the system. Additional chal-
lenges come from (i) graph structure of the data and (ii) interplay of conjunctive and
disjunctive interpretation of facet values. In our experience, implementing the straight-
3
forward approach to updating counts leads to a significant increase of the user interface
response time. To mitigate this problem, we adopted a multi-threading solution: each
thread receives a set of facet values for which counts need to be updated and, addition-
ally, the load among the threads is balanced. For instance, we avoid the situation when
one thread is busy with updating counts for values with a high number of results only,
while another gets away with updating counts for values with a low number of results.
This approach led to a significant response time decrease.
Aggregation. Recall that every interface update performed by the user (i.e., refocusing,
selection of a facet or a facet value) results in formulating a corresponding SPARQL
query on the fly that is then issued to RDFox. Then the interface is updated (i.e., with
search results and available facets at this point) depending on the result of this query.
When aggregate facets are considered, it is possible to follow the same approach and
formulate aggregate SPARQL queries. However, we decided to do materialisation of
aggregate information at loading time instead since (1) RDFox, our back-end system,
supports efficient materialisation of aggregate information and, more importantly, (2)
non-aggregate SPARQL queries are usually faster to answer which is essential for re-
sponsive user interface updates.
Reachability. SemFacet provides the shortcut functionality described in Section 1. In
the user interface, SemFacet offers a search box within each facet that allows users to
search for reachable facets. As the user has typed in a facet name F in the box, the
system checks if such a facet is reachable from the current facet. For this we perform
breadth-first search to find all reachable nodes with an outgoing property F and we
store corresponding witnessing paths. These paths are important in constructing the
corresponding SPARQL query for fetching possible facet values for F and for further
facet navigation. For a faster response time, we predefine a parameter B and check
facet reachability up to length B instead. In our example, in Figure 1 (right) the user
can search for the inContinent facet, which in this case is reachable within 2 steps, and
then select ‘Asia’, thus selecting processors produced by an asian company.
References
1. SemFacet Project Page. http://www.cs.ox.ac.uk/isg/tools/SemFacet/
2. Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuska, S., Zheleznyakov, D.: Towards
Semantic Faceted Search. In: Proc. of WWW (Companion Volume). pp. 219–220 (2014)
3. Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuška, Š., Zheleznyakov, D.: Faceted
search over RDF-based knowledge graphs. J. Web Semantics 37, 55–74 (2016)
4. Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuška, Š., Zheleznyakov, D.: Enabling
Faceted Search over OWL 2 with SemFacet. In: Proc. of OWLED. pp. 121–132 (2014)
5. Arenas, M., Cuenca Grau, B., Kharlamov, E., Marciuška, Š., Zheleznyakov, D., Jiménez-
Ruiz, E.: SemFacet: Semantic Faceted Search over Yago. In: Proc. of WWW (Companion
Volume). pp. 123–126 (2014)
6. Cuenca Grau, B., Kharlamov, E., Marciuška, Š., Zheleznyakov, D., Zhou, Y.: Querying Life
Science Ontologies with SemFacet. In: Proc. of SWAT4LS (2014)
7. Grau, B.C., Kharlamov, E., Marciuska, S., Zheleznyakov, D., Arenas, M.: Semfacet: Faceted
search over ontology enhanced knowledge graphs. In: ISWC Posters & Demos (2016)
8. Motik, B., Nenov, Y., Piro, R., Horrocks, I., Olteanu, D.: Parallel Materialisation of Datalog
Programs in Centralised, Main-Memory RDF Systems. In: Proc. of AAAI (2014)
9. Tunkelang, D.: Faceted Search. Synthesis Lectures on Information Concepts, Retrieval, and
Services, Morgan & Claypool Publishers (2009)
10. Wagner, A.J.: Faceted semantic search, technical report. Tech. rep.
4