<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ranking, Aggregation, and Reachability in Faceted Search with SemFacet?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evgeny Kharlamov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Giacomelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Sherkhonov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Egor V. Kostylev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Horrocks</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sapienza University of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Motivation. Faceted search is a prominent search paradigm that became the standard in many Web applications including online shopping and real-estate portals, where users can progressively narrow down the search results by applying filters, called facets [9] which are organised in faceted interfaces. Faceted search has also been proposed in the Semantic Web context for exploring and querying RDF graphs and a number of RDFbased faceted search systems have been developed (see [3,7,4,5,2,6] for an overview). One of the main challenges that hampers usability of faceted search systems is information overload [9]: when the size of the faceted interface becomes comparable to the size of data over which the search is performed. The overload is already a challenge in the case of the classical faceted search and it becomes even stronger when faceted search is applied to RDF data. Indeed, observe that in both classical and RDF settings the search is over annotated entities, at the same time, in the latter case the number of annotations and possible values is typically much larger: any predicate occurring in an RDF data set can give a facet during a search, and any entity or value that it points to in the RDF data can become a value in this facet. At the same time, in the classical case the list of possible facets is typically predefined and controlled. Moreover, differently from the classical case, in the RDF settings entities are also interconnected. Thus, in an RDF driven faceted interface one has to nest facets in order to reflect this interconnections. In Figure 1, left, one can see a possible way to nest facets which is implemented in our SemFacet system. This not only raises a non-trivial problem of how to arrange nested facets within an interface in an intuitive and user oriented way, but also leads to an unavoidable overload of the interface with various paths of nested facets. Thus, faceted navigation over RDF requires from users to be experts in the structure of the underlying RDF graph in order to be able to find the required facet which can be deeply nested. In order to see why it might be hard to find a required nested facet, consider in Figure 2 an example RDF data. Assume that one is looking for a smartphone with the price within £500-£900 and the processor that is manufactured by a company with headquarters in Asia. One can see in Figure 2 that Samsung S8 is such a smartphone. At the same time, finding this smartphone via a faceted interface requires one to traverse the data via 4 nested facets (Figure 1, left). SemFacet System. In our RDF-based faceted search system SemFacet we propose to address the information overload by - ranking facets and their values and thus offering to users top-k most prominent facets and values; ? This work was supported by the Royal Society under a University Research Fellowship and the EPSRC under an IAA award and the projects DBOnto, MaSI3, ED3, and VADA.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Introduction
phone
keywords</p>
    </sec>
    <sec id="sec-2">
      <title>Search</title>
      <p>phone</p>
    </sec>
    <sec id="sec-3">
      <title>Search</title>
      <p>type
SmartPhone
Laptop</p>
      <p>hasPart
Processor
producedBy
ANY</p>
      <p>withHQ
ANY
inCountry
ANY
inContinent
ANY
Asia</p>
      <p>North America
price
0 500 900 1.600</p>
      <sec id="sec-3-1">
        <title>Samsung Galaxy S8</title>
        <p>Galaxy S8 is a phone that of ers an exceptional
experience for any user. The large screen is a real
turning point in flagship phone design and should usher
in the end of large bezels, and the camera and slick
performance work bril iantly under the finger. ANY
facet predicate
facet values
answer
Search Task:
Fin▪d smwahrotpsheopnreoscessor
▪ icsommapnaunfyactured by a
▪▪ a£wn5itd0h0wh-£eh9ao0ds0equparricteersisinwAithsiina
type
SmartPhone (8.270) ANY
Laptop (5.850)</p>
        <p>hasPart
Processor (2.490)
producedBy
ANY (1.800)</p>
        <p>withHQ
SSAuaNnwYoDniego (63((580)))
enter facet
price max
0 500 900 1.600
Reachable facets
aggregate facets</p>
      </sec>
      <sec id="sec-3-2">
        <title>Exycon 8 Processor</title>
        <p>Galaxy S8 is built around the new Exynos 8895,
featuring new custom CPU cores, new Mali GPU, and
with the whole thing fabricated via a cut ing-edge 10nm
process intended to maximize performance and power
ef iciency.
refocusing</p>
        <p>
          refocussed answer
Search Task:
Sh▪ow mwehopsreocpersoscoersssoofrsmartphones
▪▪▪ iapwsnriotmdhvawihdneheuaorfsdasceqistumu£ar5aretx0de0rpbs-ry£iicn9ae0Ac0asomiamopnagny
inContinent
Asia (
          <xref ref-type="bibr" rid="ref1">1</xref>
          )
N. America (
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
– minimising the number of values within a facet by first grouping them according to
the corresponding entities and then aggregating them with the standard aggregate
functions max, min, count, sum, avg;
– shortcutting paths of nested facets with the help of a reachability operator.
        </p>
        <p>Observe in Figure 1, right, the SemFacet interface where all these three features
are incorporated. The facets and values are now ranked and only the top 10 of them
are displayed. The users can choose whether to look for a maximal price of entities by
activating aggregate functions within facets with numerical values. In the screenshot
the user selected max and thus the system is searching for smartphones whose maximal
price across providers is within the range £500–£900. Finally, the users can enter the
name of a desired predicate instead of choosing a facet value in order to ask the system
to find a shortcut (a reachable facet). In the screenshot the user entered inContinent
instead of selecting SanDiego or Suwon.</p>
        <p>Demo Overview. The goal of the demo is to show the attendees how our novel features,
that is, ranking, aggregation, and shortcutting, make hard faceted search—over RDF
datasets that are highly interconnected and with many data values—easier. In order to
experience the impact of these features the attendees will be able to explore the demo
dataset using two versions of our SemFacet systems: with and without the features.
2 SemFacet System
We first give an overview of SemFacet with the focus on its novel components and then
present technical details behind ranking, aggregation, and shortcuts.</p>
        <p>System Overview. SemFacet is implemented in Java and available under an academic
license [1]. In Figure 2 there is the architecture of SemFacet where the arrows denote
the data flow between the systems’ components.</p>
        <p>On the client side, SemFacet has an HTML 5 based GUI consisting of three main
parts: a free text search box for keywords, a hierarchically organised faceted interface,
and a scrollable panel containing snippet-shaped answers. User keywords are sent by
the client to the server, evaluated and this gives the initial faceted interface. User
selections in the faceted interface are compiled into a SPARQL query using the SPARQL
query constructor and then sent to the server for evaluation. The snippet composer and
facet composer receive information about facets and answers that should be displayed
to the user and update the currently displayed interface and query answers. The
system updates the faceted interface incrementally: only the parts of the interface that are
affected by users’ actions are updated, which allows for a significantly faster response</p>
        <p>800 900 290 270
price price price price phone
:SamsungS8 hasPart :Exynos producedBy :Samsung withHQ :Suwon inCountry :S.Korea
phone
type
type
type</p>
        <p>inContinent
Smartphone Processor Company :Asia :NorthAmerica
type type type inContinent
:HP Elite X3 hasPart :Snapdragon producedBy :Qualcomm withHQ :SanDiego inCountry :USA</p>
        <sec id="sec-3-2-1">
          <title>Search Engine</title>
          <p>price
730
price price
280
300
type
SmartPhone (8.270)
PLaropcthoeapssPorart ((25..489500))</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Facet</title>
          <p>Composer</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>SPARQL Q. Constructor</title>
        </sec>
        <sec id="sec-3-2-4">
          <title>Facet Generator</title>
          <p>AgHgarengdaletiron SHhaonrtdcluetrs
RHaannkdilnegr HRaanndgleer
RDF Data, Rules
Exycon8Proces or
Galaxy S8 is built around the new Exynos 8 95,
featuringnewcustomCPUcores,newMaliGPU,and
withthewholethingfabricatedviaacuting-edge10nm
proces intendedtomaximizeperformanceandpower
eficiency.</p>
        </sec>
        <sec id="sec-3-2-5">
          <title>Snippet</title>
          <p>Composer Client
GSennieprpaetotr Server</p>
        </sec>
        <sec id="sec-3-2-6">
          <title>RDFox: Storage and Query Execution</title>
          <p>Ranking facet values. Besides ranking of facets, it is important to rank facet values
as well. We adopt the count-based ranking (see also [10]): facet values that lead to a
larger number of results are ranked higher. Although the computation of counts is
conceptually trivial, the main challenge is in their update. Indeed, the integers associated
to each facet value in the interface must be updated every time the user interacts with
the faceted interface without affecting the performance of the system. Additional
challenges come from (i) graph structure of the data and (ii) interplay of conjunctive and
disjunctive interpretation of facet values. In our experience, implementing the
straightforward approach to updating counts leads to a significant increase of the user interface
response time. To mitigate this problem, we adopted a multi-threading solution: each
thread receives a set of facet values for which counts need to be updated and,
additionally, the load among the threads is balanced. For instance, we avoid the situation when
one thread is busy with updating counts for values with a high number of results only,
while another gets away with updating counts for values with a low number of results.
This approach led to a significant response time decrease.</p>
          <p>
            Aggregation. Recall that every interface update performed by the user (i.e., refocusing,
selection of a facet or a facet value) results in formulating a corresponding SPARQL
query on the fly that is then issued to RDFox. Then the interface is updated (i.e., with
search results and available facets at this point) depending on the result of this query.
When aggregate facets are considered, it is possible to follow the same approach and
formulate aggregate SPARQL queries. However, we decided to do materialisation of
aggregate information at loading time instead since (
            <xref ref-type="bibr" rid="ref1">1</xref>
            ) RDFox, our back-end system,
supports efficient materialisation of aggregate information and, more importantly, (
            <xref ref-type="bibr" rid="ref2">2</xref>
            )
non-aggregate SPARQL queries are usually faster to answer which is essential for
responsive user interface updates.
          </p>
          <p>Reachability. SemFacet provides the shortcut functionality described in Section 1. In
the user interface, SemFacet offers a search box within each facet that allows users to
search for reachable facets. As the user has typed in a facet name F in the box, the
system checks if such a facet is reachable from the current facet. For this we perform
breadth-first search to find all reachable nodes with an outgoing property F and we
store corresponding witnessing paths. These paths are important in constructing the
corresponding SPARQL query for fetching possible facet values for F and for further
facet navigation. For a faster response time, we predefine a parameter B and check
facet reachability up to length B instead. In our example, in Figure 1 (right) the user
can search for the inContinent facet, which in this case is reachable within 2 steps, and
then select ‘Asia’, thus selecting processors produced by an asian company.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>SemFacet</given-names>
            <surname>Project</surname>
          </string-name>
          <article-title>Page</article-title>
          . http://www.cs.ox.ac.uk/isg/tools/SemFacet/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Marciuska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>Towards Semantic Faceted Search</article-title>
          .
          <source>In: Proc. of WWW (Companion Volume)</source>
          . pp.
          <fpage>219</fpage>
          -
          <lpage>220</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            , Marciusˇka, Sˇ .,
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>Faceted search over RDF-based knowledge graphs</article-title>
          .
          <source>J. Web Semantics</source>
          <volume>37</volume>
          ,
          <fpage>55</fpage>
          -
          <lpage>74</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            , Marciusˇka, Sˇ .,
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          :
          <article-title>Enabling Faceted Search over OWL 2 with SemFacet</article-title>
          .
          <source>In: Proc. of OWLED</source>
          . pp.
          <fpage>121</fpage>
          -
          <lpage>132</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            , Marciusˇka, Sˇ.,
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          , Jime´nezRuiz, E.:
          <article-title>SemFacet: Semantic Faceted Search over Yago</article-title>
          .
          <source>In: Proc. of WWW (Companion Volume)</source>
          . pp.
          <fpage>123</fpage>
          -
          <lpage>126</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            , Marciusˇka, Sˇ .,
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          :
          <article-title>Querying Life Science Ontologies with SemFacet</article-title>
          .
          <source>In: Proc. of SWAT4LS</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marciuska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Semfacet:
          <article-title>Faceted search over ontology enhanced knowledge graphs</article-title>
          .
          <source>In: ISWC Posters &amp; Demos</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olteanu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems</article-title>
          .
          <source>In: Proc. of AAAI</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Tunkelang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Faceted Search.
          <source>Synthesis Lectures on Information Concepts</source>
          , Retrieval, and Services, Morgan &amp; Claypool Publishers (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wagner</surname>
            ,
            <given-names>A.J.:</given-names>
          </string-name>
          <article-title>Faceted semantic search</article-title>
          ,
          <source>technical report. Tech. rep.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>