<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Faceted Search Index for OptiqueVQS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vidar N. Klungre</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Giese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Oslo</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present an end-user oriented visual query system that combines faceted search with graph queries. Unlike previous systems, the architecture is designed to scale gracefully to both very large datasets and complex queries. This is achieved by setting up an indexing structure for facet values that can easily be scaled out arbitrarily. In return, it compromises some precision in computing sets of available facet values, but it does so in a highly con gurable manner.</p>
      </abstract>
      <kwd-group>
        <kwd>Faceted search</kwd>
        <kwd>RDF</kwd>
        <kwd>Index</kwd>
        <kwd>Visual Query Interface</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Introduction
Faceted search [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a popular search and exploration paradigm which allows
users to apply search lters to multiple orthogonal dimensions (facets) of the
data. Filters can be added, removed or modi ed in any order, and every time
this is done, the system immediately updates the list of results, giving the user
instant feedback. To support this functionality, the system needs fast access to
the underlying data. This is often provided by search engines like Lucene1 or
Sphinx2 which provide better performance for the queries required by faceted
search than RDB-based implementations.
      </p>
      <p>
        Faceted search has also been extended to semantics-based data, e.g. in
SemFacet [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Rhizomer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In these systems, datatype properties are treated
as facets, but they also include some ability to query the graph structure via
object properties. Combining faceted search with graph queries results in more
expressive queries, but implementing faceted search also becomes
computationally harder. The challenging task is to update the set of available values for the
facets after each user interaction. The straightforward way of computing this set
involves evaluating a query of similar complexity to the whole query built so far,
and that needs to be done for each facet. For queries with large graph patterns,
and large datasets, this can become too time consuming for an interactive
system. The usual approach of using a search engine style index will not help, since
these engines do not support graph queries.
      </p>
      <p>In this demo, we present a system that combines faceted search with graph
queries, and that uses an indexing structure for facet values that can easily be
scaled out arbitrarily. In return, it compromises some precision in computing
sets of available facet values, in a highly con gurable manner.
1 https://lucene.apache.org/
2 http://sphinxsearch.com/
,
s
n
itoa se
l m
e a
,trs ten
p c
ce fa
n
o
c</p>
    </sec>
    <sec id="sec-2">
      <title>OptiqueVQS front end</title>
      <p>te se</p>
      <p>u
fca lav</p>
      <p>Faceted Search Index
controls
s
l
o
r
t
n
o
c
Index con g.
lls
n
a
l
q
u
e
r
y</p>
    </sec>
    <sec id="sec-3">
      <title>Annotated Ontology</title>
    </sec>
    <sec id="sec-4">
      <title>Data (SPARQL endpoint) Fig. 1. Architecture of OptiqueVQS extended with the faceted search index.</title>
      <p>
        We implemented this new functionality as part of OptiqueVQS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], an
ontologybased visual query system, intended for users with little IT expertise.
OptiqueVQS combines graph querying with ltering on datatype properties.
However, in previous versions of OptiqueVQS, the available values for each facet have
been static, determined entirely by a suitably annotated ontology, and did not
depend on the underlying data. Neither did they change in reaction to the lters
on other facets. This new version adds a server side component that reads data
from a SPARQL endpoint and stores information needed for e cient faceted
search in a scalable index. This index, instead of the original SPARQL endpoint,
is queried in reaction to user interactions to update the interface.
      </p>
      <p>The essence of the index structure is to use one table (that could be stored
e.g. in a Lucene index) per concept available in the query interface. In addition to
the facet values applicable to that concept, the index can also store information
about the existence of object property links to other resources, facet values of
such neighbouring resources, links from those to further resources, etc. How much
information about neighbouring resources is to be stored in the index can be
congured. Independently of the con guration, the index will provide an
approximation from above (a superset) of the possible remaining values for each facet.
2</p>
      <p>Approximating Facet Value Computation</p>
      <p>We therefore approximate this computation, possibly computing a superset
of the possible values, by considering only a part of Q, a neighbourhood of the
focus variable ?x. In the simplest case, object properties are ignored, and classical
faceted search on the data properties for the variable is performed, using only
data lters on ?x. To do this e ectively, an index table with one column per facet
is created. Standard search engine technology can perform the required ltering
and gathering of possible values in a scalable way.</p>
      <p>However, usability studies with OptiqueVQS have shown that users react to
facet values that they consider to be obviously excluded by some of the object
properties in their partial query. To accommodate an object property o, we can
add a boolean-valued facet representing the existence of a triple ?x o ?y to some
resource ?y in the data. The index data structure remains a simple table of facet
values. As an example, consider the following data:
:kona a :Drink; :kind "Coffee"; :suppliedBy :americanDrinks .
:sencha a :Drink; :kind "Tea"; :suppliedBy :asianDrinks .
:orangeJuice a :Drink; :kind "Juice" .
:americanDrinks a :Supplier; :basedIn "America" .
:asianDrinks a :Supplier; :basedIn "Asia" .</p>
      <p>The user builds a query for drinks supplied by companies based in America,
see screenshot (a) in Fig. 2. In screenshot (b), the index is con gured not to
take the suppliedBy relation into account. Therefore, all three kinds of drinks
are available as options for the Kind facet. In (c), the existence of a supplier is
added to the index, and the choice Juice is disabled, since there is no product
of that kind that has a supplier.</p>
      <p>This can be further re ned by adding columns for datatype properties on the
resources connected by object properties. These can then be taking into account
when ltering for combinations of values permitted by Q. In the example, adding
the based in facet of the supplier to the index table for drinks means that only
Co ee will be available, as screenshot (d) shows.</p>
      <p>
        Which combinations of object and datatype properties should be added as
columns to the search engine table, depending on the type of ?x, is controlled by
the index con guration. The details of this con guration, how it controls index
generation, and how the index is used to compute sets of possible facet values is
explained in a technical report, together with a more elaborate example [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Independently of the con guration, the index has the shape of a single table
that can be scaled out using standard methods to arbitrarily large datasets. If
necessary, index storage and processing can be parallelized, which is considerably
easier for single tables than for relational or graph databases.</p>
      <p>
        We will demonstrate our tool on an instance of the NPD benchmark [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>An interesting remaining question concerns the relationship between
perceived usability and index con guration. The larger the neighbourhood
considered by the index, the sharper the approximation of possible facet values, but
the more space will also be required by the index. We surmise that users
expectations of possible choices can be met by considering rather small con gurations.
To show this empirically in usability studies, and to get an idea of the trade-o
between perceived usability and index size, are future work.</p>
      <p>(a) example query
(b) without supplier
(c) existence of supplier
(d) location of supplier</p>
      <p>Acknowledgements This work has been funded by the Norwegian Research
Council through SIRIUS (NFR 237898).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          et al.
          <article-title>Faceted search over RDF-based knowledge graphs</article-title>
          .
          <source>J. Web Semantics</source>
          ,
          <volume>37</volume>
          :
          <fpage>55</fpage>
          {
          <fpage>74</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Josep</given-names>
            <surname>Maria</surname>
          </string-name>
          <string-name>
            <surname>Brunetti</surname>
          </string-name>
          ,
          <article-title>Roberto Garc a, and Soren Auer. From overview to facets and pivoting for interactive exploration of semantic web data</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>20</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Vidar N Klungre</surname>
          </string-name>
          .
          <article-title>A faceted search index for graph queries</article-title>
          .
          <source>Technical Report 469</source>
          , Department of Informatics, University of Oslo,
          <year>2017</year>
          . http://heim.ifi.uio.no/ martingi/pub/IndexReport.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Davide</given-names>
            <surname>Lanti</surname>
          </string-name>
          , Martin Rezk, Mindaugas Slusnys, Guohui Xiao, and Diego Calvanese.
          <article-title>The npd benchmark for obda systems</article-title>
          .
          <source>In Proc. of the 10th Int. Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS</source>
          <year>2014</year>
          ), volume
          <volume>1261</volume>
          <source>of CEUR Workshop Proceedings</source>
          , pages
          <volume>3</volume>
          {
          <fpage>18</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Ahmet</given-names>
            <surname>Soylu</surname>
          </string-name>
          et al.
          <article-title>Experiencing OptiqueVQS: a multi-paradigm and ontologybased visual query system for end users</article-title>
          .
          <source>UAIS</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <volume>129</volume>
          {
          <fpage>152</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Tunkelang</surname>
          </string-name>
          .
          <article-title>Faceted search</article-title>
          .
          <source>Synthesis lectures on information concepts</source>
          ,
          <source>retrieval, and services</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>80</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>