<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enabling Faceted Search over OWL 2 with SemFacet?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcelo Arenas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Kharlamov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sˇ aru¯nas Marciusˇka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitriy Zheleznyakov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pontificia Universidad Cato ́lica de Chile</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Lots of applications nowadays rely on RDF, OWL 2, and SPARQL 1.1 for storing, publishing and querying data. However, SPARQL 1.1 is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach to facilitate end-users query formulation which is widely used in information systems. This approach was recently adapted to the context of RDF, however, the proposed solutions lack rigorous theoretical underpinning and essentially ignore OWL 2 axioms. We develop a clean theoretical framework for faceted search that accounts for both RDF data and OWL 2 axioms. We implemented and tested some of our solutions in the SemFacet platform.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In the last decade we have witnessed a constant increase in the number of
applications that store and publish data in RDF, in the of use OWL 2 ontologies for providing
background knowledge about the application domain and enriching query answers with
information not explicitly given in the data, and in the use of SPARQL 1.1 for
querying this data. It was acknowledged by many that writing SPARQL 1.1 queries requires
special training and is not well-suited for the majority of end users. Thus, an important
challenge is the development of simple yet powerful query-formulation interfaces that
capture well-defined fragments of SPARQL 1.1.</p>
      <p>This challenge was acknowledged by the community, and a number of approaches
to facilitate query formulation over RDF and OWL 2 have been proposed in the last
decade. Proposed solutions rely on different query formulation paradigms and include
controlled natural language, e.g., [4–8], diagram based approaches, e.g., [9–15], faceted
search interfaces,e.g., [16–18], and others. In this work we focus on faceted search due
to its importance in Web bases information systems and intuitiveness to end users.</p>
      <p>Faceted search is a prominent approach for querying document1 collections where
users can narrow down the search results by progressively applying filters, called
facets [19]. A facet typically consists of a property (e.g., ‘gender’ or ‘occupation’ when
querying documents about people) and a set of possible string values (e.g., ‘female’ or
‘research’), and documents in the collection are annotated with property-value pairs.
During faceted search users iteratively select facet values, and the documents
annotated according to the selection are returned as the search result. Several authors have
proposed faceted search for querying document collections annotated with RDF, and a
number of RDF-based faceted search systems have been developed, e.g. [16–18, 20–
26]. Although there has been intensive efforts in system development, the theoretical
underpinnings received less attention [27–29].
? Work supported by the Royal Society, the EPSRC projects Score!, Exoda, and MaSI3, and the</p>
      <p>FP7 project OPTIQUE [1–3] under the grant agreement 318338.
1 We use the term ‘document’ to refer to any resource that can be referenced using a URI.</p>
      <p>In particular, it is unclear what fragments of SPARQL 1.1 can be naturally captured
using faceted search as a query paradigm, and what is the complexity of answering such
queries. Moreover, faceted search interfaces are typically generated and updated based
only on RDF data graphs. We see this as an important limitation as OWL 2 axioms are
essentially ignored by the existing solutions, thus overlooking that ontological axioms
can be used to enrich query answers with implicit information. Besides, these axioms
provide schema-level structure that can be exploited to improve faceted interfaces.
Finally, purely RDF-based faceted search systems are data-centric, and hence cannot be
exploited to browse large ontologies or to pose meaningful queries at the schema level.</p>
      <p>We address these limitations of existing solutions by formalising faceted interfaces
that are tailored towards RDF and OWL 2, and which capture the key functionality
implemented in existing faceted search systems. Our interfaces capture both the
combination of facets displayed during search, and the facet values selected by users. In this
way, an interface encodes both a query, whose answers constitute the current search
results, and the facet values available for further selection. Analogously to existing work
on RDF-based faceted search, and in contrast to traditional faceted search, our notion of
interface allows users to ‘navigate’ across interconnected collections of documents and
establish filters to each of them. Furthermore, it abstracts from considerations specific
to GUI design (e.g., facet and value ranking), while at the same time reflecting the core
functionality of existing systems. For faceted queries, i.e., queries that can be encoded
by faceted interfaces, we study the expressivity and complexity of evaluation over RDF
data enhanced with OWL 2 axioms. Finally, we study interface generation and update.
Existing techniques for RDF are based on exploration of the underlying RDF graph: by
generating facets according to the RDF graph, systems can guide users in the
formulation of ‘meaningful’ queries. We lift this approach by proposing a special graph-based
representation of OWL 2 ontologies and their logical entailments for the purpose of
faceted navigation. Further details on our techniques can be found in [30].</p>
      <p>To put our ideas in practice we developed a faceted search platform, called
SemFacet (available for download and installation, and as a Web service [31, 32]), for
generating and updating faceted interfaces. SemFacet relies on an external triple store with
OWL 2 reasoning capabilities, and it is compatible with faceted search GUIs and text
search engines for retrieving documents from keywords. For the demonstration purpose
we bundled SemFacet with the triple store JRDFox, the search engine Lucene, and our
own faceted search GUI. We have tested SemFacet over synthetic ontologies as well as
Yago [33] extended with DBpedia (dbpedia.org/) abstracts with encouraging results.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Preliminaries</title>
      <p>We use standard notions from first-order logic. We assume pairwise disjoint infinite
sets of constants, unary predicates, and binary predicates. A fact is a ground atom and
a dataset is a finite set of facts. A rule is a sentence of the form 8x8z ['(x; z) !
9y (x; y)], where x, z, and y are pairwise disjoint tuples of variables, '(x; z) is a
conjunction of atoms with variables in x [ z, and 9y (x; y) is an existentially
quantified non-empty conjunction of atoms (x; y) with variables in x [ y. Universal
quantifiers are omitted. The restriction of (x; y) being non-empty ensures satisfiability of
any set of rules and facts, which makes query results meaningful. A rule is Datalog if
9y (x; y) has at most one atom and all variables are universally quantified. OWL 2
defines three profiles: RL, EL, and QL, weaker languages with favourable
computational properties [34]. Every profile ontology can be normalised as a set of rules and
facts using the correspondence of OWL 2 axioms with first-order logic and a variant of
the structural transformation (e.g., see [35]). Thus, we define an ontology O as a finite
set of rules and facts. We refer to [30, 35] for the details of rules corresponding to the
OWL 2 profiles. A positive existential query (PEQ) is a formula that may have free
variables which is constructed using ^, _ and 9. A PEQ s monadic if it has one free
variable, and it is a conjunctive query (CQ) if it is _-free.</p>
      <p>We consider two different semantics for query answering. Under the classical
semantics, given a query Q(x) = 9y'(x; y), we have that a tuple t of constants is an
answer to Q(x) w.r.t. an ontology O if O j= 9y'(t; y). Under the active domain
semantics, t is an answer to Q w.r.t. O if there is a tuple t0 of constants from O such
that O j= '(t; t0). The evaluation problem under the classical (resp. active domain)
semantics is to decide, given a tuple of constants t, a PEQ Q and an ontology O in
a language L, whether t is an answer to Q w.r.t. O under the classical (resp. active
domain) semantics. The classical semantics is the default in first-order logic, whereas
active domain is the default semantics of the SPARQL 1.1 entailment regimes [36]. The
difference between both semantics manifests itself only in the presence of existentially
quantified rules and queries; thus, both semantics coincide if either the input ontology
is Datalog, or if all variables in the input query are free.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Faceted Interfaces</title>
      <p>Our notion of faceted interface comes with a clean first-order logic semantics, and
provides a rigorous foundation for faceted search over RDF graphs enhanced with OWL 2
ontologies. We start with an example based on an excerpt of from Yago [33]. Our goal
is to find presidents who graduated from St. Petersburg or Georgetown and have a child
who graduated from some university.</p>
      <p>Example 1. The document dvp and dbc for Vladimir Putin and Bill Clinton are
annotated with the category ‘president’, and dvp is also annotated with ‘Russian’. Putin’s
daughter Yekaterina dyp and Clinton’s daughter Chelsea dcc are categorised as
‘person’. The documents for St. Petersburg State Uni. dsp , Stanford Uni. ds , and
Georgetown Uni. dg are categorised under ‘university’. These annotations are given in RDF
and correspond to the following facts:</p>
      <sec id="sec-3-1">
        <title>President(dvp ),</title>
      </sec>
      <sec id="sec-3-2">
        <title>Person(dcc),</title>
      </sec>
      <sec id="sec-3-3">
        <title>President(dbc),</title>
      </sec>
      <sec id="sec-3-4">
        <title>Univ(dsp ),</title>
      </sec>
      <sec id="sec-3-5">
        <title>Russian(dvp ),</title>
      </sec>
      <sec id="sec-3-6">
        <title>Univ(ds ),</title>
      </sec>
      <sec id="sec-3-7">
        <title>Person(dyp ),</title>
      </sec>
      <sec id="sec-3-8">
        <title>Univ(dg ).</title>
        <p>In RDF specific information about documents is represented using literals, e.g.,
Vladimir Putin’s date of birth is encoded as dateOfBirth(dvp ; 1952-10-07). Most
importantly, documents are also annotated with other documents:</p>
        <p>child(dvp ; dyp ), child(dbc; dcc), grad(dvp ; dsp ), grad(dbc; dg ), grad(dcc; ds ).</p>
        <p>Moreover, Yago can be extended with ontological rules. Consider for example the
following rule that says that at least one child of each Russian president graduate from
some university, which is a reasonable assumption:</p>
        <sec id="sec-3-8-1">
          <title>Russian(x) ^ President(x) ^ child(x; y) ! child(x; z) ^ grad(z; w) ^ Univ(w):</title>
          <p>(1)</p>
          <p>Analogously to traditional faceted search, we represent a facet as a pair of a
predicate (or facet name) and a set of values. In the context of RDF, however, documents (and
not just strings) can be used to annotate other documents, and thus annotations form a
graph, rather than a tree. Consequently, facet values can be either document URIs or
literals. Examples of facet names are the relations ‘grad’ and ‘dateOfBirth’, and example
values are documents such as ‘ds’ (Stanford) and literals such as ‘1952-10-07’.
Selection of multiple values within a facet can be interpreted conjunctively or disjunctively,
and hence we distinguish between conjunctive and disjunctive facets. Furthermore, we
distinguish a special facet type, whose values are categories (i.e., unary predicates)
rather than specific documents or literals. Finally, a special value any denotes the set of
all values compatible with the facet name.</p>
          <p>Definition 1. Let type and any be special symbols. A facet is a pair (X; ), with
2 f^; _g, a non-empty set, and either
(i) X = type and is a set of unary predicates, or
(ii) X is a binary predicate and is a set that consists of any and either some
constants or some unary predicates.</p>
          <p>A facet of the form (X; ^ ) is conjunctive, and a facet of the form (X; _ ) is
disjunctive. In a facet F = (X; ), X is the facet name, denoted by F j1, and contains the
facet values and it is denoted by F j2.</p>
          <p>Example 2. Consider the following facets: F1 = (type; _fPresident; Univg), F2 =
(child; _fany; dyp ; dcc g) and F3 = (grad; _fany; dsp; dg; dsg). The disjunctive facet
F1 can be exploited to select the categories to which the relevant documents belong
(president or university). Facet F2 can be used to narrow down search results to those
individuals with children dyp or dcc ; furthermore, the value any can be used to state that
we are not looking for any specific child. F3 allows to select graduation places.
tu
3.1 The Notion of Faceted Interface
Our faceted interfaces encode both a query (whose answers determine the search
results) and the choices of facet values available for further refinement.</p>
          <p>Definition 2. A basic faceted interface (BFI) is a pair (F; ), with F a facet and
F j2 the set of selected values. The set of faceted interfaces (or interfaces, for short) is
given by the following grammar, where I0 and I1 = (F; ) are BFIs and F j1 2 BP:
I ::= path j (path ^ path) j (path _ path);
path ::= I0 j (I1=I):</p>
          <p>A BFI encodes user choices for a specific facet, e.g., the BFI (F1; fPresidentg)
selects the documents categorised as presidents. BFIs are put together in paths: sequences
of nested facets that capture navigation between sets of documents. Documents are
annotated with other documents by means of binary relations (e.g., child connects parents
to their children); thus, nesting (I1=I) requires the BFI I1 to have a binary relation as
facet name. With nesting we can capture queries such as ‘people with a child who
graduated from Stanford’ by using the interface (F2; fanyg)=(F3; fdsg) which first selects
people having (any) children and then those children with a Stanford degree. Finally,
two types of branching can be applied: (path1 ^path2) indicates that search results must
satisfy the conditions specified by both path1 and path2, while (path1 _ path2) indicates
that they must satisfy those in path1 or path2.</p>
          <p>Example 3. Consider the interface Iex:</p>
          <p>(F1; fPresidentg) ^ (F3; fdsp ; dgg) ^ (F2; fanyg)=(F3; fanyg) :
It could be visualises it in our system SemFacet as in Figure 1, left. The interface
consists of three paths connected by ^-branching. In the first path, we select documents
categorised as ‘president’. In the second path, we select graduates of St. Petersburg
or Georgetown. In the third path, we first select individuals with a child who have
some degree. Since paths are combined conjunctively, the constraints encoded in them
apply simultaneously. Thus, we obtain the presidents who graduated from either St.
Petersburg or Georgetown and who have a child graduated from some Uni. tu</p>
          <p>We formally specify the semantics of the queries encoded by the selected values in
an interface in terms of first-order logic, see [30] for details. Intuitively, Our semantics
assigns to each interface a PEQ with one free variable, which retrieves a set of
documents. For each facet F we have that no restriction is imposed by F if no facet value is
selected. BFIs with a type-facet are interpreted as the conjunction (disjunction) of unary
atoms over the same variable. BFIs having as facet name a binary predicate result in
either an atom whose second argument is existentially quantified (if any is selected), or in
a conjunction (disjunction) of binary atoms having a variable as second argument that
must be equal to a constant or belong to a unary predicate. Branching (path1 path2)
with 2 f^; _g is interpreted in the natural way by constructing the conjunction
(disjunction) of the queries for each pathi; furthermore, if for some pathi we have that no
value from the facets occurring in pathi is selected, then pathi is ignored. Finally,
nesting involves a “shift” of variable from the parent BFI to the nested subexpression. A
first order formula ' is a faceted query if there exists a faceted interface I such that '
is are identical modulo renaming of variables to the semantics of I
Example 4. Our example interface Iex encodes the positive existential query Qex(x):</p>
        </sec>
        <sec id="sec-3-8-2">
          <title>President(x) ^ 9y1 grad(x; y1) ^ y1</title>
          <p>dsp _ 9y2 grad(x; y2) ^ y2</p>
          <p>dg
^9z child(x; z) ^ grad(z; w):
If we consider only facts, the answer is Bill, since we do not know whether Yekaterina
graduated from some university. If we also consider Rule (1), then both Vladimir and
Bill are answers, since it says that one of the Vladimir’s children has a degree. tu</p>
          <p>Note that our notion of interface abstracts from several considerations that are
critical to GUI design. For instance, our notion is insensitive to the order of BFIs composed
by ^- or _- branching, as well as to the order of facet values (which are carefully
ranked in practice). Furthermore, we model type-facet values as ‘flat’, whereas in
applications categories are organised hierarchically. Although these issues are important
from a front-end perspective, they are immaterial to our technical results.
Minimisation of Faceted Interfaces. An important issue in the design of faceted
interfaces is to avoid the overload of users with redundant facets or facet values. Intuitively,
an (unselected) facet value v is redundant if selecting v either leads to a ‘dead end’
(i.e., an empty set of answers) or it does not have an effect on query answers. Then, a
faceted interface is minimal if none of its component BFIs contains redundant values.
In [30] we provide interface minimisation algorithms. In Figure 1, center, we present a
minimised version of the interface in Figure 1, left.</p>
          <p>Faceted Interfaces with Refocussing. The interface in Example 3 finds two presidents.
If we want to know who their children who guaranteed that these presidents returned
by the query, (i.e., see Chelsea Clinton as an answer), we must provide refocussing
(or pivoting) functionality [25, 26] that allows to change the output variable of faceted
queries. In [30] we provide and extension of faceted interfaces to support refocussing.
In Figure 1, right, we have a screenshot of SemFacet where the refocusing is set to the
children and Chelsea is returned by the system. Note that Yekaterina is not in the answer
set since we do not know whether she is the child of Vladimir with a degree.
3.2</p>
          <p>Answering Faceted Queries
Each time a user selects a facet value to refine the search results, a faceted search
system must compute the answers to a query. Thus, query evaluation is a key reasoning
problem affecting efficiency of such systems. As discussed above, faceted queries are
monadic PEQs resulting from the selection of facet values in an interface. By standard
results for relational databases, PEQ evaluation is an NP-hard problem, even for CQs
over datasets. Our main result is that, in contrast to PEQs (and even CQs), faceted query
evaluation over datasets is tractable, feasible in polynomial time; furthermore, the
problem remains tractable in most cases if we consider ontologies belonging to the OWL 2
profiles. Our tractability results concern combined complexity, which takes into account
the size of the entire input: ontological rules, RDF data and queries.</p>
          <p>Active Domain Semantics. In practice, queries over ontology-enhanced RDF data are
typically represented in SPARQL 1.1 and executed using off-the-shelf reasoning
engines with SPARQL 1.1 support. The specification of SPARQL 1.1 under entailment
regimes [36] is based on active domain semantics, which requires existentially
quantified variables in the query Q to map to actual constants in the input ontology O. In this
case, we can answer queries by first computing the dataset D of all facts entailed by O
and then answers Q w.r.t. the dataset D. Fact entailment is tractable for all the OWL
2 profiles; thus, by committing to the active domain semantics of SPARQL 1.1 we can
achieve tractability without emasculating the ontology language.</p>
          <p>Theorem 1. Active domain evaluation of faceted queries is in PTIME w.r.t. all
normative OWL 2 profiles. Furthermore, it is PTIME-complete w.r.t. the EL and RL profiles.
Classical Semantics. Classical and active domain semantics coincide if we restrict
ourselves to Datalog ontologies. Thus, the above result holds for query answering under
classical semantics if the input ontology is Datalog. An immediate consequence is that
our results in Theorem 1 transfer to OWL 2 RL ontologies under classical semantics.</p>
          <p>In contrast to RL, the EL and QL profiles can capture existentially quantified
knowledge and hence active domain and classical semantics may diverge for queries with
existentially quantified variables. To deal with EL, we exploit techniques developed for the
combined approach to query answering [37, 38]. These techniques are currently
applicable to so-called guarded EL ontologies. The idea is to rewrite (EL) rules into Datalog
by Skolemising existential variables. Although this transformation strengthens the
ontology, it preserves the entailment of facts [37]. We showed [30] that the evaluation of
faceted queries is also preserved. Thus, we conclude tractability of faceted query
evaluation for EL. In contrast, we can show that faceted query evaluation is NP-complete
for QL under classical semantics. The following theorem summarises our results.
Theorem 2. Faceted query evaluation under classical semantics is (i) PTIME-complete
for RL and guarded EL ontologies; and (ii) NP-complete for QL ontologies.</p>
          <p>In [30] we showed that theorems above hold for faceted queries with refocussing.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Interface Generation and Update</title>
      <p>Faceted navigation is an interactive process. Starting with an initial interface generated
from a keyword search, users ‘tick’ or ‘untick’ facet values and the system reacts by
updating both search results (query answers) and facets available for further navigation.
We propose to generate and update interfaces by ‘traversing’ the (explicit and implicit)
information in O. Our approach is based on a general principle: each element of the
initial interface (resp. each change in an interface as a response of an action) must be
‘justified’ by a suitable entailment in O. In this way, by exploring the ontology, we can
guide users in the formulation of meaningful queries.</p>
      <p>There is an inherent degree of non-determinism in faceted navigation: if a user
selects a facet value, it is unclear whether the next facet generated by the system should
be conjunctive or disjunctive, and whether it should be incorporated in the interface
by means of conjunctive or disjunctive branching. In applications, however, different
values in a facet are typically interpreted disjunctively, whereas constraints imposed by
different facets are interpreted conjunctively. In [30] we resolve this ambiguities and
devise fully deterministic algorithms for a restricted class of interfaces where
conjunctive facets and disjunctive branching are disallowed. Example of such an interface is in
Example 3 and in the screenshots in Figure 1.</p>
      <p>The Ontology Facet Graph. We capture facets relevant to an ontology O in what we
call a facet graph. The graph can be seen as a concise representation of O, and our
interface generation and update algorithms are parameterised by such graph rather than
O. The nodes of a facet graph are possible facet values (unary predicates and constants),
and edges are labelled with possible facet names (binary predicates and type). The key
property of a facet graph is that every X -labelled edge (v; w) is justified by a rule or
fact entailed by O which ‘semantically relates’ v to w via X . We distinguish three kinds
of semantic relations: existential, where X is a binary predicate and (each instance of)
v must be X -related to (an instance of) w in the models of O; universal, where (each
instance of) v is X -related only to (instances of) w in the models of O; and typing where
X = type, and (the constant) v is entailed to be an instance of (the unary predicate) w.
Definition 3. A facet graph for O is a directed labelled multigraph G having as nodes
unary predicates or constants from O and s.t. each edge is labelled with a binary
predicate from O or type, where the latter labels only edges from a constant to a unary
predicate. Each edge e is justified by a fact or rule e s.t. O j= e and e is of the form
given next, where c; d are constants, A; B unary predicates and R a binary predicate:
(i) if e is c !R d, then e is of the form R(c; d) or R(c; y) ! y d;
(ii) if e is c !R A, then e is a rule of the form &gt;(c) ! 9y:[R(c; y) ^ A(y)] or</p>
      <p>R(c; y) ! A(y);
(iii) if e is A !R c, then e is a rule of either of the form A(x) ! R(x; c) or A(x) ^</p>
      <p>R(x; y) ! y c;
(iv) if e is A !R B, then e is a rule of the form A(x) ! 9y:[R(x; y) ^ B(y)] or</p>
      <p>A(x) ^ R(x; y) ! B(y);
(v) if e is c typ!e A, then e = A(c).</p>
      <p>The first (resp. second) option for each e in (i)-(iv) encodes the existential (resp.
universal) R-relation between nodes in e, whereas (v) encodes typing. A graph may not
contain all justifiable edges, but rather those that needed for a given application.
Example 5. A facet graph for the ontology in Example 1 may contain nodes for dbc and
dcc, as well as for predicates such as President and Univ. Example edges are: a
childedge linking dbc to dcc, which is justified by the fact child(dbc; dcc); and a grad-edge
from dvp to Univ due to Rule (1) in Example 1 and the facts about dvp.
tu</p>
      <p>It follows from the following proposition that facet graph computation can be
efficiently implemented. In practice, the graph can be precomputed when first loading
data and ontology, stored in RDF, and accessed using SPARQL 1.1 queries. In this way,
reasoning tasks associated to faceted search are performed offline.</p>
      <p>Proposition 1. Checking whether a directed labelled multigraph is a facet graph for O
is feasible in polynomial time if O is in any of the OWL 2 profiles.</p>
      <p>To realise the idea of ontology-guided faceted navigation, we require that any faceted
interface over an ontology O conforms to the facet graph of O , in the sense that the
presence of every facet and value in the interface is supported by edges of the graph. In
this way, we can ensure that interfaces mimic the structure of (and implicit information
in) the ontology and, hence, that the interface does not contain irrelevant (combinations
of) facets. Since a given facet or facet value can occur in many different places in an
interface, we need a mechanism that allows us to refer to the elements of an interface in an
unambiguous way. We refer to [30] for a formal definition of conformance. In [30] we
also provide algorithms that by relying on facet graphs allow to create faceted interfaces
(conformant to the graph), e.g., from sets of keywords, and to update such interfaces as
a reaction on users actions, i.e., ‘ticking’ and ‘unticking’ operations on facet values.</p>
    </sec>
    <sec id="sec-5">
      <title>5 SemFacet Platform</title>
      <p>We have developed a faceted search platform SemFacet providing the following main
functionality: (i) computation of facet graphs from an ontology; (ii) interface generation
from facet graphs; and (iii) interface update in response to user actions. Our platform
relies on an external triple store for querying and OWL reasoning.</p>
      <p>We bundled the platform in a prototypical system which powers faceted search over
a fragment of Yago RDF data [33] enriched with DBPedia abstracts and OWL 2 RL Our
system is available for downloading and installation, as well as a Web service [31]. In
the remainder of this section, we will describe the system’s workflow and architecture
and present the results of SemFacet evaluation over both Yago and synthetic data.</p>
      <p>Keyword
Based Search</p>
      <p>Lucene</p>
      <p>Faceted Query</p>
      <p>Interface</p>
      <p>Facet
Generator</p>
      <p>Query
Converter</p>
      <p>Answers as
Snippets
Snippet</p>
      <p>Generator
Inverted Index
on Ontology</p>
      <p>RDFox</p>
      <p>Ontology:
Facts, Rules</p>
      <p>GUI
Platform
Data
Start SemFacet search</p>
      <p>User enters keywords</p>
      <p>Lucene returns relevant docs
Snippets and initial FI are displayed</p>
      <p>Facet value (un)selected
FI and snippets are displayed</p>
      <p>End SemFacet search
FI is updated</p>
      <p>Snippets are updated
Initial FI is generated</p>
      <p>Snippets are generated
5.1 System Architecture and Implementation
If Figure 2, left, we present the workflow diagram for SemFacet. The steps relevant
to users’ activity are depicted in ovals, and those relevant to SemFacet’s activity are
depicted as boxes. Grey boxes represent front-end activity and the remaining ones
represent back-end tasks. Users start their interaction with SemFacet by entering a set of
keywords, and the system returns as initial answers those documents that are relevant to
the keywords; e.g., in Yago we considered as relevant those document URIs whose
abstract contains at least one of the keywords. SemFacet displays the answers as snippets,
combining the relevant URIs with the corresponding images and abstracts. The initial
faceted interface is generated from the relevant URIs. Users can then perform faceted
navigation by applying actions on the interface. SemFacet responds to each action by
first updating the interface and then recomputing the query answers. Observe that in
Figure 1 we nest facets to visualise the = operator and users can perform refocusing by
clicking on the button ‘focus’ attached to facets.</p>
      <p>The architecture of SemFacet is in Figure 2, right, where the components are
arranged in three layers. SemFacet relies on RDFox [39]2 for storing RDF triples, and
computing answers for SPARQL 1.1 queries. SemFacet converts faceted interfaces in
SPARQL 1.1 queries and executes them using RDFox. Note that RDFox supports only
conjunctive queries; thus, to answer faceted queries, we extended its query module with
a support of UNION. To support keyword search, we load DBpedia abstracts in Lucene
(lucene.apache.org/). Note that the implementation of SemFacet allows to substitute
both Lucene an RDFox with any other software that provide the same functionality.
5.2 Experiments
We evaluated only the back-end of SemFacet (c.f. Figure 2), left. We excluded front-end
activities since they highly depend on the client machine and network connection. In our
experiments we evaluated the runtime performance to compute answer URIs and their
associated snippet, as well as to update the interface. Our system presents ten snippets
at time, thus, snippet generation time is negligible. Since we use Lucene as a black box,
Lucene-related experiments are out of the scope of this paper.</p>
      <p>Experimental Setup To evaluate the runtime performance of SemFacet, we used a
MacBook Pro laptop with OS X 10.8.5, 2.4 GHz Intel Core i5 processor, and 8GB 1333
2 RDFox is a parallel in-memory RDF triple store; it implements reasoning for OWL 2 RL; it
materialises all implicit data via forward chaining and evaluates CQs over the materialisation.
# of queries
MHz DDR3 memory. The laptop runs the Apache Tomcat 7.0, with 4 GB of allocated
memory. Each experiment presented in the following section was executed over 10
different runs to avoid cashing of objects. Each run was repeated 10 times in a FOR-loop
to obtain a bigger sample size. Therefore, each experiment, having different
parameters, was executed 100 times it total. We report average and median of running time
performance results for each experiment.</p>
      <p>Generation of Initial Interface SemFacet computes the initial interface by (i) gathering
relevant facet names and values, and (ii) combining these in an interface.To perform
Step (i), SemFacet sends to RDFox one atomic query for each relevant URI. Thus, to
evaluate Step (i), we sent from 10 to 100,000 atomic queries to RDFox and measured
response time. For this, we generated atomic queries and synthetic data containing exactly
one answer per query Note that due to the RDFox indexing, evaluation time of atomic
queries is constant regardless data size. The results are presented in Table 1a. Times
are promising for up to 1,000 atomic queries (0.14 s), which corresponds to a keyword
search that returns 1000 URIs; for 10; 000 queries, average runtime increases to 1:16s
and for 100; 000 queries to 30:8s. Thus, one may need to restrict the number of URIs
returned by Lucene to guarantee reasonable response time for interface construction; in
our online version of SemFacet, the output of Lucene is restricted to 10,000 URIs.</p>
      <p>To evaluate Step (ii), we conducted two sets of experiments. In the first experiment
we fixed the number of facets to 10 and increased the number of values in each facet
from 1 to 10,000. In the second, we fixed the number of values per facet to 10 and
increased the number of facets from 1 to 10,000. Results are presented in Tables 1b
and 1c. Generation times for up to 1000 facets having 10 values each is promising (0.37
s). Generation times for 10,000 facets having 10 values each is increased to 3.3s on
average. Moreover, the interface construction time is similar for n facets with m values each
or for m facets with n values each. Note that the time in Tables 1b and 1c grows linearly.
Faceted Interface Update The experiments required to evaluate update of faceted
interfaces are the same as those described for the generation of the initial interface [30].
Hence, we focus here on interface minimisation. We have evaluated our minimisation
algorithm by generating faceted interfaces of growing sizes from 10 to 100,000 total
facet values. Since RDFox performs reasoning by first materialising implicit data (a
step performed only once) and then answering queries over the materialisation, we did
not use an ontology in our experiments, and relied on synthetic RDF data. Results are
presented in Table 1d: the time to minimise 1000 values is promising (0.18s), and grows
linearly upto 100,000 values (22.6s).</p>
      <p>Experiments with Real Data Previous experiments were conducted to evaluate
particular components of SemFacet in isolation. To evaluate the overall performance, we tested
the system’s running time on a fragment of Yago data enriched with DBpedia (dbpedia.
org/) abstracts and OWL 2 RL axioms (around 5 million RDF triples in total). To mimic
a real world scenario, we selected keywords that return from 10 to 10,000 answers. We
measured the number of facets, the total number of values, and the response time of
the system. The results of the experiments are presented in Table 1d: the time to obtain
1000 answers and to generate the corresponding interface is promising (0.86 s), and
grows to 3.5 s for 10,000 answers. About one third of this time is spent by Lucene, one
third by RDFox, and one third by the interface construction component of SemFacet.
6</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Future Work</title>
      <p>We presented a solid theoretical foundations for faceted search in the context of RDF
and OWL. We have analysed the expressive power and complexity of queries stemming
from faceted interfaces, and developed practical faceted navigation algorithms. Our
results suggest many interesting problems for future work. In particular, we will explore
extensions of our navigation algorithms beyond simple interfaces, as well as the design
of more scalable interface minimisation algorithms. Concerning system design,
substantial work is needed to improve GUI design, in particular with respect to advanced
features such as refocusing. Finally, we are planning to conduct further experiments
with knowledge bases other than Yago and perform user studies.
7</p>
      <p>
        A. Fadhil and V. Haarslev. OntoVQL: A Graphical Query Language for OWL Ontologies.
In:
        <xref ref-type="bibr" rid="ref10">DL. 2007</xref>
        .
iSPARQL QBE. http://dbpedia.org/isparql/.
      </p>
      <p>
        D. Calvanese, C. M. Keet, W. Nutt, M. Rodriguez-Muro, and G. Stefanoni. Web-based
graphical querying of databases through an ontology: th
        <xref ref-type="bibr" rid="ref5">e Wonder system. In: SAC. 2010</xref>
        .
A. Russell and P. R. Smart. NITELIGHT: A Graphical Editor for SPARQL Queries. In:
ISWC (Posters &amp; Demos). 2008.
      </p>
      <p>
        T. Berners-Lee, J. Hollenbach, K. Lu, J. Presbrey, E. Prudhommeaux, and M. M. C.
Schraefel. Tabulator Redux: Browsing and Writing Linked Data. In: LDOW. 2008.
P. Fafalios and Y. Tzitzikas. X-ENS: Semantic Enrichment of Web Search Results at
RealTime.
        <xref ref-type="bibr" rid="ref2">In: SIGIR. 2013</xref>
        .
      </p>
      <p>
        R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, et al. Facet
        <xref ref-type="bibr" rid="ref5">ed Wikipedia Search.
In: BIS. 2010</xref>
        .
      </p>
      <p>D. Tunkelang. Faceted Search. Morgan &amp; Claypool Publishers, 2009.</p>
      <p>E. Hyvo¨nen, S. Saarela, and K. Viljanen. Ontogator: Combining View- and
OntologyBased Search with Semantic Browsing. In: XML Finland. 2003.
m.c. schraefel, D. A. Smith, A. Owens, A. Russell, C. Harris, and M. L. Wilson. The
Evolving mSpace Platform: Leveraging the Semantic Web on the Trail of the Memex. In:
Hypertext. 2005.</p>
      <p>P. Heim, J. Ziegler, and S. Lohmann. gFacet: A Browser for the Web of Data. In:
IMCSSW. Vol. 417. 2008.</p>
      <p>M. Hildebrand, J. van Ossenbruggen, and L. Hardman. /facet: A Browser for
Heterogeneous Semantic Web Repositories. In: ISWC. 2006.</p>
      <p>D. Huynh, S. Mazzocchi, and D. R. Karger. Piggy Bank: Experience the Semantic Web
Inside Your Web Browser. In: J. Web Sem. 5.1 (2007).</p>
      <p>
        G. Kobilarov and I. Dickinson. Humboldt: Exploring Linked Data. In: LDOW. 2008.
D. F. Huynh and D. R. Karger. Parallax and Companion: Set-based Brows
        <xref ref-type="bibr" rid="ref2">ing for the Data
Web. 2013</xref>
        .
      </p>
      <p>E. Oren, R. Delbru, and S. Decker. Extending Faceted Navigation for RDF Data. In: ISWC.
2006.</p>
      <p>
        S. Ferre´ and A. Hermann. Semantic Search: Reconciling Expressive Querying and
        <xref ref-type="bibr" rid="ref7">Exploratory Search. In: ISWC. 2011</xref>
        .
      </p>
      <p>
        A. Wagner, G. Ladwig, and T. Tran. Browsing-oriented Semantic Fac
        <xref ref-type="bibr" rid="ref7">eted Search. In:
DEXA. 2011</xref>
        .
      </p>
      <p>
        M. Arenas, B. C. Grau, E. Kharlamov, S. Marciuska, and D. Zheleznyakov. Faceted Search
over Ontology-
        <xref ref-type="bibr" rid="ref4">Enhanced RDF Data. In: CIKM. 2014</xref>
        .
      </p>
      <p>SemFacet. http://www.cs.ox.ac.uk/isg/tools/SemFacet/.</p>
      <p>
        B. C. Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, M. Arenas, and E.
JimenezRuiz. SemFacet: Semantic Faceted S
        <xref ref-type="bibr" rid="ref4">earch over Yago. In: WWW Demo. 2014</xref>
        .
F. M. Suchanek, G. Kasneci, and G. Weikum. Yago: A Core of Semantic Knowle
        <xref ref-type="bibr" rid="ref10">dge. In:
WWW. 2007</xref>
        .
      </p>
      <p>B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web
Ontology Language Profiles. In: W3C Recommendation (2009).</p>
      <p>B. Motik, R. Shearer, and I. Horrocks. Hypertableau Reasoning for Description Logics.
In: J. Artif. Int. Res. (2009).</p>
      <p>
        W3C: SPARQL 1.1 Entailment Regimes. www.w3.org/TR/sparql11-entailment/.
G. Stefanoni, B. Motik, and I. Horrocks. Introducing Nominals to the Combined Query
Answering Approaches for EL.
        <xref ref-type="bibr" rid="ref2">In: AAAI. 2013</xref>
        .
      </p>
      <p>
        R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The Combined
Approach to Ontology-Bas
        <xref ref-type="bibr" rid="ref7">ed Data Access. In: IJCAI. 2011</xref>
        .
      </p>
      <p>RDFox. www.cs.ox.ac.uk/isg/tools/RDFox/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giese</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>Jime´nez-</article-title>
          <string-name>
            <surname>Ruiz</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          <string-name>
            <surname>Skjaeveland</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Soylu</surname>
          </string-name>
          , et al.
          <source>Optique</source>
          <volume>1</volume>
          .
          <article-title>0: Semantic Access to Big Data: The Case of Norwegian Petroleum Directorate's FactPages.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>In: ISWC (Posters &amp; Demos)</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>Jime´nez-</article-title>
          <string-name>
            <surname>Ruiz</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Bilidas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Giese</surname>
          </string-name>
          , et al.
          <article-title>Optique: Towards OBDA Systems for Industry</article-title>
          .
          <source>In: ESWC (Satellite Events)</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Solomakhina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ozcep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zheleznyakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hubauer</surname>
          </string-name>
          , et al.
          <article-title>How Semantic Technologies can Enhance Data Access at Siemens Energy</article-title>
          . In: ISWC.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Evaluating the usability of natural language query languages and interfaces to Semantic Web knowledge bases</article-title>
          .
          <source>In: J. Web Sem. 8</source>
          .
          <issue>4</issue>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , E. Kaufmann, A. Go¨ hring, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Kiefer</surname>
          </string-name>
          .
          <article-title>Querying Ontologies: A Controlled English Interface for End-Users</article-title>
          . In: ISWC.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Franconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guagliardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trevisan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tessaris</surname>
          </string-name>
          .
          <article-title>Quelo: an Ontology-Driven Query Interface</article-title>
          . In: DL.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>U.</given-names>
            <surname>Waltinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tecuci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Olteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mocanu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          .
          <article-title>Natural Language Access to Enterprise Data</article-title>
          .
          <source>In: AI Magazine 35.1</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Soylu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Skjaeveland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <article-title>Jime´nez-</article-title>
          <string-name>
            <surname>Ruiz</surname>
          </string-name>
          , et al.
          <article-title>A Preliminary Approach on Ontology-Based Visual Query Formulation for Big Data</article-title>
          .
          <source>In: MTSR</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Beneventano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vincini</surname>
          </string-name>
          .
          <article-title>The SEWASIE Network of Mediator Agents for Semantic Search</article-title>
          . In:
          <volume>13</volume>
          .12 (
          <issue>Dec</issue>
          . 1,
          <year>2007</year>
          ). http://www.jucs.org/ jucs_13_12/the_sewasie_network_of.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>