<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQL Faceter: Client-side Faceted Search Based on SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mikko Koho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erkki Heino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eero Hyvonen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Computing Research Group (SeCo), Aalto University, Department of Computer Science</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The faceted search paradigm is widely used in web applications, and various tools are available for implementing it on the server side. In contrast, this paper presents an HTML-based client-side component that can be plugged into virtually any public SPARQL endpoint on the web, using only the SPARQL API for data retrieval. To test and demonstrate the idea and the tool, its application in a large in-use semantic portal is presented.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Faceted search [
        <xref ref-type="bibr" rid="ref12 ref2">2, 12</xref>
        ], known also as view-based search [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and dynamic
hierarchies [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], is based on indexing data items along orthogonal category hierarchies,
i.e., facets (e.g., places, times, and document types). In searching, the user selects
categories on the facets in free order, and the data items included in the selected
categories are considered the search results. After each selection, a count is
computed for each category, showing the number of results the user would get by
making that selection next. In this way, the search is guided and annoying "no hits"
results are avoided. The idea of faceted search is especially useful on the Semantic Web, where
hierarchical ontologies used for data annotation provide a natural basis for facets,
and reasoning can be used for mapping heterogeneous data to facets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Faceted search can be implemented with popular server-side solutions, such
as Solr, Sphinx, and ElasticSearch. However, there is a lack of lightweight
client-side faceted search tools or components that search data directly from a
SPARQL endpoint. Such a tool would be very useful, because it could be used
on virtually any open SPARQL endpoint on the web without any need for server-side
programming or access rights. This paper presents a web component for
implementing faceted search applications in a browser, based only on a standard
SPARQL API.</p>
    </sec>
    <sec id="sec-2">
      <title>The Tool: SPARQL Faceter</title>
      <p>
        Design Requirements. The requirements for our tool, SPARQL Faceter, are
based on our earlier experience of tediously implementing server-side faceted
search over and over again in different applications, as well as on the generic
requirements of the search paradigm [
        <xref ref-type="bibr" rid="ref12 ref3">12, 3</xref>
        ]: there is a need for a lightweight
browser-based faceted search engine tool that is easy to use and adapt for different
RDF datasets published in SPARQL data services on the web. A particular
demand of our own Linked Data Finland data service<xref ref-type="fn" rid="fn4">4</xref> [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is to support external
users of the data in application development, and here the notion of the Rich
Internet Application, where the data service is completely detached behind a
standard SPARQL API, was deemed very useful. The tool should therefore fulfill the
following requirements:
1. Data read from a SPARQL endpoint. It is common practice to
publish RDF datasets as SPARQL services, and to build applications that use the
SPARQL endpoint to query for information. The data can reside anywhere
on the web and be managed by anybody on the server side. No extra effort
is required from the application developer for managing the data.
2. Easily adaptable to different datasets. Different datasets have different
data models, and the application should be easy to configure for these.
3. Easy integration into web pages. A developer should be able to integrate
a faceted search into a web page and modify the appearance of the faceted
search freely.
4. Fluent user experience. When a user makes selections in the facets, the
application should show results in a reasonable time. The UI should be easy
to use and should support the exploratory analysis of datasets.
5. Support for hierarchical facets. The tool should be able to handle
hierarchical concepts in the facets, displaying the hierarchy in the facet selections.
When a user selects a category higher in a hierarchy, the results should
include all results for its subcategories.
6. Easy to maintain. The tool is planned to be actively used, and thus should
be easy to maintain and develop further.
      </p>
      <p>In short, the goal was to create a tool that would be easy to use
out of the box and require minimal configuration, yet provide enough configuration
options to be useful in different contexts.</p>
      <p>Design. AngularJS<xref ref-type="fn" rid="fn5">5</xref> was chosen as the implementation framework for our tool.
SPARQL Faceter is developed as open source, with the source code available on
GitHub.</p>
      <p>SPARQL Faceter consists of two distinct components. One is the main
Semantic Faceted Search component<xref ref-type="fn" rid="fn6">6</xref>, which contains the basic functionality of
the tool and depends on the second component: an AngularJS service for
querying SPARQL endpoints with paging and object mapping<xref ref-type="fn" rid="fn7">7</xref>.
<fn id="fn4"><label>4</label><p>http://ldf.fi</p></fn>
<fn id="fn5"><label>5</label><p>https://angularjs.org/</p></fn>
<fn id="fn6"><label>6</label><p>https://github.com/SemanticComputing/angular-semantic-faceted-search</p></fn></p>
      <sec id="sec-2-1">
        <p>The Semantic Faceted Search component comprises several services
and, most importantly, the facet selector directive. This directive is the main
component of the tool, and provides the user interface for using the faceted
search.</p>
        <p>The facet selector directive is independent of any results the facet selections
might imply: its role is to keep track of which values are selected in which facets,
and to display the available selections to the user, including the number of results
each selection implies. The tool communicates the facet selections to whatever
application is using it by calling a callback function whenever the selections
change. Thus, SPARQL Faceter places no restrictions on what can be done with
the user's selections. The tool does, however, provide services for integrating the
facet selections into SPARQL queries, handling SPARQL results, and updating
the URL based on the current selections.</p>
        <p>The Semantic Faceted Search component works by building a single SPARQL
query for the facets it controls, and updating the query each time the user
makes a selection. This has the advantage of keeping the states of the facets
synchronized at all times. The downside is that with very many facets querying
can become slow, and the user experience may suffer because the facets are unusable
while their states are being queried. To make it feasible to have many
facets available, facets can be enabled and disabled: only enabled facets are
included in the query. Whether or not a facet is initially enabled is configurable.
The tool shows a spinner on the components while querying for data, to convey
to the user that the information in the elements is being updated.</p>
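        <p>The single-query approach described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the query that SPARQL Faceter actually generates: the Facet type, the function names, and the variable names ?s, ?facet, ?value and ?cnt are all hypothetical. Each enabled facet contributes one UNION branch that counts the results per value under the current selections, so values that would lead to no results never appear in the output.</p>
        <preformat><![CDATA[
```typescript
// Hypothetical facet description; SPARQL Faceter's real options may differ.
interface Facet {
  id: string;        // label used to tell facets apart in the result rows
  predicate: string; // property as a SPARQL term, e.g. "<http://example.org/rank>"
  enabled: boolean;  // disabled facets are left out of the query entirely
  selected?: string; // currently selected value as a SPARQL term, if any
}

// Triple patterns restricting ?s to the selections made in enabled facets.
function selectionPatterns(facets: Facet[]): string {
  return facets
    .filter((f) => f.enabled && f.selected !== undefined)
    .map((f) => `?s ${f.predicate} ${f.selected} .`)
    .join(" ");
}

// One query for all enabled facets: a UNION of per-facet counting subqueries.
function buildFacetQuery(rdfClass: string, facets: Facet[]): string {
  const constraints = selectionPatterns(facets);
  const branches = facets
    .filter((f) => f.enabled)
    .map(
      (f) =>
        `{ SELECT ("${f.id}" AS ?facet) ?value (COUNT(DISTINCT ?s) AS ?cnt) ` +
        `WHERE { ?s a ${rdfClass} . ${constraints} ?s ${f.predicate} ?value . } ` +
        `GROUP BY ?value }`
    );
  return `SELECT ?facet ?value ?cnt WHERE { ${branches.join(" UNION ")} }`;
}
```
]]></preformat>
        <p>In this sketch a disabled facet adds neither a branch nor a constraint, which is an assumption on our part; it illustrates why disabling facets shortens the query and can speed up the response.</p>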
        <p>For enabled facets, the tool shows all possible values for the facet's property,
and the number of instances that each selection results in. The number of results
is calculated based on all the current selections in all of the facets. Values that
would lead to no results are omitted. In addition to the list of possible values,
there is also a text input for searching the facet values.</p>
        <p>The tool also supports hierarchical facets, in which selecting a category higher
in the hierarchy also shows results for all the categories that are below it in
the hierarchy. The hierarchy specification is not limited to using a predefined
property or vocabulary, but is based on configuring a property path that captures
the hierarchy, and giving the top categories explicitly. The facet element displays
a two-level hierarchy of the categories. For categories below the current category
level, the hierarchy is flattened and all the subcategories are listed below the
upper category. The category level is initially at the topmost categories, and
selecting a value changes the category level to the level of the selected value.</p>
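        <p>The two ideas in the paragraph above, a property path that captures the hierarchy and a flattened two-level display, can be illustrated with the following sketch. It is written under our own assumptions: the HierarchyFacet type, both function names, and the ?cat variable are hypothetical and are not taken from the tool's source code. The path is wrapped in a zero-or-more operator so that a selected category matches both itself and everything below it.</p>
        <preformat><![CDATA[
```typescript
// Hypothetical configuration of a hierarchical facet.
interface HierarchyFacet {
  predicate: string;     // property linking a result to its category
  hierarchyPath: string; // SPARQL property path from a category to its parent
}

// Constraint matching results in the selected category or any category below it.
function hierarchyConstraint(facet: HierarchyFacet, selected: string): string {
  return (
    `?s ${facet.predicate} ?cat . ` +
    `?cat (${facet.hierarchyPath})* ${selected} .`
  );
}

// Flatten a hierarchy to two levels: every category below one of the
// categories in `level` is listed directly under that category.
function twoLevelView(
  parentOf: Record<string, string>, // child category -> parent category
  level: string[]                   // categories shown on the upper level
): Record<string, string[]> {
  const view: Record<string, string[]> = {};
  for (const top of level) {
    view[top] = [];
  }
  for (const cat of Object.keys(parentOf)) {
    const seen: Record<string, boolean> = {}; // guards against cyclic data
    let cur: string | undefined = parentOf[cat];
    while (cur !== undefined && !seen[cur]) {
      const bucket = view[cur];
      if (bucket !== undefined) {
        bucket.push(cat); // nearest ancestor on the upper level wins
        break;
      }
      seen[cur] = true;
      cur = parentOf[cur];
    }
  }
  return view;
}
```
]]></preformat>
        <p>For a SKOS vocabulary, for example, the path could be the skos:broader property written as a full IRI; any other path that leads from a category to its parent would work the same way in this sketch.</p>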
        <p>The proposed way to use the facet selections, as returned by the tool, is to
build another SPARQL query for retrieving the results according to the user's
selections. By using the tool's services, the results can then be mapped into
JavaScript objects with pagination and client-side caching of the received results.
<fn id="fn7"><label>7</label><p>https://github.com/SemanticComputing/angular-paging-sparql-service</p></fn></p>
        <p>Installation and Configuration. At its simplest, the only configuration
options needed by SPARQL Faceter are the URL of a SPARQL endpoint, a callback
function to be called with the facet selections when they change, the RDF class
URI of the resources that are the target of the search, and the facet
configurations. A basic facet can be configured with just the URI of the property and
a name for the facet to be displayed to the user. The callback function defines
what happens when the facet selections are updated, which typically is to update
a results display according to the user's selections.</p>
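        <p>To make the minimal configuration concrete, the sketch below models the four required options as a TypeScript type and checks that none is missing. All option names (endpointUrl, rdfClass, facets, onSelectionsChanged) and the example.org URIs are hypothetical placeholders chosen for illustration; the actual option names are documented in the component's repository.</p>
        <preformat><![CDATA[
```typescript
// Hypothetical shapes; the real option names are defined by the component.
interface BasicFacetConfig {
  predicate: string; // URI of the property, as a SPARQL term
  name: string;      // name shown to the user
  enabled?: boolean; // whether the facet is initially enabled
}

type Selections = { [facetId: string]: string };

interface FaceterConfig {
  endpointUrl: string; // URL of the SPARQL endpoint
  rdfClass: string;    // class of the resources being searched
  facets: { [facetId: string]: BasicFacetConfig };
  onSelectionsChanged: (selections: Selections) => void;
}

// Names of the required options that are missing from a configuration.
function missingOptions(config: Partial<FaceterConfig>): string[] {
  const missing: string[] = [];
  if (!config.endpointUrl) missing.push("endpointUrl");
  if (!config.rdfClass) missing.push("rdfClass");
  if (typeof config.onSelectionsChanged !== "function") {
    missing.push("onSelectionsChanged");
  }
  if (!config.facets || Object.keys(config.facets).length === 0) {
    missing.push("facets");
  }
  return missing;
}

// The callback typically refreshes a results display; here it only
// records the latest selections.
let latestSelections: Selections = {};

const exampleConfig: FaceterConfig = {
  endpointUrl: "http://example.org/sparql",
  rdfClass: "<http://example.org/schema/DeathRecord>",
  facets: {
    birthPlace: {
      predicate: "<http://example.org/schema/birthPlace>",
      name: "Place of birth",
    },
  },
  onSelectionsChanged: (selections) => {
    latestSelections = selections;
  },
};
```
]]></preformat>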
        <p>Other types of facets require some additional configuration options, but the
amount of configuration needed for any facet type has been kept to a minimum.
The other facet types include a keyword search facet, a time-span facet, and a
hierarchical facet. More details about the configuration are given in the repository
of the Semantic Faceted Search component.</p>
        <p>As for the results, the user of the tool is expected to build a query for
retrieving results based on the facet selections. This makes the tool extremely flexible
with respect to different data models, but requires that the user of the tool is familiar with
SPARQL syntax.</p>
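        <p>A results query of the kind the developer is expected to write could look like the following sketch. It assumes, hypothetically, that each selection is available as a property and value pair of SPARQL terms; the Selection type and both function names are ours. The only external fact relied on is that the SPARQL 1.1 Protocol accepts a query in the query parameter of a GET request.</p>
        <preformat><![CDATA[
```typescript
// Hypothetical representation of one facet selection.
interface Selection {
  predicate: string; // property as a SPARQL term
  value: string;     // selected value as a SPARQL term
}

// Results query for one page. ORDER BY keeps the paging stable
// between requests.
function buildResultsQuery(
  rdfClass: string,
  selections: Selection[],
  page: number,     // zero-based page index
  pageSize: number
): string {
  const constraints = selections
    .map((s) => `?s ${s.predicate} ${s.value} .`)
    .join(" ");
  return (
    `SELECT DISTINCT ?s WHERE { ?s a ${rdfClass} . ${constraints} } ` +
    `ORDER BY ?s LIMIT ${pageSize} OFFSET ${page * pageSize}`
  );
}

// URL for sending the query to an endpoint with an HTTP GET request.
function queryUrl(endpointUrl: string, query: string): string {
  return endpointUrl + "?query=" + encodeURIComponent(query);
}
```
]]></preformat>
        <p>A real application would select more variables than ?s, for instance the properties shown in a results table, which is where the data model of the dataset enters the picture.</p>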
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Applications</title>
      <p>
        We tested and evaluated SPARQL Faceter on the WarSampo portal [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
portal<xref ref-type="fn" rid="fn8">8</xref> consists of different application perspectives built on top of interlinked
Finnish Second World War data published on a SPARQL endpoint. The SPARQL
endpoint is provided by an Apache Fuseki server<xref ref-type="fn" rid="fn9">9</xref>. The applications have been tested
to work with all major browsers (Chrome, Firefox and Internet Explorer).
      </p>
      <p>
        Use case 1: Death Records Perspective. The WarSampo death records
perspective<xref ref-type="fn" rid="fn10">10</xref> uses SPARQL Faceter to provide an interface for searching and exploring
the death records of Finnish WW2 casualties [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The dataset consists of about
95,000 death records, and almost 2.4 million RDF triples.
      </p>
      <p>Fig. 1 shows a screenshot of the faceted search of death records using SPARQL
Faceter. The application interface contains 12 facets and a table-like view of the
results. The facets include a keyword search facet for the persons' names, a
time-span facet for the time of death, a hierarchical military rank facet, and 9
basic facets for the annotated properties of the death records. The source code
of the application is available online<xref ref-type="fn" rid="fn11">11</xref>.</p>
      <p>A dataset this large provides a benchmark for the performance of SPARQL
Faceter, as both the SPARQL queries and the data handling on the client side could take
long enough to degrade the user experience. Considerable effort had to be put
into optimizing the SPARQL queries for speed, while also making sure that no
unnecessary processing of the results happens on the client side. Normally, when
searching and browsing the dataset, results are displayed within a few seconds after
the user makes a selection. With selections that produce a large result set,
particularly when many facets are enabled, the user may have to wait more
than ten seconds to see the results and the updated facets. Most of this time is spent
waiting for results from the SPARQL endpoint.
<fn id="fn8"><label>8</label><p>http://www.sotasampo.fi/</p></fn>
<fn id="fn9"><label>9</label><p>https://jena.apache.org/documentation/fuseki2/</p></fn>
<fn id="fn10"><label>10</label><p>http://www.sotasampo.fi/en/casualties/</p></fn>
<fn id="fn11"><label>11</label><p>https://github.com/SemanticComputing/WarSampo-death-records</p></fn></p>
      <sec id="sec-3-1">
        <p>Use case 2: Photographs Perspective. The WarSampo photographs
perspective<xref ref-type="fn" rid="fn12">12</xref> uses SPARQL Faceter to create a faceted search interface for wartime
photographs. The dataset consists of about 159,000 photographs and a total of
about 1.6 million triples. The interface contains only 5 facets, which keeps
response times short. The facets include a time-span facet for the date the photograph
was taken, a keyword search for descriptions, and basic facets for related people
and places. Fig. 2 shows a screenshot of the photograph perspective.</p>
        <p>The perspective uses "infinite scrolling" to display the photographs: more
results are retrieved as the user scrolls down for more photographs. Clicking
on a photograph shows the user more information about the photograph, and
provides hyperlinks to other WarSampo perspectives via linked entities.</p>
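        <p>The bookkeeping behind infinite scrolling can be sketched with a few pure functions. This is an illustrative model under our own assumptions, not the implementation used in the perspective: the ScrollState type, the function names, and the pixel threshold are hypothetical. The state records the items loaded so far, the next page is requested at the offset equal to their count, and a page shorter than the page size signals that no more results exist.</p>
        <preformat><![CDATA[
```typescript
// Hypothetical client-side state of an infinitely scrolling result list.
interface ScrollState<T> {
  items: T[];         // everything retrieved so far
  exhausted: boolean; // true once the endpoint has no more results
}

// Offset of the next page to request, or null when all results are loaded.
function nextOffset<T>(state: ScrollState<T>): number | null {
  return state.exhausted ? null : state.items.length;
}

// New state after a page of results arrives. A page shorter than
// pageSize means that the result set has been read to the end.
function appendPage<T>(
  state: ScrollState<T>,
  page: T[],
  pageSize: number
): ScrollState<T> {
  return {
    items: state.items.concat(page),
    exhausted: page.length < pageSize,
  };
}

// True when the viewport bottom is within `threshold` pixels of the
// end of the content, i.e. when more results should be requested.
function shouldLoadMore(
  scrollTop: number,
  viewportHeight: number,
  contentHeight: number,
  threshold: number
): boolean {
  return scrollTop + viewportHeight >= contentHeight - threshold;
}
```
]]></preformat>
        <p>Keeping these functions free of side effects is a design choice of the sketch: the scroll handler and the SPARQL request can then be wired in by whatever framework renders the list.</p>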
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>In this paper we have presented SPARQL Faceter, a tool for creating client-side
faceted search applications. The configuration of the facets has been kept as
simple as possible. SPARQL Faceter handles querying the SPARQL endpoint
when needed, mapping the SPARQL results to JavaScript objects, and creating
and updating the facet elements. To make use of the facet selection values, the
developer using the tool has to retrieve results based on the selected values, which
usually involves writing SPARQL queries.
<fn id="fn12"><label>12</label><p>http://www.sotasampo.fi/en/photographs/</p></fn></p>
      <p>Results. Section 2 presented the design requirements for a Semantic Web tool
that provides client-side faceted search of RDF datasets published as SPARQL
services. The following list describes how SPARQL Faceter satisfies
these requirements:
1. Data read from a SPARQL endpoint. All data is retrieved via SPARQL,
from an endpoint defined by the user.
2. Easily adaptable to different datasets. Configuration of the tool is
required for every dataset. The configuration required for setting up facets is
kept to a minimum.
3. Easy integration into web pages. A faceted search can be easily added
to a web page by using the developed AngularJS components. The facet
selections have a default look that can be customized using CSS.
4. Fluent user experience. For datasets that are distinctly smaller than
the ones used in the use cases, the tool is almost certainly fast enough for
a good user experience. For the faceted search applications in the use cases,
some user selections can be slow, but in common usage results are shown
within a few seconds of a selection. For larger datasets, the number of facets
may have to be restricted, depending on the processing capabilities of the
SPARQL server.
5. Support for hierarchical facets. Two-level hierarchical facets are
supported by the tool, and demonstrated in the first use case. The dataset can
contain more than two levels, but the hierarchical facet flattens the hierarchy
to two levels for display in the select element.
6. Easy to maintain. The separation of concerns design principle is used in
development to simplify code structure and maintenance. Automated tests
are used to help maintain good code quality.</p>
      <p>Our application is able to satisfy the given requirements, and thus provides
a usable tool for faceted search of RDF datasets.</p>
      <p>
        Related Work. Jassa [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a JavaScript suite for SPARQL access, which
among other things provides support for client-side faceted search. For creating
simple faceted search applications Jassa is, however, a much larger and more
complex solution than SPARQL Faceter.
      </p>
      <p>
        Sparklis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is an application for exploring and querying data from a SPARQL
endpoint using a natural language interface<xref ref-type="fn" rid="fn13">13</xref>, which also implements a faceted
search. Sparklis focuses on data exploration and requires no prior knowledge of
the data model of the target endpoint, whereas SPARQL Faceter aims to support
development of applications tailored for a specific dataset.
      </p>
      <p>
        Oren et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have developed a prototype for faceted navigation of arbitrary
RDF data with automatic facet selection, as a server-side solution.
      </p>
      <p>
        Future Work. Plans for future development include:
1. Semi-automatic adaptation to new datasets. The tool could create a
basic configuration for a new SPARQL endpoint, possibly based only on a
class selection by the user.
2. Customizing facet appearance. To further support customizing the appearance
of the facets, we are planning a configuration option that lets users
define their own template for the facets.
      </p>
      <p>Acknowledgements. Kasper Apajalahti and Jouni Tuominen contributed to an
earlier version of our tool. This work was supported by the Linked Open Data
Science project<xref ref-type="fn" rid="fn14">14</xref> funded by the Ministry of Education and Culture in Finland.
<fn id="fn13"><label>13</label><p>http://www.irisa.fr/LIS/ferre/sparklis/</p></fn>
<fn id="fn14"><label>14</label><p>http://seco.cs.aalto.fi/projects/lodsci/</p></fn></p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ferré</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Expressive and scalable query-based faceted search over SPARQL endpoints</article-title>
          .
          <source>In: The Semantic Web – ISWC</source>
          <year>2014</year>
          , pp.
          <volume>438</volume>
          –
          <fpage>453</fpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elliott</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>English</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swearingen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          :
          <article-title>Finding the flow in web site search</article-title>
          .
          <source>CACM</source>
          <volume>45</volume>
          (
          <issue>9</issue>
          ),
          <volume>42</volume>
          –
          <fpage>49</fpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Design recommendations for hierarchical faceted search interfaces</article-title>
          .
          <source>In: ACM SIGIR workshop on faceted search</source>
          . pp.
          <volume>1</volume>
          –
          <issue>5</issue>
          . Seattle, WA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Hyvonen, E.,
          <string-name>
            <surname>Saarela</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viljanen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Application of ontology-based techniques to view-based semantic search and browsing</article-title>
          . In:
          <article-title>The semantic web: research and applications</article-title>
          .
          <source>First European Semantic Web Symposium (ESWS</source>
          <year>2004</year>
          ). pp.
          <volume>92</volume>
          –
          <fpage>106</fpage>
          . Springer-Verlag (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Hyvonen, E.,
          <string-name>
            <surname>Tuominen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Makela, E.:
          <article-title>Linked Data Finland: A 7-star model and platform for publishing and re-using linked datasets</article-title>
          .
          <source>In: The Semantic Web: ESWC 2014 Satellite Events, Revised Selected Papers</source>
          . pp.
          <volume>226</volume>
          {
          <fpage>230</fpage>
          .
          <string-name>
            <surname>SpringerVerlag</surname>
          </string-name>
          (May
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Hyvonen, E.,
          <string-name>
            <surname>Heino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskinen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ikkala</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koho</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tamper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuominen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Makela, E.:
          <article-title>WarSampo data service and semantic portal for publishing linked open data about the Second World War history</article-title>
          .
          <source>In: The Semantic Web|Latest Advances and New Domains (ESWC</source>
          <year>2016</year>
          ). Springer-Verlag (May
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Koho</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Hyvonen, E.,
          <string-name>
            <surname>Heino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuominen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskinen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Makela, E.:
          <article-title>Linked death|representing, publishing, and using Second World War death records as linked open data</article-title>
          .
          <source>In: Proceedings of the 1st Workshop on Humanities in the Semantic Web - WHiSe (at ESWC</source>
          <year>2016</year>
          ) (May
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delbru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Extending faceted navigation for RDF data</article-title>
          .
          <source>In: The Semantic Web – ISWC</source>
          <year>2006</year>
          , pp.
          <volume>559</volume>
          –
          <fpage>572</fpage>
          . Springer-Verlag (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pollitt</surname>
            ,
            <given-names>A.S.:</given-names>
          </string-name>
          <article-title>The key role of classification and indexing in view-based searching</article-title>
          .
          <source>Tech. rep.</source>
          , University of Huddersfield, UK
          (
          <year>1998</year>
          ), http://www.ifla.org/IV/ifla63/63polst.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sacco</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          :
          <article-title>Dynamic taxonomies: guided interactive diagnostic assistance</article-title>
          . In: Wickramasinghe, N. (ed.)
          <article-title>Encyclopedia of Healthcare Information Systems</article-title>
          . Idea Group (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stadler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westphal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
          </string-name>
          , J.:
          <article-title>Jassa: a JavaScript suite for SPARQL-based faceted search</article-title>
          .
          <source>In: Proceedings of the 2014 International Conference on Developers-Volume</source>
          <volume>1268</volume>
          . pp.
          <volume>31</volume>
          –
          <fpage>36</fpage>
          .
          CEUR-WS.org
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tunkelang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Faceted search,
          <source>Synthesis lectures on information concepts</source>
          ,
          <source>retrieval, and services</source>
          , vol.
          <volume>1</volume>
          . Morgan &amp; Claypool Publishers (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>