=Paper= {{Paper |id=Vol-2187/paper5 |storemode=property |title=Combining Faceted Search with Data-analytic Visualizations on Top of a SPARQL Endpoint |pdfUrl=https://ceur-ws.org/Vol-2187/paper5.pdf |volume=Vol-2187 |authors=Petri Leskinen,Goki Miyakita,Mikko Koho,Eero Hyvönen |dblpUrl=https://dblp.org/rec/conf/semweb/LeskinenMKH18 }} ==Combining Faceted Search with Data-analytic Visualizations on Top of a SPARQL Endpoint== https://ceur-ws.org/Vol-2187/paper5.pdf
        Combining Faceted Search with Data-analytic
        Visualizations on Top of a SPARQL Endpoint

        Petri Leskinen1, Goki Miyakita2, Mikko Koho1, and Eero Hyvönen1,3

        1 Semantic Computing Research Group (SeCo), Aalto University, Finland
        2 KMD Research Institute, Keio University, Japan
        3 HELDIG – Helsinki Centre for Digital Humanities, University of Helsinki, Finland
        http://seco.cs.aalto.fi, http://heldig.fi



       Abstract. This paper discusses practical experiences of creating data-analytic
       visualizations in a browser, on top of a SPARQL endpoint, based on the results of
       faceted search. Four use cases related to Digital Humanities research in
       prosopography are discussed in which the SPARQL Faceter tool was used and extended in
       different ways. The Faceter tool allows the user to select a group of people with
       shared properties, e.g., people with the same place of birth, gender, profession,
       or employer. The filtered data can then be visualized, e.g., as column charts, with
       business graphics, sankey diagrams, or on a map. The use cases examine the
       potential of visualization as well as automated knowledge discovery in Digital
       Humanities research.

       Keywords: Linked Data, Visualization, Biography, Prosopography, Knowledge
       Discovery


1   Client-side Faceted Search on a SPARQL Endpoint

Faceted search and browsing [5,21], also known as view-based search [19] and dynamic
hierarchies [20], has become the norm in web applications. The idea is to index data
items along orthogonal category hierarchies, i.e., facets4 (e.g., places, times, document
types, etc.), and to use them for searching and browsing: the user selects categories on the
facets in free order, and the data items included in the selected categories are considered
the search results. After each selection, a count is computed for each category, showing the
number of results the user would get by making that selection next. In this way, the search
is guided so that annoying “no hits” results are avoided. Moreover, hit distributions on
facets provide the end user with data-analytic views on what kinds of items there are in the
underlying database. Faceted search is especially useful on the Semantic Web, where the
hierarchical ontologies used for data annotation provide a natural basis for facets, and
reasoning can be used for mapping heterogeneous data to facets [8]. The idea of combining
faceted search and visualizations has been applied, e.g., in ePistolarium5. However, unlike
ours [10,11,16,18], this application is not based on Linked Data.
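
The selection and counting logic described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the items, the facet names, and the helper functions `matches` and `facet_search` are hypothetical and are not taken from SPARQL Faceter, which computes its counts with SPARQL queries.

```python
from collections import Counter

# Hypothetical toy data: each item has exactly one value per facet.
ITEMS = [
    {"name": "A", "place": "Helsinki", "profession": "teacher"},
    {"name": "B", "place": "Helsinki", "profession": "officer"},
    {"name": "C", "place": "Turku", "profession": "teacher"},
    {"name": "D", "place": "Turku", "profession": "teacher"},
]
FACETS = ("place", "profession")


def matches(item, selections):
    """Return True if the item belongs to every selected category."""
    return all(item[facet] == value for facet, value in selections.items())


def facet_search(items, selections):
    """Return the result set and, for each facet, the hit count per category."""
    results = [item for item in items if matches(item, selections)]
    counts = {}
    for facet in FACETS:
        # Count against the selections made on the other facets, so that each
        # number shows what the user would get by choosing that category next.
        others = {f: v for f, v in selections.items() if f != facet}
        counts[facet] = Counter(i[facet] for i in items if matches(i, others))
    return results, counts


results, counts = facet_search(ITEMS, {"place": "Turku"})
print([r["name"] for r in results])
print(dict(counts["profession"]))
```

Because only categories that occur in the remaining items receive a count, a category that would lead to an empty result set (here, "officer" after selecting "Turku") is simply never offered.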
     4 The idea of facets dates back to the Colon Classification system of S. R. Ranganathan in
       library science, published in 1933.
     5 http://ckcc.huygens.knaw.nl/epistolarium/



Faceted Search and Data-analytic Visualizations on Top of a SPARQL Endpoint

      Faceted search can be implemented with server-side solutions, such as Solr6, Sphinx7,
  and ElasticSearch8, and higher-level tools, such as VuFind9. However, there is a lack of
  lightweight client-side faceted search tools or components that are able to search large
  datasets directly from a SPARQL endpoint. Such a tool is useful because it can be used
  easily on virtually any open SPARQL endpoint on the web without any need for server-side
  programming or access rights. This paper presents such a tool, SPARQL Faceter,
  a web component for implementing faceted search applications efficiently in a browser,
  based only on a standard SPARQL API. We extend our earlier short paper on the tool
  [14] by 1) showing in detail how the tool is used and how it works, 2) explaining the
  novelties in its latest version, and 3) especially showing how the tool and faceted search
  can be extended with different kinds of data-analytic visualizations.
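
  To illustrate what relying only on a standard SPARQL API can mean in practice, the following Python sketch builds a facet count query, encodes it as a SPARQL 1.1 Protocol GET request, and reads the counts from a response in the SPARQL 1.1 JSON results format. It is a sketch under stated assumptions, not the queries that SPARQL Faceter generates: the endpoint URL, the IRIs, the sample response, and all function names are hypothetical. No network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and IRIs, used only for illustration.
ENDPOINT = "https://example.org/sparql"
PERSON = "<http://example.org/Person>"
PROFESSION = "<http://example.org/profession>"


def facet_count_query(rdf_class, predicate, constraints=""):
    """Build a query that counts the matching items per category of one facet."""
    return (
        "SELECT ?value (COUNT(DISTINCT ?id) AS ?count) WHERE {\n"
        f"  ?id a {rdf_class} .\n"
        f"  {constraints}\n"
        f"  ?id {predicate} ?value .\n"
        "}\n"
        "GROUP BY ?value\n"
        "ORDER BY DESC(?count)"
    )


def request_url(endpoint, query):
    """SPARQL 1.1 Protocol: the query travels in the `query` parameter of a GET."""
    return endpoint + "?" + urlencode({"query": query})


def parse_counts(results_json):
    """Read the category counts from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return {b["value"]["value"]: int(b["count"]["value"]) for b in bindings}


# A hand-written sample response standing in for what an endpoint could return.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["value", "count"]},
    "results": {"bindings": [
        {"value": {"type": "literal", "value": "teacher"},
         "count": {"type": "literal", "value": "3",
                   "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
    ]},
})

query = facet_count_query(PERSON, PROFESSION)
url = request_url(ENDPOINT, query)
facet_counts = parse_counts(SAMPLE_RESPONSE)
print(facet_counts)
```

  In a browser, the same request would be issued with an HTTP client and an `Accept: application/sparql-results+json` header; nothing beyond the endpoint itself is needed on the server side.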
      As a proof of concept, four use case studies of data visualization are discussed
  from a SPARQL Faceter perspective: 1) WarSampo, using cultural heritage materials
  of World War II in Finland [10]; 2) Norssit, on top of Finnish high school alumni
  registry data [11]; 3) Semantic National Biography of Finland, based on the National
  Biography of the Finnish Literature Society [16]; and 4) U.S. Congress Prosopographer,
  utilizing biographical records of U.S. Congress legislators [18]. These cases support
  the following two-step prosopographical research method [22, p. 47], whose goal is to
  find out some kind of commonness or average in selected target groups of
  people. First, a target group of people is selected who share the characteristics needed
  for solving the research question at hand. Second, the target group is analyzed, and possibly
  compared with other groups, in order to solve the research question. Faceted search is
  used for finding the target groups, and visualizations are then created in order to analyze
  their characteristics.
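
  The two-step method can be illustrated with a minimal Python sketch: a target group is first selected by shared characteristics, and then summarized and compared with another group. The records, the field names, and the functions `select_group` and `average_lifespan` are hypothetical and serve only to show the shape of the method, not any result from the use cases.

```python
from statistics import mean

# Hypothetical biographical records, invented for illustration.
PEOPLE = [
    {"name": "A", "profession": "teacher", "birth": 1850, "death": 1920},
    {"name": "B", "profession": "teacher", "birth": 1860, "death": 1940},
    {"name": "C", "profession": "officer", "birth": 1870, "death": 1918},
    {"name": "D", "profession": "officer", "birth": 1880, "death": 1944},
]


def select_group(people, **criteria):
    """Step 1: select the people who share all of the given characteristics."""
    return [p for p in people
            if all(p.get(key) == value for key, value in criteria.items())]


def average_lifespan(group):
    """Step 2: summarize the group, here by its mean lifespan in years."""
    return mean(p["death"] - p["birth"] for p in group)


teachers = select_group(PEOPLE, profession="teacher")
officers = select_group(PEOPLE, profession="officer")
print(average_lifespan(teachers), average_lifespan(officers))
```

  In the applications, the first step corresponds to the facet selections and the second to the visualization computed over the selected group.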
      The rest of the paper is organized as follows. First, the characteristics of SPARQL
  Faceter are explained, with a focus on how it is used in practice in applications.
  After this, the focus shifts to extending the tool with visualizations. In conclusion, lessons
  learned and directions for further research are discussed.


  2       Using and Extending SPARQL Faceter

  SPARQL Faceter uses AngularJS10 as the implementation framework. The GitHub
  page11 gives instructions on how to install it and how to define an application with facets
  of the desired types in the source code. The page provides demo examples with queries
  to the DBpedia and WarSampo databases. The developer can adapt it to any Linked Data
  publication by configuring the endpoint, property paths for facets, and queries.
  SPARQL Faceter is documented in detail12.
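
  The kind of configuration mentioned above (an endpoint, property paths for facets, and the resulting query constraints) can be sketched as follows. This Python sketch does not reproduce the actual AngularJS configuration objects of SPARQL Faceter, which are described in its documentation; the classes `Facet` and `SearchConfig`, the function `constraint_block`, the endpoint URL, and the `ex:` terms are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    facet_id: str       # key used for the facet in the selections
    name: str           # label shown to the user
    property_path: str  # SPARQL property path from an item to the facet value


@dataclass
class SearchConfig:
    endpoint_url: str
    rdf_class: str
    facets: list = field(default_factory=list)


def constraint_block(config, selections):
    """Turn {facet_id: SPARQL term} selections into triple patterns on ?id."""
    by_id = {facet.facet_id: facet for facet in config.facets}
    lines = [f"?id a {config.rdf_class} ."]
    for facet_id, term in selections.items():
        lines.append(f"?id {by_id[facet_id].property_path} {term} .")
    return "\n".join(lines)


# Hypothetical configuration; the ex: prefix is assumed to be declared elsewhere.
CONFIG = SearchConfig(
    endpoint_url="https://example.org/sparql",
    rdf_class="ex:Person",
    facets=[
        Facet("gender", "Gender", "ex:gender"),
        Facet("birthplace", "Place of birth", "ex:birth/ex:place"),
    ],
)

block = constraint_block(CONFIG, {"birthplace": "ex:Helsinki"})
print(block)
```

  Keeping the property paths in the configuration means that adapting the search to another dataset only requires changing data, not the query-building code.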
       6 http://lucene.apache.org/solr/
       7 http://sphinxsearch.com/blog/2013/06/21/faceted-search-with-sphinx/
       8 https://www.elastic.co/
       9 https://vufind.org/
      10 https://angularjs.org/
      11 https://github.com/SemanticComputing/angular-semantic-faceted-search
      12 http://semanticcomputing.github.io/angular-semantic-faceted-search/#/api





  PREFIX xsd:      
  PREFIX schema:   
  PREFIX skos:     
  PREFIX skosxl:   
  PREFIX nbf:      
  PREFIX crm:      
  PREFIX foaf:     
  PREFIX gvp:      

  SELECT DISTINCT (?id AS ?id__uri) ?id__name ?value WHERE {
    # Restraints set in Faceter
    { ?id a nbf:PersonConcept ;
          foaf:focus/^crm:P98_brought_into_life/
          nbf:time/gvp:estStart ?slider_2 .
      FILTER (1800<=year(?slider_2) && year(?slider_2)<=2018)
      }

    # Query person’s age
    ?id foaf:focus/^crm:P100_was_death_of/nbf:time
        [ gvp:estStart ?time ; gvp:estEnd ?time2 ] ;
        foaf:focus/^crm:P98_brought_into_life/nbf:time
        [ gvp:estStart ?birth ; gvp:estEnd ?birth2 ] .
    BIND (xsd:integer(0.5*
        (year(?time)+year(?time2)-year(?birth)-year(?birth2)))
      AS ?value)
    # Filter out erroneous cases
    FILTER (-1