<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LODeX: A tool for Visual Querying Linked Open Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Benedetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sonia Bergamaschi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laura Po</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Ingegneria "Enzo Ferrari", Università di Modena e Reggio Emilia</institution>
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Formulating a query on a Linked Open Data (LOD) source is not an easy task: both technical knowledge of the query language and awareness of the structure of the dataset are essential to create a query. We present a revised version of LODeX that gives the user an easy way to build queries in a fast and interactive manner. When exploring a LOD source, the user can take advantage of the Schema Summary produced by LODeX (i.e. a synthetic view of the dataset's structure) and pick graphical elements from it to create a visual query. The tool also supports the user in browsing the results and, if needed, in refining the query. The prototype has been evaluated on hundreds of public SPARQL endpoints (listed in Data Hub<sup>1</sup>) and is available online at http://dbgroup.unimo.it/lodex2. A survey conducted on 27 users has shown that our tool can effectively support both unskilled and skilled users in exploring and querying LOD datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>SPARQL has been the W3C standard for querying LOD sources since 2008.
Today, it is the major query language for the Semantic Web, and thus the principal way to
access and explore the constantly growing LOD Cloud. Several tools have attempted
to facilitate exploring and querying LOD datasets, but the task remains complex and
restricted to skilled users.</p>
      <p>To help end users create queries on a new LOD source, we
argue that it is necessary to provide a comprehensive view of the source itself. In this
way, the structure of the dataset is revealed and composing correct and
consistent queries becomes an easy task.</p>
      <p>
        LODeX implements this idea: on the basis of the Schema Summary provided by the
first version of the tool [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]<sup>2</sup>, it now supports visual querying of a LOD source. The new
functionalities are: (1) a new interface for composing a visual query by browsing
the Schema Summary of a source; (2) a SPARQL compiler that automatically produces the
corresponding SPARQL query and submits it to the SPARQL endpoint, with the result
shown in a tabular view; (3) a refinement panel for refining the visual query by
adding or removing attributes and by defining filter and ordering conditions.
      </p>
      <p>In the literature, we found different visual querying tools that exploit different
strategies. Some tools provide a visual query language (a graph that describes the SPARQL
query expression), like NITELIGHT, iSPARQL and RDF-GL. Even if they supply a visual
composition, this is still strongly coupled to the syntax of the query language; thus,
they remain unusable by unskilled users. Other approaches implement visual filter/flow
graphs (series of nodes that implement operations, e.g. filter, sort, transform), like
DERIPipes, MashQL and SparqlFilterFlow. Although these tools do not require any SPARQL
skills, they do not provide any high-level representation of the source’s structure.</p>
      <p>This work has been accomplished in the framework of a PhD program organized by the Global
Grant Spinner 2013 and funded by the European Social Fund and the Emilia Romagna Region.</p>
      <p><sup>1</sup> http://datahub.io</p>
      <p><sup>2</sup> http://dbgroup.unimo.it/lodex</p>
    </sec>
    <sec id="sec-2">
      <title>2 Backend</title>
      <p>The architecture of LODeX relies on the composition of two main modules: Extraction
&amp; Summarization and Visualization &amp; Querying, as shown in the Figure below.</p>
      <p>
        The extraction of the statistical information and the production of the Schema
Summary are implemented by a Python script that uses a NoSQL database (MongoDB) to
store this information. The process takes as input the URL of a SPARQL endpoint
and generates the queries needed to obtain structural and statistical information, i.e.
Statistical Indexes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], of the LOD source; it then combines this information to
produce its Schema Summary.
      </p>
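      <p>As a rough illustration of this step, the sketch below builds the request URLs for two index queries against a given endpoint. It is a minimal sketch under stated assumptions, not the actual LODeX extraction script: the two SPARQL queries, the dictionary INDEX_QUERIES and the function build_requests are hypothetical, and the endpoint is assumed to accept the standard "query" parameter of the SPARQL protocol over HTTP GET. Storing the results in MongoDB is omitted.</p>
      <preformat>
```python
from urllib.parse import urlencode

# Hypothetical index queries: the queries actually issued by LODeX
# are not listed in this paper.
INDEX_QUERIES = {
    # Number of instances of each instantiated class.
    "class_instances": (
        "SELECT ?class (COUNT(?s) AS ?n) "
        "WHERE { ?s a ?class } GROUP BY ?class"
    ),
    # How often a property links a subject class to an object class.
    "property_usage": (
        "SELECT ?sc ?p ?oc (COUNT(*) AS ?n) "
        "WHERE { ?s ?p ?o . ?s a ?sc . ?o a ?oc } "
        "GROUP BY ?sc ?p ?oc"
    ),
}


def build_requests(endpoint_url):
    """Return one GET URL per index query for the given endpoint.

    Assumes endpoint_url has no query string of its own.
    """
    return {
        name: endpoint_url + "?" + urlencode({"query": query})
        for name, query in INDEX_QUERIES.items()
    }
```
</preformat>
      <p>Calling build_requests("http://example.org/sparql"), where the URL is a placeholder endpoint, returns one URL per index; a real extractor would fetch each URL and combine the answers into the Schema Summary.</p>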
      <p>We used different JavaScript libraries to implement the Visualization &amp; Querying
module and to produce an interactive single-page web application: Polymer, a new
library based on Web Components, to manage the GUI; Data Driven Documents<sup>3</sup>
to create the interactive Schema Summary; and Sgvizler<sup>4</sup> to allow the querying of the
remote endpoints. The crucial component of this module is the Query Orchestrator. It
manages the interaction between the GUI and the user and translates the visual query
into a SPARQL query through a compiler. The compiler exploits an iterative algorithm
that traverses the visual query to produce the SPARQL query. It supports a subset of SPARQL
operators representing the core of the language: AND, OPTIONAL, FILTER, ORDER
BY, LIMIT and OFFSET<sup>5</sup>.</p>
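      <p>To make the role of the compiler more concrete, the sketch below turns a small visual-query structure into a SPARQL SELECT string that uses the operators listed above. It is an illustrative sketch only: the real compiler is part of the JavaScript front end and its data model is not described here, so the function compile_query, its parameters and the dictionary layout of the visual query are all assumptions. Terms are assumed to be valid SPARQL terms already (for instance prefixed names), with any PREFIX declarations supplied by the caller.</p>
      <preformat>
```python
def compile_query(nodes, edges=(), filters=(), order_by=None,
                  limit=None, offset=None, prologue=""):
    """Compile a hypothetical visual-query structure into a SPARQL SELECT.

    nodes: {var: {"class": term, "attributes": [(property, var, optional)]}}
    edges: [(subject_var, property, object_var)]
    prologue: PREFIX declarations supplied by the caller, if any.
    """
    select, mandatory, optional = [], [], []
    for var, node in nodes.items():
        select.append("?" + var)
        mandatory.append("?%s a %s ." % (var, node["class"]))
        for prop, attr_var, is_optional in node.get("attributes", []):
            select.append("?" + attr_var)
            triple = "?%s %s ?%s ." % (var, prop, attr_var)
            if is_optional:
                optional.append("OPTIONAL { %s }" % triple)
            else:
                mandatory.append(triple)
    # A property chosen between two classes joins their nodes (AND).
    for subj, prop, obj in edges:
        mandatory.append("?%s %s ?%s ." % (subj, prop, obj))
    # Mandatory patterns come first so each OPTIONAL extends the full match.
    body = mandatory + optional + ["FILTER (%s)" % f for f in filters]

    parts = [prologue] if prologue else []
    parts.append("SELECT %s WHERE { %s }" % (" ".join(select), " ".join(body)))
    if order_by:
        parts.append("ORDER BY ?" + order_by)
    if limit is not None:
        parts.append("LIMIT %d" % limit)
    if offset is not None:
        parts.append("OFFSET %d" % offset)
    return " ".join(parts)
```
</preformat>
      <p>In this sketch a refinement step, such as adding a filter or changing the ordering, simply means calling compile_query again with the updated arguments, which mirrors the recompilation after each change in the refinement panel.</p>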
    </sec>
    <sec id="sec-3">
      <title>3 Frontend</title>
      <p>Using the GUI of LODeX, the user is guided in the composition of a visual query by
selecting classes, properties and attributes defined in the Schema Summary. A video
tutorial is available at www.dbgroup.unimo.it/lodex2_video. The interface of LODeX
is composed of three panels. The first panel lists all the datasets with some of their
characteristics (number of triples, classes, properties etc.). After selecting a dataset,
the browsable Schema Summary is displayed in the second panel and it is possible to
start building a visual query (Fig. 1 A) by selecting the first node (i.e. a class) and by
adding some of its attributes or properties (M: mandatory, O: optional). The color of
the nodes, attributes and properties indicates which of the vocabularies listed in the
legend on the right of Fig. 1 the element belongs to. When the mouse hovers over
a node, more information about the class is shown, such as the class name, the
number of instances (B), the properties (C) and the attributes (D). Each time a property is
chosen, the connected class is added to the visual query. By repeating this process, the
user can extend the query. By clicking on the “generate” button, the user is taken
to the Refinement Panel, where he/she can view the results and the corresponding
SPARQL query. Here, the user may modify the query (F) by adding or removing filter
conditions, modifying the attribute list and sorting the records. After any change, the SPARQL
query is recompiled and the results are updated.</p>
      <p><sup>3</sup> http://d3js.org/</p>
      <p><sup>4</sup> http://dev.data2000.no/sgvizler/</p>
      <p><sup>5</sup> Further details about the LODeX model can be found at: http://goo.gl/FQIYLU</p>
      <p>
        Datasets: LODeX has been tested on 206 SPARQL endpoints compatible with
SPARQL 1.1 listed in Datahub. We were able to make the Schema
Summary of 110 endpoints available online for visual querying. A portability evaluation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has reduced
the number of endpoints available online on LODeX to 110<sup>6</sup>.
      </p>
      <p><sup>6</sup> Further details about this evaluation can be found at: http://goo.gl/NhJHRf</p>
      <p>
        Usability evaluation: We designed a usability test<sup>6</sup> [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to verify both whether the graph
visualization of the Schema Summary is valid documentation for a dataset and whether the
process of building and refining a visual query is effective. We enrolled 27 anonymous
users to fill in an online survey; each functionality was tested through micro-tasks
followed by a SUS questionnaire. We obtained a median SUS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] score of 82.5, which
classifies LODeX as Excellent according to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
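      <p>For readers unfamiliar with SUS, the sketch below shows the standard scoring rule and how a median over participants can be computed. It is only an illustration: the function names sus_score and median_sus are hypothetical, and no survey data from this evaluation is reproduced. Each questionnaire has 10 items answered on a 1 to 5 scale; odd-numbered items contribute the answer minus 1, even-numbered items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score between 0 and 100.</p>
      <preformat>
```python
from statistics import median


def sus_score(responses):
    """Score one 10-item SUS questionnaire (answers on a 1-5 scale)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for index, answer in enumerate(responses):
        # Items 1, 3, 5, ... are positively worded; 2, 4, 6, ... negatively.
        total += (answer - 1) if index % 2 == 0 else (5 - answer)
    return total * 2.5


def median_sus(all_responses):
    """Median SUS score over a list of questionnaires."""
    return median(sus_score(r) for r in all_responses)
```
</preformat>
      <p>For example, a questionnaire answered with 4 on every odd item and 2 on every even item scores 75.</p>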
    </sec>
    <sec id="sec-4">
      <title>4 Demonstration Scenario</title>
      <p>The LODeX functionalities are best illustrated by using datasets of different
complexity; thus, we will concentrate on three datasets: the Linked Clean
Energy Data, which is a comprehensive set of linked clean energy data; the European
Television Heritage dataset, which contains information about the European television
archives and acts as a domain aggregator for Europeana, Europe's digital library; and the
World War 1 as Linked Open Data dataset, which contains information on events,
actors and places related to the First World War. The demo is open to the active participation of
the audience; therefore, any LOD data source available in LODeX can be used.</p>
      <p>Three scenarios will be used to explore the benefits of the tool: 1 - Exploration of
a LOD source: we will show how a user can navigate through the Schema Summary
of the three datasets to capture the underlying knowledge; 2 - Creation of a visual
query: we will start the composition of a simple query by selecting a class (and some
attributes) and we invite the audience to try composing a visual query; 3 - Refinement
of a query: we will show how to refine a query by adding or removing some attributes,
by filtering the query results with particular conditions and by ordering the results.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>We described and demonstrated LODeX as a valuable tool to explore LOD sources and
produce meaningful queries on them. Formulating a query based on the Schema
Summary of a LOD source is easy: the user avoids problems in naming classes and
properties and can be sure to generate consistent queries. The development and
improvement of LODeX is still ongoing: we are working on clustering techniques to
show an effective Schema Summary for large datasets, on new query result formats (e.g.
chart, plain text etc.), and on adding more SPARQL operators (e.g. GROUP BY and
UNION).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bangor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kortum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Determining what individual SUS scores mean: Adding an adjective rating scale</article-title>
          .
          <source>Journal of Usability Studies</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>114</fpage>
          -
          <lpage>123</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Benedetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Po</surname>
          </string-name>
          .
          <article-title>Online index extraction from linked open data sources</article-title>
          .
          <source>In Proc. of the LD4IE Workshop co-located with the ISWC 2014</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Benedetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Po</surname>
          </string-name>
          .
          <article-title>A visual summary for linked open data sources</article-title>
          .
          <source>In ISWC (Posters &amp; Demos)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>F.</given-names>
            <surname>Benedetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Po</surname>
          </string-name>
          .
          <article-title>Visual querying LOD sources with LODeX</article-title>
          . To appear in
          <source>Proceedings of the 8th International Conference on Knowledge Capture (K-CAP 2015)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Brooke</surname>
          </string-name>
          .
          <article-title>SUS: a quick and dirty usability scale</article-title>
          .
          <source>Usability evaluation in industry</source>
          ,
          <volume>189</volume>
          (
          <issue>194</issue>
          ):
          <fpage>4</fpage>
          -
          <lpage>7</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>