<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SPARQture: A More Welcoming Entry to SPARQL Endpoints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fadi Maali</string-name>
          <email>fadi.maali@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics, National University of Ireland Galway</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The semi-structured nature of Semantic Web data and its use of multiple vocabularies make it challenging to describe the structure of an RDF dataset. When trying to understand the content behind a SPARQL endpoint, users usually need to write a number of SPARQL queries to "mine" the types contained in the dataset and their relations. We argue that this is a tedious, error-prone and time-consuming task to carry out manually. SPARQture, the tool described in this paper, automates this process and renders the structure of the RDF data available behind a SPARQL endpoint using a number of visually rich components. Moreover, SPARQture supports user interaction to further explore the data. We describe SPARQture and discuss its fit to the IESD Challenge. The reader is encouraged to view the demo at: http://sparqture.appspot.com</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The Semantic Web community succeeded in making large amounts of structured
data available on the Web in the last decade. This data, represented using the
RDF data model, spans various domains from social networks, to bioinformatics
to geography. Many of these datasets are made available via SPARQL endpoints.
SPARQL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a standardised expressive query language that supports
slicing and dicing RDF data. More than 500 SPARQL endpoints are currently
listed on datahub.io.
      </p>
      <p>SPARQL endpoints are commonly accessible over the Web. They can be
queried using any tool that supports HTTP or via an HTML form,
typically with a large text input. The user is then left with the daunting task of
discovering the content of the endpoint in order to issue meaningful
queries. This task is particularly challenging for RDF as it is semi-structured
data that tends to be heterogeneous and uses multiple vocabularies.</p>
      <p>A common way to scrutinise the content of a SPARQL endpoint is to
issue a series of SPARQL queries<sup>1</sup>. These usually go from listing classes in the
endpoint, to properties of each class, to sample instances of some classes, etc.
Furthermore, to get an overview of the structure and relationships in the data,
a more complex sequence of SPARQL queries is needed. We argue that this
is a tedious, error-prone and time-consuming task to carry out manually.
SPARQture, the tool described in this paper, automates this process.
<fn id="fn1"><label>1</label><p>For example, see http://dallemang.typepad.com/my_weblog/2008/08/rdf-as-self-describing-data.html
and http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set</p></fn></p>
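      <p>To make the manual procedure concrete, the following Python sketch builds the kind of exploratory queries a user would typically write by hand: one listing classes with their instance counts, one listing the properties used by instances of a class, and one sampling instances. The function names and the exact query shapes are illustrative assumptions and are not taken from SPARQture.</p>
      <preformat preformat-type="code"><![CDATA[
```python
"""Illustrative exploration queries (assumed shapes, not SPARQture's own)."""


def list_classes_query(limit=50):
    # Count instances per class, most populated classes first.
    # Aggregates such as COUNT require SPARQL 1.1.
    return (
        "SELECT ?type (COUNT(?s) AS ?instances)\n"
        "WHERE { ?s a ?type }\n"
        "GROUP BY ?type\n"
        "ORDER BY DESC(?instances)\n"
        f"LIMIT {int(limit)}"
    )


def properties_of_class_query(class_uri):
    # List the distinct properties used by instances of one class.
    return (
        "SELECT DISTINCT ?property\n"
        f"WHERE {{ ?s a <{class_uri}> ; ?property ?o }}"
    )


def sample_instances_query(class_uri, limit=5):
    # Fetch a handful of instances of one class.
    return f"SELECT ?s WHERE {{ ?s a <{class_uri}> }} LIMIT {int(limit)}"


if __name__ == "__main__":
    person = "http://xmlns.com/foaf/0.1/Person"
    for query in (
        list_classes_query(),
        properties_of_class_query(person),
        sample_instances_query(person),
    ):
        print(query, end="\n\n")
```
]]></preformat>
      <p>Each query has to be sent, inspected and refined separately, which illustrates why the manual route quickly becomes tedious.</p>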
      <p>SPARQture extracts and explicitly presents the schema used in the
RDF data available behind a SPARQL endpoint. The tool issues a number
of queries and then renders the information it collects using a number of visually
rich and well-known information visualisation components. Furthermore,
SPARQture supports user interaction to further explore the data. In the remainder of
this paper, we provide more details about SPARQture (section 2) and discuss its
fit to the IESD Challenge by addressing the three key themes of the workshop:
human factors, computational models and application domains (section 3).</p>
    </sec>
    <sec id="sec-2">
      <title>2 Overview of the System</title>
      <p>Figure 1 shows the main interface of our tool, SPARQture. It
comprises two panels: the tag cloud panel (top part) and the structure
panel (bottom part). The tag cloud panel provides an overview of the data by
displaying the types of entities in the endpoint. The structure
panel, in turn, gives a detailed view of the structure of entities of a particular type.
The structure is presented as a tree and can be further navigated by clicking on
the nodes.</p>
      <p>
        The user starts by entering the URL of a SPARQL endpoint or a VoID<sup>2</sup> file
location. SPARQture presents a tag cloud of types in the endpoint where the size
of a tag reflects the number of instances of that type. When the user clicks on a type,
a set of sample resources is shown in the tag cloud panel and the structure of
some of these resources is plotted in the structure panel. It is worth mentioning
here that the root of the tree does not have to be a single resource but can
represent multiple resources. In the process of navigation, a node can represent
multiple values. When such a node is clicked, SPARQture combines the structure
of the relations of these resources collectively, hence supporting what can be
referred to as set-based browsing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
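      <p>One plausible way to derive tag sizes from instance counts is a linear scale, sketched below in Python. The paper does not state which scaling SPARQture uses; the function name, the pixel bounds and the sample counts are assumptions made purely for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def tag_sizes(type_counts, min_px=10, max_px=32):
    """Map each type to a font size that grows linearly with its instance count."""
    if not type_counts:
        return {}
    low = min(type_counts.values())
    high = max(type_counts.values())
    span = high - low
    sizes = {}
    for type_name, count in type_counts.items():
        # When every type has the same count, fall back to the smallest size.
        fraction = (count - low) / span if span else 0.0
        sizes[type_name] = round(min_px + fraction * (max_px - min_px))
    return sizes


if __name__ == "__main__":
    # Hypothetical counts, e.g. as returned by a COUNT ... GROUP BY query.
    counts = {"foaf:Person": 100, "skos:Concept": 55, "foaf:Document": 10}
    print(tag_sizes(counts))
```
]]></preformat>
      <p>A logarithmic scale would be a reasonable alternative when a few types dominate the dataset.</p>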
      <p>Finally, we summarise what we believe are the novel aspects of SPARQture:</p>
      <list list-type="bullet">
        <list-item><p>It extracts and explicitly presents the schema used in the underlying RDF
data.</p></list-item>
        <list-item><p>It provides multiple visually rich views over the data that allow exploring it
at various levels of abstraction.</p></list-item>
        <list-item><p>It can run on top of any endpoint that supports SPARQL 1.1.</p></list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>3 Addressing Key Themes of the Workshop</title>
      <sec id="sec-3-1">
        <title>3.1 Human Factors</title>
        <p>
          SPARQture combines a number of familiar components to support presenting
and exploring information. Specifically, tag clouds and tree visualisations are
used. Tag clouds utilise font size and colour as visual indicators of the
importance of an item, as recommended in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The tree component, in turn,
provides a comprehensible way of representing data structure. The colour of a node
in the tree encodes its type (resource vs. literal). Furthermore, nodes that
represent multiple items are drawn with a thick border. To emphasise the structure of
the data, we do not show the values of the tree nodes (resources and literals),
but make them available via a tooltip upon hovering over nodes. We believe that
these visual cues convey the structure of the data in a short time.
In both the tag cloud and the tree view, URIs are presented in the shortened
form of prefix:local-name, where prefixes come from the prefix.cc service.
        </p>
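        <p>The shortening of URIs into the prefix:local-name form can be sketched as follows. In this Python sketch a small local dictionary stands in for a lookup against prefix.cc; the dictionary contents and the function name are assumptions, and the longest-match rule is one reasonable choice rather than a description of SPARQture's behaviour.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical namespace table; a real tool would obtain it from prefix.cc.
PREFIXES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
    "http://xmlns.com/foaf/0.1/": "foaf",
    "http://dbpedia.org/ontology/": "dbo",
}


def shorten(uri, prefixes=PREFIXES):
    """Return prefix:local-name, or the URI unchanged if no namespace matches."""
    # Try longer namespaces first so that nested namespaces resolve correctly.
    for namespace in sorted(prefixes, key=len, reverse=True):
        if uri.startswith(namespace) and len(uri) > len(namespace):
            return f"{prefixes[namespace]}:{uri[len(namespace):]}"
    return uri


if __name__ == "__main__":
    print(shorten("http://xmlns.com/foaf/0.1/Person"))
    print(shorten("http://example.org/unknown"))
```
]]></preformat>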
        <p>
          The choice of a tree instead of a graph to present data structure is based
on our belief that trees are easier to comprehend. By allowing duplicate nodes
in trees, graphs that contain cycles can still be represented as trees. A similar
decision was made in the Tabulator browser [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
<fn id="fn2"><label>2</label><p>http://www.w3.org/TR/void/</p></fn>
        </p>
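        <p>The idea of representing a cyclic graph as a tree with duplicate nodes can be illustrated with a short Python sketch. The adjacency-list input format, the function name and the explicit depth bound (which guarantees termination on cycles) are assumptions made for this example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def unfold(graph, root, depth):
    """Unfold a directed graph into a nested tree, duplicating revisited nodes.

    graph maps a node to a list of (property, target) pairs.
    depth bounds the number of edges followed from the root.
    """
    node = {"id": root, "children": []}
    if depth > 0:
        for prop, target in graph.get(root, []):
            child = unfold(graph, target, depth - 1)
            child["via"] = prop  # the property that led to this child
            node["children"].append(child)
    return node


if __name__ == "__main__":
    # Two resources that point at each other form a cycle.
    cyclic = {"a": [("p", "b")], "b": [("q", "a")]}
    print(unfold(cyclic, "a", 3))
```
]]></preformat>
        <p>Because the node a is simply repeated each time the cycle returns to it, the result is always a finite tree.</p>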
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Computational Model</title>
        <p>The core of SPARQture is an engine that translates user interactions into a
series of SPARQL queries. The results of the SPARQL queries are serialised
as JSON objects that are rendered using JavaScript on the client side. We use the
jQuery TagCloud library<sup>3</sup> to render tag clouds and reuse JavaScript code from the
OpenRefine<sup>4</sup> project to render trees.</p>
        <p>In the following we highlight a number of aspects of the computational model:</p>
        <p>SPARQL query generation: We try to group SPARQL queries together to
reduce the number of queries issued to the SPARQL endpoint and to enhance
the responsiveness of the interface at the same time. For example, the resource
representing Dublin in DBpedia<sup>5</sup> has 261 related resources (i.e., triples with
a URI object). Nevertheless, SPARQture issues only 8 queries to compute
the tree structure with depth 3 rooted at Dublin. This is achieved by sending
queries based on unique properties and grouping multiple properties in one
query.</p>
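        <p>A minimal Python sketch of grouping several properties into one query is given below. The use of the SPARQL 1.1 VALUES clause, the batch size and the function names are assumptions; the paper does not spell out how SPARQture forms its grouped queries.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def grouped_property_queries(subject_uri, property_uris, batch_size=50):
    """Build one query per batch of unique properties instead of one per triple."""
    unique = sorted(set(property_uris))  # each property is queried only once
    queries = []
    for batch in chunked(unique, batch_size):
        values = " ".join(f"<{prop}>" for prop in batch)
        queries.append(
            "SELECT ?p ?o WHERE { "
            f"VALUES ?p {{ {values} }} "
            f"<{subject_uri}> ?p ?o }}"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical property URIs; duplicates collapse before batching.
    props = ["http://e/p1", "http://e/p2", "http://e/p1", "http://e/p3"]
    for query in grouped_property_queries("http://dbpedia.org/resource/Dublin", props, 2):
        print(query)
```
]]></preformat>
        <p>With this scheme the number of queries depends on the number of distinct properties rather than on the number of related resources.</p>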
        <p>Content discovery: The entry point for the tool can be a SPARQL endpoint
URL or a VoID file. SPARQture tries to locate the SPARQL endpoint
from the VoID file by searching for the void:sparqlEndpoint property.
We are currently working on extending this discovery feature to any resource
URI by consulting the Endpoint Lookup Service<sup>6</sup>.</p>
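        <p>Locating the endpoint from a VoID description can be sketched as follows. This Python example assumes the VoID file is serialised as N-Triples and uses a regular expression in place of a full RDF parser, so it is a simplification rather than SPARQture's implementation; the function name and the sample data are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

VOID_SPARQL_ENDPOINT = "http://rdfs.org/ns/void#sparqlEndpoint"

# Matches an N-Triples statement whose subject, predicate and object are IRIs.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def find_sparql_endpoints(ntriples_text):
    """Return the objects of all void:sparqlEndpoint statements, in order."""
    endpoints = []
    for line in ntriples_text.splitlines():
        match = TRIPLE.match(line)
        if match and match.group(2) == VOID_SPARQL_ENDPOINT:
            endpoints.append(match.group(3))
    return endpoints


if __name__ == "__main__":
    # Hypothetical VoID description of a dataset.
    void_doc = (
        "<http://example.org/ds> "
        "<http://rdfs.org/ns/void#sparqlEndpoint> "
        "<http://example.org/sparql> .\n"
        '<http://example.org/ds> <http://purl.org/dc/terms/title> "Example" .\n'
    )
    print(find_sparql_endpoints(void_doc))
```
]]></preformat>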
        <p>A few heuristics: We embedded a number of heuristics in the tool to select
the resources shown as sample resources or used as the root of the tree.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Application Domain</title>
        <p>SPARQture is independent of the application domain. It can support any
domain as long as an endpoint that supports SPARQL 1.1 is provided.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Discussion &amp; Conclusion</title>
      <p>
        Extracting the schema of semi-structured data has also been studied in the
context of XML data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The same task is more challenging for RDF because
RDF is graph data, in contrast to the tree structure of XML data. Tree structures
have a designated root that provides an entry point to access all the items in the
tree. A tree root has no guaranteed equivalent in graphs. Our initial work tried
to use void:rootResource as a root for the RDF graph data. However, this
proved to be infeasible as this property is very rarely used and, when it exists,
computing the structure starting from it is prohibitively expensive. There is
some work in the Semantic Web community on summarising RDF graphs based
on clustering resources by types and then extracting relationships between the
computed clusters [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. SPARQture can be used on top of an RDF graph summary
as a presentation layer that utilises the precomputed structure. We hope to
implement this functionality in the future.
<fn id="fn3"><label>3</label><p>https://github.com/addywaddy/jquery.tagcloud.js/</p></fn>
<fn id="fn4"><label>4</label><p>http://openrefine.org/</p></fn>
<fn id="fn5"><label>5</label><p>http://dbpedia.org/resource/Dublin</p></fn>
<fn id="fn6"><label>6</label><p>http://void.rkbexplorer.com/endpoint-search/</p></fn>
      </p>
      <p>To conclude, we described SPARQture, a tool that aims to make
SPARQL endpoints more accessible by providing a more welcoming interactive entry
point. SPARQture uses tag clouds and trees to provide an overview of the data
and supports interactive exploration of the data behind a SPARQL endpoint.</p>
      <p>Acknowledgements. Fadi Maali is funded by the Irish Research Council,
Embark Postgraduate Scholarship Scheme.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Scott</given-names>
            <surname>Bateman</surname>
          </string-name>
          , Carl Gutwin, and
          <string-name>
            <given-names>Miguel</given-names>
            <surname>Nacenta</surname>
          </string-name>
          .
          <article-title>Seeing Things in the Clouds: The Effect of Visual Features on Tag Cloud Selections</article-title>
          .
          <source>In Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia</source>
          ,
          <source>HT '08</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          , Yuhsin Chen, Lydia Chilton, Dan Connolly, Ruth Dhanaraj, James Hollenbach, Adam Lerer, and David Sheets.
          <article-title>Tabulator: Exploring and Analyzing Linked Data on the Semantic Web</article-title>
          .
          <source>In Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Stephane</given-names>
            <surname>Campinas</surname>
          </string-name>
          , Thomas E. Perry, Diego Ceccarelli, Renaud Delbru, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Tummarello</surname>
          </string-name>
          .
          <article-title>Introducing RDF Graph Summary With Application to Assisted SPARQL Formulation</article-title>
          .
          <source>In Proceedings of the 23rd International Workshop on Database and Expert Systems Applications</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Steve</given-names>
            <surname>Harris</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andy</given-names>
            <surname>Seaborne</surname>
          </string-name>
          .
          <article-title>SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation 21 March</source>
          <year>2013</year>
          . http://www.w3.org/TR/sparql11-query/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. David F Huynh and
          <string-name>
            <given-names>David</given-names>
            <surname>Karger</surname>
          </string-name>
          .
          <article-title>Parallax and Companion: Set-based Browsing for the Data Web</article-title>
          .
          <source>Proceedings of WWW'09</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Svetlozar</given-names>
            <surname>Nestorov</surname>
          </string-name>
          , Jeffrey
          <string-name>
            <given-names>D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , Janet L.
          <string-name>
            <surname>Wiener</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sudarshan S.</given-names>
            <surname>Chawathe</surname>
          </string-name>
          .
          <article-title>Representative Objects: Concise Representations of Semistructured, Hierarchical Data</article-title>
          .
          <source>In Proceedings of the Thirteenth International Conference on Data Engineering</source>
          ,
          <source>ICDE '97</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>