<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ELLIS: Interactive Exploration of Linked Data on the Level of Induced Schema Patterns</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Gottron</string-name>
          <email>Thomas.Gottron@schufa.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Malte Knauf</string-name>
          <email>mknauf@uni-koblenz.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ansgar Scherp</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johann Schaible</string-name>
          <email>johann.schaible@gesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GESIS - Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Innovation Lab, SCHUFA Holding AG</institution>
          ,
          <addr-line>Wiesbaden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Web Science and Technologies, University of Koblenz-Landau</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>ZBW - Leibniz Information Center for Economics, Kiel University</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present ELLIS, a demo to browse the Linked Data cloud on the level of induced schema patterns. To this end, we define schema-level patterns of RDF types and properties to identify how entities described by type sets are connected by property sets. We show that schema-level patterns can be aggregated and extracted from large Linked Data sets using efficient algorithms for mining frequent item sets. A subsequent visualisation of such patterns enables users to quickly understand which type of information is modelled on the Linked Data cloud and how this information is interconnected.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The Linked Open Data (LOD) cloud does not have a fixed or pre-defined schema.
However, the use of RDF types and properties to describe the data provides an emerging
schema. This implicit schema can be induced from data observations on the Web and,
thereby, can be made explicit. A subsequent visualisation of the induced schema
information enables users to investigate the structure of Linked Data in an interactive
and exploratory way. The insights and understanding of the data gained in this way
are beneficial for several applications. They can help users to find relevant vocabulary
terms when modelling data as LOD [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] or to program a Linked Data application
that needs to obtain data of a specific type and with specific properties [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Furthermore, they allow users to understand what type of information is available on the LOD
cloud and how this information is interconnected on the Web of Data. In this paper, we
present ELLIS, a graph-based approach for visualising and exploring induced schema
information for Linked Data on the basis of schema-level patterns.
      </p>
      <p>
        There are various approaches of different granularity for inferring schema information
from observations made on the Linked Data cloud. For the purpose of providing a
consistent and browsable view of schema-level information, we need to describe (at least)
two aspects: an aggregated representation of the entities modelled in the Linked Data
graph as well as a notion of the relations connecting them. The entities can be grouped
together on the basis of the sets of RDF types associated with them. Likewise, the sets
of RDF properties interlinking the entities can serve to describe the relations between
groups of entities of the same type. Hence, we model schema-level patterns (SLP) as a
combination of subject type sets sts and object type sets ots (i.e., sets of RDF types T
of entities modelled on the Linked Data cloud) which are connected by property sets ps
(i.e., sets of predicates P). Formally, an SLP is defined as a triple
      </p>
      <p>$$(\mathit{sts}, \mathit{ps}, \mathit{ots}) \in \mathcal{P}(T) \times \mathcal{P}(P) \times \mathcal{P}(T) \tag{1}$$</p>
      <p>This schema-level representation of Linked Data lends itself to a graph-based
interpretation and visualisation. As the subject and object type sets follow the same formal
definition, they can be seen as nodes connected by edges consisting of property sets.</p>
      <p>When computing SLPs for a (potentially distributed) segment R of the RDF data
graph on the LOD cloud, we consider all URIs appearing in the subject position and
object position of RDF triples (s, p, o), and extract their RDF types and the union of
all predicates used to model a relation between them. Formally, we define the set of
observed SLPs over an RDF data set R:</p>
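      <p>$$\mathrm{SLP}(R) = \{(\mathit{sts}, \mathit{ps}, \mathit{ots}) \mid \exists s, o : (\forall t_s \in \mathit{sts} : (s, \texttt{rdf:type}, t_s) \in R) \wedge (\forall p \in \mathit{ps} : (s, p, o) \in R) \wedge (\forall t_o \in \mathit{ots} : (o, \texttt{rdf:type}, t_o) \in R)\} \tag{2}$$</p>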
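      <p>To make the definition of observed SLPs more concrete, the following minimal Python sketch derives patterns from a handful of triples. It is an illustration under stated assumptions, not the ELLIS implementation: the function name observed_slps, the ex: identifiers and the decision to skip untyped entities are all hypothetical. Note also that the definition above admits every combination of subsets, whereas the sketch only reports the maximal pattern for each linked subject-object pair.</p>
      <preformat>
```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def observed_slps(triples):
    """Return the maximal schema-level pattern for each linked (s, o) pair."""
    types = defaultdict(set)  # entity -> set of RDF types
    links = defaultdict(set)  # (s, o) -> set of predicates from s to o
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            links[(s, o)].add(p)

    slps = set()
    for (s, o), ps in links.items():
        sts = types.get(s, set())
        ots = types.get(o, set())
        # Assumption of this sketch: pairs with an untyped subject or
        # object are skipped instead of yielding empty type sets.
        if sts and ots:
            slps.add((frozenset(sts), frozenset(ps), frozenset(ots)))
    return slps


# Hypothetical toy data; the ex: names are made up for illustration.
triples = [
    ("ex:plato", "rdf:type", "ex:Philosopher"),
    ("ex:plato", "rdf:type", "ex:Person"),
    ("ex:kant", "rdf:type", "ex:Philosopher"),
    ("ex:kant", "ex:influencedBy", "ex:plato"),
]

for sts, ps, ots in observed_slps(triples):
    print(sorted(sts), sorted(ps), sorted(ots))
```
      </preformat>
      <p>Representing each component as a frozenset keeps the triple hashable, so identical patterns observed for different entity pairs collapse into a single SLP.</p>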
      <p>The set SLP(R) can be computed with relatively little overhead from large data sets
using the Apriori algorithm for frequent item set mining. As a result, we obtain the
above mentioned graph structure over induced schema-level patterns.
      </p>
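      <p>The Apriori step can be sketched as follows. This is a generic, minimal version of frequent item set mining and not the code used for ELLIS. The encoding of each observed subject-object pair as one transaction, with items prefixed by s:, p: and o: to mark subject types, predicates and object types, is an assumption made here for illustration, as are the function name apriori and the example data.</p>
      <preformat>
```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}  # candidates of size 1
    frequent = {}
    k = 1
    while level:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c.issubset(t)) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        k += 1
        # Join step: unite surviving sets into candidates of size k.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions, one per observed subject-object pair.
transactions = [
    {"s:Philosopher", "p:influencedBy", "o:Philosopher"},
    {"s:Philosopher", "p:influencedBy", "o:Philosopher", "o:Person"},
    {"s:Person", "p:knows", "o:Person"},
]

frequent = apriori(transactions, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```
      </preformat>
      <p>A frequent item set that contains at least one item of each role can then be read back as an SLP by splitting its items according to their prefixes.</p>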
    </sec>
    <sec id="sec-2">
      <title>ELLIS</title>
      <p>Based on the definition of SLPs, we implemented the ELLIS prototype for visualising
and navigating the LOD cloud on a schema level (a screencast of ELLIS is publicly available at
https://www.youtube.com/watch?v=q47YFKyf32I&amp;feature=youtu.be). The system provides four essential
functionalities: (a) a visualisation of SLPs as a graph, (b) browsable rendering of the
graph nodes together with annotations of the relevant schema information, (c) a history
trace to keep track of previous steps in the exploration path, and (d) a search
functionality to find relevant entry points for browsing the SLP graph.</p>
      <p>
        The graph visualisation represents the type set information as well as the property
set information as nodes in a graph as shown in Figure 1. The edges connect the nodes
in a directed way to indicate the order of the triple in an SLP starting from the subject
type set over the connecting property set to the object type set. Representing all relevant
information as nodes in a browsable graph has two advantages. First, it condenses
information on a high level. This enables users to quickly grasp the structure of the data.
When needed and requested, additional information can be revealed and displayed. In
ELLIS we use hover info boxes and an additional info field in the menu to indicate the
type and property sets associated with nodes of the SLP graph. Second, the graph can
easily be navigated by selecting any of the displayed type set nodes. Upon selection of a
node, the visualisation interface updates the graph by retrieving all connected property
sets and type sets as given by the SLPs. A history trace [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] allows the users to identify
the path they took in the exploration of the LOD cloud on a schema level. SLPs in the
history trace older than the last three steps are removed from the visualisation. This
provides orientation and context without overloading the interface with all previously
visited schema-level patterns. Finally, a search functionality permits the users to search
for specific RDF types. Subsequently, ELLIS lists all type sets containing these types.
In this way, it is possible to flexibly choose an entry point type set and the embedding
SLPs for starting to browse the schema graph.
      </p>
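      <p>The history trace and the type search described above can be sketched in a few lines of Python. This is a minimal illustration and not the ELLIS code: the class HistoryTrace, the function search_type_sets and the type set identifiers are hypothetical, and interpreting a query with several types as "the type set must contain all of them" is an assumption.</p>
      <preformat>
```python
from collections import deque


class HistoryTrace:
    """Keeps only the most recent exploration steps (three by default)."""

    def __init__(self, max_steps=3):
        # A bounded deque drops the oldest step once it is full.
        self._steps = deque(maxlen=max_steps)

    def visit(self, type_set_id):
        self._steps.append(type_set_id)

    def visible(self):
        return list(self._steps)


def search_type_sets(type_sets, wanted_types):
    """Return the ids of all type sets that contain every wanted RDF type."""
    wanted = set(wanted_types)
    return sorted(
        ts_id for ts_id, types in type_sets.items() if wanted.issubset(types)
    )


# Hypothetical type sets; identifiers and types are made up for illustration.
type_sets = {
    "TS1": {"ex:Philosopher", "ex:Person"},
    "TS2": {"ex:Person"},
    "TS3": {"ex:Place"},
}
entry_points = search_type_sets(type_sets, ["ex:Person"])
print(entry_points)

trace = HistoryTrace()
for step in ["TS2", "TS1", "TS3", "TS1"]:
    trace.visit(step)
print(trace.visible())
```
      </preformat>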
      <p>ELLIS is designed following a classical three-tier architecture. The Web front end
visualises the graph constructed from SLPs, displays additional information, and
provides interaction functionality. Figure 1 illustrates the graph visualisation in ELLIS.
The middle tier encapsulates functions for search and navigation. In particular, it
resolves, for a given type set node, all relevant SLPs containing this type set as subject
type set or as object type set. The backend tier consists of a database containing all SLPs
obtained from a Linked Data set. In our ELLIS demo, we constructed the SLPs from
the Billion Triple Challenge (BTC) 2012 dataset, containing approximately 1.4 billion triples.</p>
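      <p>The resolution step of the middle tier could look roughly like the following in-memory sketch. The actual system queries a database, so the class SlpIndex, its method resolve and the example patterns are assumptions chosen only to illustrate the lookup in both directions.</p>
      <preformat>
```python
from collections import defaultdict


class SlpIndex:
    """Indexes SLPs by their subject type set and by their object type set."""

    def __init__(self, slps):
        self.by_subject = defaultdict(list)
        self.by_object = defaultdict(list)
        for slp in slps:
            sts, _ps, ots = slp
            self.by_subject[sts].append(slp)
            self.by_object[ots].append(slp)

    def resolve(self, type_set):
        """Return (outgoing, incoming) SLPs for the selected type set node."""
        outgoing = list(self.by_subject.get(type_set, []))
        incoming = list(self.by_object.get(type_set, []))
        return outgoing, incoming


# Hypothetical patterns; all names are made up for illustration.
greek = frozenset({"ex:GreekPhilosopher"})
german = frozenset({"ex:GermanPhilosopher"})
place = frozenset({"ex:Place"})
slps = [
    (german, frozenset({"ex:influencedBy"}), greek),
    (german, frozenset({"ex:birthPlace", "ex:deathPlace"}), place),
]

index = SlpIndex(slps)
outgoing, incoming = index.resolve(german)
print(len(outgoing), len(incoming))
```
      </preformat>
      <p>Keeping two separate indices means that a click on a node needs only two lookups, regardless of how many SLPs are stored.</p>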
      <p>Figure 1 shows the result of an initial query about Greek philosophers to ELLIS.
The best matching type set of the query is marked in red and shown in the middle of
the graph. The related sets of RDF resources with a similar set of properties and types
are connected via relations. In the example shown in Figure 1, these are properties like
dbpo:influencedBy and dbpo:influenced. The user hovers with the mouse over a type
set TS1195275161. It mainly contains German philosophers that are dbpo:influenced
by the Greek philosophers. Subsequently, the user clicks on this type set of German
philosophers in order to further navigate through the induced SLPs in ELLIS. The result
is shown in Figure 2. The clicked type set is now indicated in red and moved to the
centre of the graph visualisation. Further properties of this node, such as the birthplace
and place of death of the philosophers, are shown and can be explored further.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        There are numerous approaches for inducing schema information from Linked Open
Data. The applications range from statistical schema induction [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] through
cardinality estimation for query result sets [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and analytics of the dynamics of Linked Data
sources [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to schema-level indices [
        <xref ref-type="bibr" rid="ref6 ref8">6,8</xref>
        ]. Most similar to the presented SLPs are the
equivalence classes in SchemEX [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or the Node-Collection Layer from the RDF graph
summary [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which capture even more fine-grained schema information. Regarding
the visualisation of Linked Data, most approaches address visualisation on an instance
level [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In contrast, Katifori et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] present a survey of different approaches for
visualising ontologies, i.e., schema-level information (so-called T-Box). A more recent
visualisation approach involving schematic information on the LOD cloud is
LODSight [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It uses a dataset summarisation algorithm which induces the schema from
a dataset via SPARQL queries, which can become quite complicated. In contrast,
ELLIS induces the schema via SLPs, which are computed in a less complicated manner
using the Apriori algorithm for mining frequent item sets.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>With schema-level patterns, we have defined a structure which is suitable for
inducing and aggregating schema-level information from Linked Data. The ELLIS demo
visualises schema-level patterns as a graph structure and allows for an interactive
exploration and browsing of the schema information induced from the Linked Data cloud.</p>
      <p>
        As future work, we plan to integrate the visualisation technique with a novel tool
for modelling data as LOD [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. It will allow data engineers not only to conduct textual
queries to find relevant vocabulary terms for reuse but also to visually
explore terms that are related to the model they are working on.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Interactive evaluation of the ostensive model using a new test collection of images with multiple relevance assessments</article-title>
          .
          <source>Information Retrieval</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>114</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Campinas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perry</surname>
            ,
            <given-names>T.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceccarelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delbru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Introducing RDF graph summary with application to assisted SPARQL formulation</article-title>
          .
          <source>In: 23rd International Workshop on Database and Expert Systems Applications</source>
          . pp.
          <fpage>261</fpage>
          -
          <lpage>266</lpage>
          . IEEE (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dadzie</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rowe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Approaches to visualising linked data: A survey</article-title>
          .
          <source>Semant. Web</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>124</lpage>
          (
          <year>Apr 2011</year>
          ), http://dx.doi.org/10.3233/SW-2011-0037
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dividino</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Strategies for efficiently keeping local linked open data caches up-to-date</article-title>
          .
          <source>In: The Semantic Web - ISWC 2015</source>
          , pp.
          <fpage>356</fpage>
          -
          <lpage>373</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dudáš</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mynarz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Dataset summary visualization with LODSight</article-title>
          .
          <source>In: The Semantic Web: ESWC 2015 Satellite Events</source>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>40</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Perplexity of Index Models over Evolving Linked Data</article-title>
          .
          <source>In: ESWC'14: Proceedings of the Extended Semantic Web Conference</source>
          . pp.
          <fpage>161</fpage>
          -
          <lpage>175</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Katifori</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halatsis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lepouras</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vassilakis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannopoulou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Ontology visualization methods - a survey</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>39</volume>
          (
          <issue>4</issue>
          ) (
          <year>Nov 2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Konrath</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SchemEX-Efficient Construction of a Data Catalogue by Stream-based Indexing of Linked Data</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>16</volume>
          (
          <issue>5</issue>
          ),
          <fpage>52</fpage>
          -
          <lpage>58</lpage>
          (
          <year>2012</year>
          ),
          <source>the Semantic Web Challenge 2011</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moerkotte</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011</source>
          . pp.
          <fpage>984</fpage>
          -
          <lpage>994</lpage>
          . IEEE Computer Society (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Schaible</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheglmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Lover: support for modeling data using linked open vocabularies</article-title>
          .
          <source>In: Joint 2013 EDBT/ICDT Conferences, EDBT/ICDT '13, Genoa, Italy, March 22, 2013, Workshop Proceedings</source>
          . pp.
          <fpage>89</fpage>
          -
          <lpage>92</lpage>
          . ACM (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Scheglmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leinberger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gottron</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lämmel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Sepal: Schema enhanced programming for linked data</article-title>
          .
          <source>KI - Künstliche Intelligenz</source>
          pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Völker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niepert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Statistical schema induction</article-title>
          .
          <source>In: The Semantic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29-June 2, 2011, Proceedings, Part I. Lecture Notes in Computer Science</source>
          , vol.
          <volume>6643</volume>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>138</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>