<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Protege Plugin with Swift Linked Data Miner</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jedrzej Potoniec</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agnieszka Lawrynowicz</string-name>
          <email>alawrynowicz@cs.put.poznan.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computing, Poznan University of Technology ul. Piotrowo 3</institution>
          ,
          <addr-line>60-965 Poznan</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>5</lpage>
      <abstract>
        <p>We present a Protege plugin implementing Swift Linked Data Miner, an anytime algorithm for extending an ontology with new subsumptions. The algorithm mines an RDF graph accessible via a SPARQL endpoint and proposes new SubClassOf axioms to the user.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        It is not uncommon for a Linked Data dataset to provide only a very shallow
ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which does not cover more complex aspects of the underlying
conceptual model. On the other hand, the data in the dataset follows the model, and
thus the model is reflected in the data in the form of patterns. Such patterns can
be detected using pattern mining techniques and used to extend the ontology
with new knowledge.
      </p>
      <p>
        To address this use case, we developed Swift Linked Data Miner (SLDM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
SLDM is a pattern mining algorithm that can discover new partial definitions,
i.e. SubClassOf axioms, for a given class. The mined axioms are expressed in
OWL 2 EL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. SLDM is an anytime algorithm, which means that it delivers
patterns as soon as they are mined and then refines them. In other words, the longer
the algorithm works, the more complex the mined patterns become. The algorithm does
not require access to the whole RDF graph at once. Instead, it
downloads the necessary parts on demand by querying the SPARQL endpoint. To
avoid putting a high load on the endpoint with complex queries, the queries
are very simple, consisting only of a single triple pattern and a VALUES clause.
Direct use of a SPARQL endpoint without overloading it is the main difference
between SLDM and earlier approaches to mining ontologies from RDF graphs
(e.g. [
        <xref ref-type="bibr" rid="ref2 ref9">9,2</xref>
        ]), which require accessing the whole graph at the same time or posing
complex SPARQL queries to the endpoint.
Swift Linked Data Miner was implemented as a Java library using Apache Jena
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to interact with a SPARQL endpoint and OWL API [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to deliver mined
axioms. On top of the library, we built a plugin for Protege [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which reduces the use
of SLDM to just a few clicks. Both the source code and a JAR file with the plugin
are available in a Git (https://git-scm.com/) repository at https://bitbucket.org/jpotoniec/sldm.
      </p>
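      <p>As an illustration of the query shape described above, the following Python sketch assembles a query from a single triple pattern and a VALUES clause. It is not taken from the SLDM code base: the function name build_values_query and the example.org URIs are assumptions made for this example, and the sketch only rejects obviously malformed URIs rather than performing full IRI validation.</p>
      <preformat><![CDATA[
```python
def build_values_query(uris):
    """Return a SELECT query with one triple pattern and a VALUES clause.

    Hypothetical helper for illustration only; not part of SLDM.
    """
    if not uris:
        raise ValueError("at least one URI is required")
    for uri in uris:
        # Reject characters that would break the <...> IRI syntax.
        if any(ch in uri for ch in '<>"') or any(ch.isspace() for ch in uri):
            raise ValueError(f"not a plain URI: {uri!r}")
    # Each URI is written as an IRI reference in angle brackets.
    values = " ".join(f"<{uri}>" for uri in uris)
    return f"SELECT ?s ?p ?o WHERE {{ ?s ?p ?o . VALUES ?s {{ {values} }} }}"


query = build_values_query([
    "http://example.org/Neil_Armstrong",
    "http://example.org/Sally_Ride",
])
print(query)
```
]]></preformat>
      <p>The resulting query contains exactly one triple pattern, so the endpoint does not have to evaluate a join between triple patterns.</p>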
      <p>A typical workflow with the plugin is presented in Figure 1. First, the user
configures SLDM in the basic interface of the plugin (Figure 2) by choosing
a class in the class hierarchy view and entering the address of the SPARQL
endpoint, and then starts SLDM with the Run button. SLDM constructs a set of URIs
belonging to the selected class, e.g. Astronaut, by posing a SPARQL query with
the triple pattern ?x a Astronaut, and then operates in two alternating phases:
querying the endpoint and mining the obtained triples. In the first phase,
the endpoint is queried for all the triples having a URI from the set in the
subject position, using the following WHERE clause: ?s ?p ?o . VALUES
?s {&lt;&lt;URIs&gt;&gt;}. The triples are then organized into a three-level index, with
predicates in the first level, objects in the second and subjects in the third. Such
an order enables efficient access, e.g. to all URIs occurring in triples with the predicate
rdf:type and the object Person. During the second phase, the index is scanned to
discover the axioms, e.g. if for the predicate rdf:type and the object Person there are
many subjects in the third level, the axiom Astronaut SubClassOf Person is
mined. If, for a given predicate, no pattern can be found, the corresponding
subjects are used as an input to the first phase of SLDM, to mine more complex
axioms. The mined axioms are displayed using a standard Protege interface. An
axiom is accompanied there by two buttons: one with the @ symbol, which displays
additional information about the axiom (e.g. the corresponding value of the measure
used by SLDM), and one with the ✓ symbol, which adds the axiom to the ontology.</p>
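      <p>The index and the scan over it can be sketched in a few lines of Python. This is a simplified illustration, not the SLDM implementation: it handles only rdf:type patterns, the names build_index and mine_superclasses are hypothetical, and support is assumed here to be the fraction of the considered instances that share a given type, which may differ from the exact measure used by SLDM.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_index(triples):
    """Group triples as predicate -> object -> set of subjects."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[p][o].add(s)
    return index


def mine_superclasses(index, n_instances, min_support):
    """Return sorted (class, support) pairs with support >= min_support.

    Support is ASSUMED to be the share of instances having the type.
    """
    found = []
    for cls, subjects in index.get(RDF_TYPE, {}).items():
        support = len(subjects) / n_instances
        if support >= min_support:
            found.append((cls, support))
    return sorted(found)


# Toy triples for three instances of the selected class ex:Astronaut.
triples = [
    ("ex:Armstrong", RDF_TYPE, "ex:Person"),
    ("ex:Ride", RDF_TYPE, "ex:Person"),
    ("ex:Gagarin", RDF_TYPE, "ex:Person"),
    ("ex:Armstrong", RDF_TYPE, "ex:Pilot"),
    ("ex:Armstrong", "ex:mission", "ex:Apollo_11"),
]
index = build_index(triples)
axioms = mine_superclasses(index, n_instances=3, min_support=0.8)
for cls, support in axioms:
    print(f"ex:Astronaut SubClassOf {cls} (support {support:.2f})")
```
]]></preformat>
      <p>With these toy triples, ex:Person is shared by all three instances and passes the threshold, while ex:Pilot occurs for only one of them and is discarded.</p>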
      <p>Figure 3 presents the Expert interface of the plugin, where the user can
fine-tune the parameters of SLDM. The field Minimal support sets the minimal value of
support (i.e. the measure used during the mining) that an axiom must achieve
in order to be presented to the user. The field Maximum level specifies the maximal
number of nested some expressions in a mined pattern. The field Ignored properties
enables the user to specify, with regular expressions, a set of predicates to ignore,
e.g. if a predicate conveys provenance information and is not directly relevant to
the semantics of the selected class. If the graph in the SPARQL endpoint is large,
it may be useful to use random sampling to decrease the load and speed up
the mining. The field Sample size allows the user to set how many different
URIs from a given class will be considered. To ensure repeatability of sampling,
the field Random seed enables the user to set a seed for the random number generator.
The field Max VALUES size limits the number of URIs specified in the VALUES clause
of a single query.</p>
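      <p>The following Python sketch shows one way the sampling, filtering and batching parameters could interact. All function names are hypothetical, and two details are assumptions rather than documented behavior of the plugin: the regular expressions are matched anywhere in the predicate URI, and the sample is drawn with Python's random.Random seeded by the given value.</p>
      <preformat><![CDATA[
```python
import random
import re


def sample_uris(uris, sample_size, seed):
    """Pick at most sample_size URIs, reproducibly for a given seed."""
    ordered = sorted(uris)  # fixed order, so the seed alone decides the sample
    if sample_size >= len(ordered):
        return ordered
    return random.Random(seed).sample(ordered, sample_size)


def is_ignored(predicate, patterns):
    """True if any pattern matches somewhere in the predicate (ASSUMED semantics)."""
    return any(re.search(pattern, predicate) for pattern in patterns)


def batches(uris, max_values_size):
    """Split URIs into chunks of at most max_values_size, one chunk per query."""
    if max_values_size < 1:
        raise ValueError("max_values_size must be positive")
    return [uris[i:i + max_values_size]
            for i in range(0, len(uris), max_values_size)]


uris = [f"http://example.org/a{i}" for i in range(10)]
first = sample_uris(uris, sample_size=4, seed=42)
second = sample_uris(uris, sample_size=4, seed=42)
chunks = batches(first, max_values_size=3)

ignored = [r"/ns/prov#"]
prov = "http://www.w3.org/ns/prov#wasDerivedFrom"
rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
print(first == second, [len(c) for c in chunks])
print(is_ignored(prov, ignored), is_ignored(rdf_type, ignored))
```
]]></preformat>
      <p>Running the sampling twice with the same seed yields the same four URIs, and a batch limit of three splits them into two queries.</p>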
    </sec>
    <sec id="sec-2">
      <title>Proposed Demo</title>
      <p>
        During the demo, we will show how to use the plugin with the DBpedia
ontology [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and a SPARQL endpoint with the DBpedia dataset loaded. To ensure
smooth operation, we will provide our own SPARQL endpoint. We will also
discuss how various parameter settings affect the obtained axioms.
Interested attendees will be able to use the plugin to mine an ontology of their
choice. A short video similar to what will be presented during the demo is
available at https://www.youtube.com/watch?v=ENdNQ8ESlEk.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>
        In this paper, we presented a plugin for Protege that enables the user to discover new
SubClassOf axioms directly from an on-line Linked Data dataset accessible via
a SPARQL endpoint, using Swift Linked Data Miner [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The basic configuration
of the plugin is very simple and does not require prior expertise in data mining
from the user. The plugin, along with its source code, is freely available at
https://bitbucket.org/jpotoniec/sldm.
      </p>
      <p>Acknowledgement. Jedrzej Potoniec acknowledges the support from the
Polish National Science Center (Grant No 2013/11/N/ST6/03065). This work was
partially supported by the PARENT-BRIDGE program of the Foundation for Polish
Science, co-financed from the European Union, Regional Development Fund (Grant
No POMOST/2013-7/8). Agnieszka Lawrynowicz acknowledges the support
from the Polish National Science Center (Grant No 2014/13/D/ST6/02076).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>DBpedia - A crystallization point for the Web of Data</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>7</volume>
          (
          <issue>3</issue>
          ),
          <fpage>154</fpage>
          –
          <lpage>165</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Buhmann, L.,
          <string-name>
            <surname>Lehmann</surname>
          </string-name>
          , J.:
          <article-title>Pattern based knowledge base enrichment</article-title>
          . In: Alani,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Kagal</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          , et al. (eds.)
          <source>The Semantic Web - ISWC 2013. Lecture Notes in Computer Science</source>
          , vol.
          <volume>8218</volume>
          , pp.
          <volume>33</volume>
          {
          <fpage>48</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krotzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>OWL: yet to arrive on the web of data?</article-title>
          In: LDOW.
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>937</volume>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The OWL API: A java API for OWL ontologies</article-title>
          .
          <source>Semantic Web</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>11</fpage>
          –
          <lpage>21</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>McBride</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Jena: A semantic web toolkit</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>6</volume>
          (
          <issue>6</issue>
          ),
          <fpage>55</fpage>
          –
          <lpage>59</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokoue</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>OWL 2 web ontology language profiles (second edition)</article-title>
          .
          <source>W3C Recommendation</source>
          , W3C (Dec
          <year>2012</year>
          ), http://www.w3.org/TR/2012/REC-owl2-profiles-20121211/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sintek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>Creating Semantic Web Contents with Protege-2000</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>16</volume>
          (
          <issue>2</issue>
          ),
          <fpage>60</fpage>
          –
          <lpage>71</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Potoniec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakubowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrynowicz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Swift Linked Data Miner: Anytime Algorithm for Mining OWL 2 EL Class Expressions Directly from On-Line Linked Data</article-title>
          , submitted to the J. of Web Semantics, available at https://goo.gl/HFghXp
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Volker, J.,
          <string-name>
            <surname>Niepert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Statistical schema induction</article-title>
          . In: Antoniou,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Grobelnik</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          , et al. (eds.)
          <source>The Semantic Web: Research and Applications. Lecture Notes in Computer Science</source>
          , vol.
          <volume>6643</volume>
          , pp.
          <volume>124</volume>
          {
          <fpage>138</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>