<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Embedding-based Approach to Constructing OWL ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lijing Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaowang Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leyuan Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiachen Tian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shizhan Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hong Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kewen Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiyong Feng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science and Technology, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Software, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tianjin Key Laboratory of Cognitive Computing and Application</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents OWLearner, a novel system for automatically extracting axioms for OWL ontologies from RDF data using embedding models. In this system, ontology construction is transformed into a classification problem in machine learning, so that off-the-shelf tools can be employed to learn OWL axioms. The system has three main modules, namely embedding, sampling, and training &amp; learning. The large ontologies DBpedia and YAGO are used to validate the proposed approach. The experimental results show that OWLearner is able to learn high-quality, expressive OWL axioms automatically and efficiently.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>An ontology is a formal representation of objects and their relationships in a domain
of interest. OWL, with its latest version OWL 2, is the W3C standard for ontology
languages. Formally, an ontology is expressed as a pair consisting of an RDF dataset and a TBox.</p>
      <p>
        Automatic construction of ontologies is an important but challenging task in
ontology engineering. Specifically, given an RDF dataset, the construction task we are
interested in is to extract DL axioms. DL-Learner [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is a leading system for
enriching DL ontologies, based on techniques from inductive logic programming (ILP) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], such as refinement operators. However, ILP-based systems are usually unable
to handle very large ontologies.
      </p>
      <p>
        We tackle this challenge by providing a scalable method for learning DL axioms
with machine learning techniques. Using the embedding methods of representation
learning, an RDF dataset is embedded into a continuous vector space in a way that
preserves the inherent structure of the original data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Ontology construction can then be
accomplished in the vector space via supervised machine learning, which in essence
learns a function for each axiom pattern that predicts the correctness of input axioms.
To this end, labeled samples are obtained through SPARQL queries and are
transformed into the vector space using embedding and feature-engineering techniques.
      </p>
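      <p>As an illustration of this formulation (a hedged sketch, not the OWLearner implementation: the class names, the random embeddings, and the choice of logistic regression are all invented for the example), a candidate axiom such as SubClassOf(A, B) can be represented by concatenating the embeddings of A and B, and a binary classifier for that axiom pattern is then trained on labeled samples:</p>
      <preformat>
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch only: in practice the embeddings come from a model such as TransE;
# here they are random stand-ins so the example is self-contained.
rng = np.random.default_rng(0)
dim = 50
embedding = {c: rng.normal(size=dim) for c in ["Person", "Agent", "City", "Place"]}

def axiom_features(sub, sup):
    """Feature vector for a candidate SubClassOf(sub, sup) axiom."""
    return np.concatenate([embedding[sub], embedding[sup]])

# Labeled samples: positive axioms obtained from the data, negatives sampled.
X = np.stack([axiom_features("Person", "Agent"),   # positive
              axiom_features("City", "Place"),     # positive
              axiom_features("Person", "City"),    # negative
              axiom_features("City", "Agent")])    # negative
y = np.array([1, 1, 0, 0])

# One classifier per axiom pattern; it predicts whether a candidate holds.
clf = LogisticRegression().fit(X, y)
print(clf.predict([axiom_features("Person", "Agent")]))
```
      </preformat>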
      <p>In this paper, we have implemented a system prototype, OWLearner, and
compared it with the state-of-the-art system DL-Learner on major benchmarks, such as
DBpedia and YAGO, for the DL axiom construction task. Our experimental results show
that OWLearner outperforms DL-Learner in both time efficiency and the quality of the
learned axioms.</p>
    </sec>
    <sec id="sec-2">
      <title>An Overview of OWLearner</title>
      <p>The framework and workflow of OWLearner are shown in the left and right
subfigures of Figure 1, respectively. OWLearner contains the following five components:
Data preprocessing This component specifies how to retrieve data and how to convert
RDF data in various formats into a unified format (e.g., N-Triples), so that most RDF
serialization formats are supported conveniently.</p>
      <p>
        Embedding This component embeds entities and relations into a continuous vector
space as their features, employing effective embedding models such as TransE [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The
embedding model is selected according to its scores on benchmark datasets.
Embeddings of axioms are then constructed from these original embeddings by
applying feature-engineering methods.
      </p>
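      <p>The intuition behind TransE can be sketched in a few lines (the vectors below are random stand-ins; in practice they are learned from the RDF data): a triple (h, r, t) is considered plausible when the translated head h + r lies close to the tail t.</p>
      <preformat>
```python
import numpy as np

# TransE scoring sketch (Bordes et al., 2013): lower scores mean more
# plausible triples. The embeddings here are random placeholders.
rng = np.random.default_rng(42)
dim = 50
h, r, t = (rng.normal(size=dim) for _ in range(3))

def transe_score(h, r, t):
    """L2 distance between the translated head h + r and the tail t."""
    return float(np.linalg.norm(h + r - t))

print(transe_score(h, r, t))      # random vectors: large distance, implausible
print(transe_score(t - r, r, t))  # exact translation: distance near zero
```
      </preformat>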
      <p>Sampling This component generates labeled samples for training, which fall into
three categories: positive samples, negative samples, and unknown samples to be judged
later. Sample generation consists of three steps: SPARQL querying, statistical analysis,
and OWL reasoning (rule-based and ontology reasoning).
Learning This component trains supervised learning models on the positive/negative
samples and then applies the trained models to predict the axioms in the unknown
samples. OWLearner supports most supervised learning models and provides principal
metrics, such as accuracy (ACC) and area under the ROC curve (AUC), to evaluate
them. Moreover, we define several metrics to evaluate the learned axioms: Standard
Confidence (SC), Head Coverage (HC), and Partial Completeness Assumption
(PCA)-based Confidence (PCA-conf).</p>
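      <p>Under the usual AMIE-style definitions of these metrics (an assumption on our part; the exact formulas are not spelled out here), SC, HC, and PCA-conf can be computed for an axiom of the form p(x, y) implies q(x, y) as follows, over a toy set of invented triples:</p>
      <preformat>
```python
# Toy triples, invented purely to illustrate the three axiom-quality metrics.
triples = {
    ("alice", "worksAt", "tju"),
    ("alice", "affiliatedWith", "tju"),
    ("bob", "worksAt", "acme"),
    ("carol", "affiliatedWith", "tju"),
}

def axiom_metrics(p, q, triples):
    """Metrics for the axiom p(x, y) implies q(x, y), e.g. SubPropertyOf(p, q)."""
    body = {(s, o) for s, rel, o in triples if rel == p}
    head = {(s, o) for s, rel, o in triples if rel == q}
    support = len(body.intersection(head))
    sc = support / len(body)   # Standard Confidence: support over body matches
    hc = support / len(head)   # Head Coverage: support over head matches
    # PCA denominator: body pairs whose subject has at least one q-fact.
    subjects_with_head = {s for s, o in head}
    pca_body = {(s, o) for s, o in body if s in subjects_with_head}
    pca = support / len(pca_body)  # PCA-based Confidence
    return sc, hc, pca

print(axiom_metrics("worksAt", "affiliatedWith", triples))  # (0.5, 0.5, 1.0)
```
      </preformat>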
      <p>Building This component assembles all axioms generated/predicted by our model
into an OWL ontology. OWLearner provides a plugin API that supports most
off-the-shelf ontology editors.</p>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Evaluation</title>
      <p>In this section, we evaluate OWLearner on three datasets, namely DBpedia, YAGO1k
(a fragment of YAGO containing all classes with over 1000 entities), and the Chinese
Symptom Database (SIC), shown in Table 1. We conducted four sets of experiments,
explained in detail as follows.</p>
      <sec id="sec-3-1">
        <title>Set 1. Suitability of Classifiers Based on ACC and AUC</title>
        <p>In this set of experiments, for each of the 12 axiom patterns, we tested which
machine learning model is most effective according to the two metrics. The results,
shown in Table 3, indicate that no single learning model is most effective for all axiom
patterns. This experiment provides a guideline for selecting a suitable learning model
for a given axiom pattern.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Set 2. Accuracy of Learned Axioms</title>
        <p>We used three metrics, namely Standard Confidence (SC), Head Coverage (HC),
and PCA-based Confidence (PCA-conf), to test the quality (accuracy) of the axioms
learned by OWLearner. The results, shown in Table 4, indicate that OWLearner can
learn axioms of high quality.</p>
        <p>Set 3. Precision of Learned Axioms. We used DBpedia as the benchmark to evaluate
the precision of OWLearner. As only axioms of the patterns P1, ..., P6 are allowed in
DBpedia, precisions are obtained for axioms of these patterns. The precision value for
each axiom pattern represents the proportion of the learned axioms that can be matched.
The results, shown in Table 2, indicate that relatively high precisions are obtained for
the axiom patterns P1, P2, P5, and P6, whereas the precisions of P3 and P4 are low. A
major reason is that little data is available for these two axiom patterns.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Set 4. Comparison of OWLearner with DL-Learner</title>
        <p>In this set of experiments, we compared the performance of OWLearner with that
of DL-Learner on four metrics: runtime, Standard Confidence (SC), Head Coverage
(HC), and PCA-based Confidence (PCA-conf). The results, shown in Table 5, indicate
that the quality of the axioms learned by OWLearner is comparable to that of
DL-Learner, and OWLearner is superior in terms of HC. The major advantage of
OWLearner is time efficiency. Moreover, OWLearner does not need a class name to be
specified, whereas DL-Learner requires a target class before axioms can be learned.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We have proposed a novel method of learning axioms for OWL/DL ontologies. The
method is based on embedding techniques from representation learning. Based on the
proposed method, we have implemented a system, OWLearner, for automatic axiom
extraction in OWL ontologies. Our experiments show that OWLearner is much more
efficient than DL-Learner, the state-of-the-art system for ontology axiom learning, while
the quality of the axioms learned by the two methods is comparable.
</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (61502336),
the National Key R&amp;D Program of China (2016YFB1000603, 2017YFC0908401), and
the Seed Foundation of Tianjin University (2018XZC-0016).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. L. Bühmann, J. Lehmann, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Westphal</surname>
          </string-name>
          .
          (
          <year>2016</year>
          ).
          <article-title>DL-Learner: A framework for inductive learning on the Semantic Web</article-title>
          . J. Web Sem.,
          <volume>39</volume>
          :
          <fpage>15</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Muggleton</surname>
          </string-name>
          , L. De Raedt,
          <string-name>
            <given-names>D.</given-names>
            <surname>Poole</surname>
          </string-name>
          , I. Bratko, P. Flach. (
          <year>2012</year>
          )
          <article-title>ILP turns 20</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>86</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Nickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tresp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          .
          (
          <year>2016</year>
          ).
          <article-title>A review of relational machine learning for knowledge graphs</article-title>
          .
          <source>Proceedings of the IEEE</source>
          ,
          <volume>104</volume>
          (
          <issue>1</issue>
          ):
          <fpage>11</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. L. Bühmann,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          (
          <year>2013</year>
          )
          <article-title>Pattern based knowledge base enrichment</article-title>
          .
          <source>Proc. ISWC'13</source>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          .
          (
          <year>2013</year>
          ).
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          .
          <source>Proc. of NIPS'13</source>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>