<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combining Sum-Product Network and Noisy-Or Model for Ontology Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weizhuo Li</string-name>
          <email>liweizhuo@amss.ac.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics,Academy of Mathematics and Systems Science, Chinese Academy of Sciences</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P. R. China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology matching is the key challenge to achieve semantic interoperability in building the Semantic Web. We present an alternative probabilistic scheme, called GMap, which combines the sum-product network and the noisy-or model. More precisely, we employ the sum-product network to encode the similarities based on individuals and disjointness axioms across ontologies and calculate the contributions by the maximum a posterior inference. The noisy-or model is used to encode the probabilistic matching rules, which are independent of each other as well as the value calculated by the sum-product network. Experiments show that GMap is competitive with many OAEI top-ranked systems. Futhermore, GMap, bene ted from these two graphical models, can keep inference tractable in the whole matching process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Ontology matching is the process of nding relationships or correspondences
between entities of di erent ontologies[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Many e orts have been conducted to
automate the discovery in this process, e.g., incorporating more elaborate
approaches including scaling strategies[
        <xref ref-type="bibr" rid="ref3 ref6">3, 6</xref>
        ], ontology repair techniques to ensure
the alignment coherence[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], employing machine learning techniques[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], using
external resources to increase the available knowledge for matching[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and utilizing
probabilistic graphical models to describe the related entities[
        <xref ref-type="bibr" rid="ref1 ref10 ref11">1, 10, 11</xref>
        ].
      </p>
      <p>
        In this paper, we propose an alternative probabilistic schema, called GMap,
based on two special graphical models|sum-product network(SPN) and
noisyor model. SPN is a directed acyclic graph with variables as leaves, sums and
products as internal nodes, and weighted edges[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. As it can keep inference
tractable and describe the context-speci c independence[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we employ it to
encode the similarities based on individuals and disjointness axioms and
calculate the contributions by the maximum a posterior inference. Noisy-or model
is a special kind of Bayesian Network[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. When the factors are independent of
each other, it is more suitable than other graphical models, specially in the
inference e ciency[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Hence, we utilize it to encode the probabilistic matching
rules. Thanks to the tractable inference of these special graphical models, GMap
can keep inference tractable in the whole matching process. To evaluate GMap,
we adopt the data sets from OAEI ontology matching campaign. Experimental
results indicate that GMap is competitive with many OAEI top-ranked systems.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>
        In this section, we brie y introduce our approach. Given two ontologies O1 and
O2, we calculate the lexical similarity based on edit-distance, external lexicons
and TFIDF[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Then, we employ SPN to encode the similarities based on
individuals and disjointness axioms and calculate the contributions. After that, we
utilize the noisy-or model to encode the probabilistic matching rules and the
value calculated by SPN. With one-to-one constraint and crisscross strategy in
the re ne module, GMap obtains initial matches. The whole matching procedure
is iterative. If it does not produce new matches, the matching is terminated.
2.1
      </p>
      <p>Using SPN to encode individuals and disjointness axioms
In open world assumption, individuals or disjointness axioms are missing at
times. Therefore, we de ne a special assignment|"U nknown" for the
similarities based on these individuals and disjointness axioms.</p>
      <p>For the similarity based on individuals, we employ the string equivalent to
judge the equality of them. When we calculate the similarity of concepts based
on individuals across ontologies, we regard individuals of each concept as a set
and use Ochiai coe cient1 to measure the value. We use a boundary t to divide
the value into three assignments(i.e., 1, 0 and U nknown). Assignment 1(or 0)
means that the pair matches(or mismatches). If the value ranges between 0 and
t or the individuals of one concept are missing, the assignment is U nknown.</p>
      <p>For the similarity based on disjointness axioms, we utilize these axioms and
subsumption relations within ontologies and de ne some rules to determine its
value. For example, x1, y1 and x2 are concepts that come from O1 and O2. If x1
matches x2 and x1 is disjoint with y1, then y1 is disjoint with x2. The similarity
also have three assignments. Assignment 1(or 0) means the pair mismatches(or
overlaps). Otherwise, the similarity based on disjointness axioms is U nknown.</p>
      <p>SPN |= (I ! M |D0)</p>
      <p>×
·
D0
×
+
+
×
+
×
+ +
· · ·
I1 I2 I3
·
M1</p>
      <p>SPN |= (I ? M |D1)
×</p>
      <p>+
+
·
M2
·
D1
·</p>
      <p>M3</p>
      <p>As shown in Figure 1, we designed a sum-product network S to encode
above similarities and calculate the contributions, where M represents the
contributions and leaves M1, M2, M3 are indicators that comprise the
assignments of M . All the indicators are binary-value. M1 = 1(or M2 = 1) means
that the contributions are positive(or negative). If M3 = 1, the contributions
1 https://en.wikipedia.org/wiki/Cosine similarity
are U nknown. Leaves I1, I2, I3, D0, D1 are also binary-value indicators that
correspond to the assignments of similarities based on individuals(I) and
disjointness axioms(D). The concrete assignment metrics are listed in Table 1{2.</p>
      <p>Probabilistic matching rules
two classes probably match if their fathers match
two classes probably match if their children match
two classes probably match if their siblings match
two classes about domain probably match if related
objectproperties match and range of these property match
two classes about range probably match if related
objectproperties match and domain of these properties match
two classes about domain probably match if related
dataproperties match and value of these properties match</p>
      <p>When we focus on calculating the matching probability of one pair, the
matching rules are independent of each other as well as the value calculated
by SPN. Therefore, we utilize the noisy-or model to encode them.</p>
      <p>S0</p>
      <p>R1
S1</p>
      <p>
        R2
S2
probability of one pair, P (S = 1jS0; R1; :::; R6), is calculated according to the
formulas in the lower-right corner. ci is the count of satis ed Ri and sigmoid
function f (ci) is used to limit the upper bound of contribution of Ri. As the
inference in the noisy-or model can be computed in time linear in size of nodes[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
GMap can keep inference tractable in the whole matching process.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation</title>
      <p>To evaluate our approach2, we adopt three tracks(i.e., Benchmark, Conference
and Anatomy) from OAEI ontology matching campaign in 20143.
3.1</p>
      <p>Comparing against the OAEI top-ranked systems
Table 4 shows a comparison of the matching quality of GMap and other OAEI
top-ranked systems, which indicates that GMap is competitive with these
promising existent systems. For Anatomy track, GMap does not concentrate on
language techniques and it emphasizes one-to-one constraint. Both of them may
cause a low alignment quality. In addition, all the top-ranked systems employ
alignment debugging techniques, which is helpful to improve the quality of
alignment. However, we do not employ these techniques in the current version.
We separate SPN and the noisy-or model from GMap and evaluate their
contributions respectively. As listed in Table 5, SPN is suitable to the matching task
that the linguistic levels across ontologies are di erent and both of ontologies use
same individuals to describe the concepts such as Biblio(201{210) in Benchmark
track. Thanks to the contributions of individuals and disjointness axioms, SPN
can improve the precision of GMap. When the structure information is very rich
across the ontologies, the noisy-or model is able to discover some hidden
matches with the existing matches and improve the recall such as in Anatomy track.
However, if the ontology does not contain above features such as in Conference
track, the improvement is not evident. Nevertheless, thanks to the
complementary of these two graphical models to some extent, combining the sum-product
network and the noisy-or model can improve the alignment quality as a whole.
2 The software and results are available at https://github.com/liweizhuo001/GMap.
3 http://oaei.ontologymatching.org/2014/
4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>We have presented GMap, which is suitable for the matching task that many
individuals and disjointness axioms are declared or the structure information
is very rich. However, it still has a lot of room for improvement. For example,
language techniques is essential to improve the quality of initial matches. In
addition, dealing with alignment incoherent is also one of our future works.
Acknowledgments. This work was supported by the Natural Science Foundation
of China (No. 61232015). Many thanks to Songmao Zhang, Qilin Sun and Yuanyuan
Wang for their helpful discussion on the design and implementation of the GMap.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Albagli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-</surname>
            Eliyahu-Zohary,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimony</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>Markov network based ontology matching</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>78</volume>
          (
          <issue>1</issue>
          ),
          <volume>105</volume>
          {
          <fpage>118</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Experience in aligning anatomical ontologies</article-title>
          .
          <source>International journal on Semantic Web and information systems 3(2)</source>
          ,
          <volume>1</volume>
          {
          <fpage>26</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Djeddi</surname>
          </string-name>
          , W.E, Khadir, M.T.:
          <article-title>XMAP: a novel structural approach for alignment of OWL-full ontologies</article-title>
          .
          <source>In: Proc. of Machine and Web Intelligence(ICMWI)</source>
          . pp.
          <volume>368</volume>
          {
          <issue>373</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhamankar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>Learning to match ontologies on the semantic web</article-title>
          .
          <source>The VLDB Journal</source>
          <volume>12</volume>
          (
          <issue>4</issue>
          ),
          <volume>303</volume>
          {
          <fpage>319</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Ontology Matching(2nd Edition)</source>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Faria</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pesquita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>The agreementmakerlight ontology matching system In: 2013 OTM Conferences</article-title>
          . pp.
          <volume>527</volume>
          {
          <issue>541</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Learning the structure of sum-product networks</article-title>
          .
          <source>In: Proc. of International Conference on Machine Learning(ICML)</source>
          . pp.
          <volume>873</volume>
          {
          <issue>880</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          .
          <source>In: Proc. of International Semantic Web Conference(ISWC)</source>
          . pp.
          <volume>273</volume>
          {
          <issue>288</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
          </string-name>
          , N.:
          <article-title>Probabilistic Graphical Models</article-title>
          . MIT press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          <string-name>
            <surname>OMEN</surname>
          </string-name>
          :
          <article-title>A probabilistic ontology mapping tool</article-title>
          .
          <source>In: Proc. of International Semantic Web Conference(ISWC)</source>
          . pp.
          <volume>537</volume>
          {
          <issue>547</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Niepert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noessner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
          </string-name>
          , H.:
          <article-title>Probabilistic-logical web data integration</article-title>
          .
          <source>Reasoning Web</source>
          . pp.
          <volume>504</volume>
          {
          <issue>533</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Sum-product networks: A new deep architecture</article-title>
          .
          <source>In: Proc of International Conference on Computer Vision Workshops(ICCV Workshops)</source>
          . pp.
          <volume>689</volume>
          {
          <issue>690</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>