<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GMap: Results for OAEI 2015</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weizhuo Li</string-name>
          <email>liweizhuo@amss.ac.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qilin Sun</string-name>
          <email>sunqilin@amss.ac.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P. R. China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>GMap is an alternative probabilistic scheme for ontology matching that combines the sum-product network and the noisy-or model. More precisely, we employ the sum-product network to encode the similarities based on individuals and disjointness axioms, and the noisy-or model to encode the probabilistic matching rules, which describe the influences among entity pairs across ontologies. In this paper, we briefly introduce GMap and its results in four tracks (i.e., Benchmark, Conference, Anatomy and Ontology Alignment for Query Answering) on OAEI 2015.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Presentation of the system</title>
      <sec id="sec-2-1">
        <title>State, purpose, general statement</title>
        <p>
          State-of-the-art approaches such as OMEN [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], iMatch [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and CODI [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] have utilized probabilistic graphical models [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for ontology matching. However, few of them keep inference tractable without losing inference accuracy. In this paper, we propose an alternative probabilistic scheme, called GMap, which combines the sum-product network (SPN) and the noisy-or model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Besides tractable inference, these two graphical models have some inherent advantages for ontology matching. For SPN, even if knowledge such as individuals or disjointness axioms is missing, SPN can still calculate their contributions by maximum a posteriori (MAP) inference. The noisy-or model is a reasonable approximation for incorporating probabilistic matching rules that describe the influences among entity pairs.
        </p>
        <p>
          Figure 1 shows a sketch of GMap. Given two ontologies O1 and O2, we calculate the lexical similarity based on edit distance, external lexicons and TFIDF [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], combined with the max strategy. Then, we employ SPN to encode the similarities based on individuals and disjointness axioms and calculate their contribution through MAP inference. After that, we use the noisy-or model to encode the probabilistic matching rules together with the value calculated by SPN. By applying a one-to-one constraint and a crisscross strategy in the refinement module, GMap obtains the initial matches. The whole matching procedure is iterative: when no additional matches are identified, the matching terminates.
        </p>
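        <p>The lexical step above can be illustrated with a minimal sketch. It assumes a normalized Levenshtein similarity, a smoothed TF-IDF cosine over label tokens, and a hypothetical synonym table standing in for the external lexicons; the max strategy then keeps the largest of the three scores. GMap's concrete measures, tokenization and lexicons are not specified in this section, so every function name below is illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import re
from collections import Counter
from typing import Dict, List, Set


def tokens(label: str) -> List[str]:
    """Split a label such as 'hasMember' or 'has_members' into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def edit_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / length of the longer label (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def tfidf_cosine(a: str, b: str, corpus: List[str]) -> float:
    """Cosine similarity of TF-IDF token vectors; IDF is smoothed over `corpus`."""
    n = len(corpus)
    df = Counter(t for label in corpus for t in set(tokens(label)))

    def vec(label: str) -> Dict[str, float]:
        tf = Counter(tokens(label))
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}

    va, vb = vec(a), vec(b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    norm = math.sqrt(sum(w * w for w in va.values())) * math.sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0


def lexicon_similarity(a: str, b: str, synonyms: Dict[str, Set[str]]) -> float:
    """1.0 if the (hypothetical) lexicon lists the two labels as synonyms, else 0.0."""
    ka, kb = a.lower(), b.lower()
    return 1.0 if kb in synonyms.get(ka, set()) or ka in synonyms.get(kb, set()) else 0.0


def lex_sim(a: str, b: str, corpus: List[str], synonyms: Dict[str, Set[str]]) -> float:
    """Max strategy: the lexical similarity is the largest of the three scores."""
    return max(edit_similarity(a, b),
               tfidf_cosine(a, b, corpus),
               lexicon_similarity(a, b, synonyms))
```
]]></preformat>
        <p>For instance, lex_sim("Article", "Paper", [], {"article": {"paper"}}) returns 1.0 because the lexicon score dominates under the max strategy, whereas for "kitten" and "sitting" only the edit-distance score (4/7) is non-zero.</p>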
      </sec>
      <sec id="sec-2-2">
        <title>Specific techniques used</title>
      </sec>
      <sec id="sec-2-3">
        <title>The similarities based on individuals and disjointness axioms</title>
        <p>Under the open world assumption, individuals or disjointness axioms are sometimes missing. Therefore, we define a special assignment, "Unknown", for the similarities based on these individuals and disjointness axioms.</p>
        <p>Figure 1. Sketch of GMap: given O1 and O2, computing lexical similarity; using SPN to encode individuals and disjointness axioms; using the noisy-or model to encode probabilistic matching rules; refining matches; a decision step repeats the procedure until no new matches are found.</p>
        <p>For individuals, we judge two individuals to be equal if their names are string-equivalent. To calculate the similarity of concepts across ontologies based on individuals, we regard the individuals of each concept as a set and measure the value with the Ochiai coefficient<sup>1</sup>. A boundary t divides the value into three assignments (i.e., 1, 0 and Unknown). Assignment 1 (or 0) means that the pair matches (or mismatches). If the value ranges between 0 and t, or if the individuals of one concept are missing, the assignment is Unknown.</p>
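        <p>The three-way assignment for individuals might be sketched as follows. The Ochiai coefficient |A ∩ B| / sqrt(|A| · |B|) is the cosine similarity of the two sets' indicator vectors, which is why the footnote points to cosine similarity. The exact boundary handling is an assumption of this sketch: a value of at least t is taken as 1, a value of exactly 0 as 0, and anything in between as Unknown; the threshold used in the examples is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Set, Union

UNKNOWN = "Unknown"


def ochiai(a: Set[str], b: Set[str]) -> float:
    """Ochiai coefficient |A & B| / sqrt(|A| * |B|) of two non-empty sets."""
    return len(a & b) / math.sqrt(len(a) * len(b))


def individual_assignment(inds1: Set[str], inds2: Set[str], t: float) -> Union[int, str]:
    """Assignment of the individual-based similarity I of a concept pair.

    Individuals are compared by exact string equality (set intersection).
    Boundary handling (>= t -> 1, == 0 -> 0) is an assumption of this sketch.
    """
    if not inds1 or not inds2:
        return UNKNOWN            # individuals of one concept are missing
    value = ochiai(inds1, inds2)
    if value >= t:
        return 1                  # the pair matches
    if value == 0.0:
        return 0                  # no shared individuals: the pair mismatches
    return UNKNOWN                # value lies between 0 and t
```
]]></preformat>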
        <p>For disjointness axioms, we use these axioms together with the subsumption relations within each ontology and define rules that determine the assignment of the similarity. For example, let x1 and y1 be concepts from O1 and x2 a concept from O2. If x1 matches x2 and x1 is disjoint with y1, then y1 is disjoint with x2, and so are their descendants. This similarity also has three assignments: assignment 1 (or 0) means that the pair mismatches (or overlaps). If none of the rules is satisfied, the assignment is Unknown.</p>
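        <p>The example rule above can be sketched as a small propagation over the subsumption hierarchies. Only this one rule is modelled; the rules that yield an overlap (assignment 0) are not spelled out in the text, so the sketch returns either 1 or Unknown. The input structures (a disjointness map for O1 listing both directions of each axiom, and direct-subclass maps for both ontologies) and all function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, Set, Tuple, Union

UNKNOWN = "Unknown"
Pair = Tuple[str, str]


def descendants(cls: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """All transitive subclasses of `cls`, given a map from a class to its direct subclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def disjointness_mismatches(matches: Iterable[Pair],
                            disjoint1: Dict[str, Set[str]],
                            children1: Dict[str, Iterable[str]],
                            children2: Dict[str, Iterable[str]]) -> Set[Pair]:
    """Pairs (y, z), y from O1 and z from O2, that the example rule marks as mismatching.

    Rule: if x1 matches x2 and x1 is disjoint with y1, then y1 is disjoint with x2,
    and every descendant of y1 is disjoint with every descendant of x2.
    """
    result: Set[Pair] = set()
    for x1, x2 in matches:
        for y1 in disjoint1.get(x1, ()):
            left = {y1} | descendants(y1, children1)
            right = {x2} | descendants(x2, children2)
            result.update((y, z) for y in left for z in right)
    return result


def disjointness_assignment(pair: Pair, mismatches: Set[Pair]) -> Union[int, str]:
    """D = 1 (mismatch) if the rule fires; otherwise Unknown (overlap rules not modelled)."""
    return 1 if pair in mismatches else UNKNOWN
```
]]></preformat>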
      </sec>
      <sec id="sec-2-4">
        <title>Using SPN to encode the similarities based on individuals and disjointness axioms</title>
        <p>
          A sum-product network is a directed acyclic graph with weighted edges, where variables are leaves and internal nodes are sums and products [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. As shown in Figure 2, we design a sum-product network S to encode the above similarities and calculate their contributions. All the leaves, called indicators, are binary-valued. M represents the contribution of individuals and disjointness axioms, and the indicators M1, M2, M3 encode its assignments: M1 = 1 (or M2 = 1) means that the contribution is positive (or negative), and M3 = 1 means that the contribution is Unknown. Similarly, the indicators D0, D1 and I1, I2, I3 correspond to the assignments of the similarities based on disjointness axioms and on individuals, respectively. The concrete assignment metrics are listed in Tables 1 and 2, and the assignment metric of M is similar to that of the similarity D.
        </p>
        <p>1 https://en.wikipedia.org/wiki/Cosine_similarity</p>
        <p>Figure 2. The sum-product network S, with indicator leaves I1, I2, I3, D0, D1, M1, M2, M3 and sum and product nodes (annotated S ⊨ (I ⊥ M | D1)).</p>
        <p>
          With MAP inference in SPN [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], we can obtain the values of the indicators of the contribution M. MAP inference has three steps. First, replace the sum nodes with max nodes. Second, in a bottom-up pass, each max node computes the maximum of its weighted child values. Finally, a downward pass starts from the root node, recursively selects the highest-valued child of each max node and follows all children of each product node, which yields the values of the indicators of M. Moreover, even if individuals or disjointness axioms are missing, we can still calculate the contribution M by MAP inference. Suppose that I = 1 and D = Unknown for a pair. With the defined similarities and the assignment metrics of the SPN, we obtain I1 = 1, I2 = 0, I3 = 0, D0 = 1, D1 = 1. Since the contribution M is not given, we set M1 = 1, M2 = 1, M3 = 1. After MAP inference, we observe M1 = 1, which means that the contribution is positive. MAP inference also infers D0 = 1, which means that the pair overlaps.
        </p>
        <p>
          Since the network S is complete and decomposable, inference in S can be computed in time linear in the number of edges [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]; hence MAP inference is tractable.
        </p>
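        <p>The three MAP steps can be sketched on a toy network. The structure, weights and state names below are hypothetical placeholders, since the actual network S and its parameters are only given in Figure 2 and Tables 1 and 2; the sketch only illustrates max-product MAP inference in a complete and decomposable SPN, where an unobserved variable has all of its indicators set to 1.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(eq=False)
class Indicator:
    """Leaf indicator: evaluates to 1 when `var` may take the value `state`."""
    var: str
    state: str


@dataclass(eq=False)
class SumNode:
    """Weighted sum node; MAP inference treats it as a max node."""
    children: List[Tuple[float, "Node"]]


@dataclass(eq=False)
class ProductNode:
    children: List["Node"]


Node = Union[Indicator, SumNode, ProductNode]


def _upward(node: Node, evidence: Dict[str, Set[str]], cache: Dict[int, float]) -> float:
    """Bottom-up pass: each max node returns its maximum weighted child value."""
    key = id(node)
    if key in cache:
        return cache[key]
    if isinstance(node, Indicator):
        allowed = evidence.get(node.var)          # unobserved: every indicator is 1
        value = 1.0 if allowed is None or node.state in allowed else 0.0
    elif isinstance(node, SumNode):
        value = max(w * _upward(c, evidence, cache) for w, c in node.children)
    else:
        value = 1.0
        for child in node.children:
            value *= _upward(child, evidence, cache)
    cache[key] = value
    return value


def map_assignment(root: Node, evidence: Dict[str, Set[str]]) -> Dict[str, str]:
    """Downward pass: best child of each max node, all children of each product node."""
    cache: Dict[int, float] = {}
    _upward(root, evidence, cache)
    assignment: Dict[str, str] = {}
    stack: List[Node] = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, Indicator):
            assignment[node.var] = node.state
        elif isinstance(node, SumNode):
            _, best = max(node.children, key=lambda wc: wc[0] * cache[id(wc[1])])
            stack.append(best)
        else:
            stack.extend(node.children)
    return assignment


def sum_over(var: str, weights: Dict[str, float]) -> SumNode:
    return SumNode([(w, Indicator(var, s)) for s, w in weights.items()])


# Hypothetical toy network (NOT GMap's S): each product covers I, M and D exactly once.
positive = ProductNode([
    sum_over("I", {"1": 0.8, "0": 0.1, "Unknown": 0.1}),
    sum_over("M", {"positive": 0.9, "negative": 0.05, "Unknown": 0.05}),
    sum_over("D", {"overlap": 0.9, "mismatch": 0.1}),
])
negative = ProductNode([
    sum_over("I", {"1": 0.1, "0": 0.6, "Unknown": 0.3}),
    sum_over("M", {"positive": 0.05, "negative": 0.9, "Unknown": 0.05}),
    sum_over("D", {"overlap": 0.3, "mismatch": 0.7}),
])
root = SumNode([(0.6, positive), (0.4, negative)])
```
]]></preformat>
        <p>With evidence {"I": {"1"}} (I = 1, with D and M unobserved), map_assignment(root, ...) returns M = "positive" and D = "overlap", mirroring the worked example in the text, but only because the toy weights were chosen that way.</p>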
      </sec>
      <sec id="sec-2-5">
        <title>Combining the lexical similarity and the contribution calculated by SPN</title>
        <p>Considering the range of the lexical similarity, we define a scaling factor to limit the contribution of the lexical similarity, which helps us analyze the sources of the different contributions. The SPN-based similarity S0 is defined in Eq. (1); it is calculated according to the values of the indicators of M and D.</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math><![CDATA[
S_0(x_1, x_2) =
\begin{cases}
0, & M_2 = 1,\ D_1 = 1\\
\alpha \cdot \mathrm{lexSim}(x_1, x_2) + \beta, & M_1 = 1,\ D_0 = 1\\
\alpha \cdot \mathrm{lexSim}(x_1, x_2) - \beta, & M_2 = 1,\ D_0 = 1\\
\alpha \cdot \mathrm{lexSim}(x_1, x_2), & M_3 = 1,\ D_0 = 1
\end{cases}
]]></tex-math>
          </disp-formula>
        </p>
        <p>where α is the scaling factor and β is a contribution factor that represents the contribution based on disjointness axioms and individuals. If the contribution is positive (negative) and the pair overlaps, the SPN-based similarity equals the scaled lexical similarity plus (minus) β. If the contribution is Unknown and the pair overlaps, the SPN-based similarity equals the scaled lexical similarity. If the pair mismatches, the inferred contribution is negative and the SPN-based similarity is 0.</p>
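        <p>Eq. (1) translates directly into a case analysis on the MAP values of M and D. In this sketch, alpha stands for the scaling factor and beta for the contribution factor; their tuned values are not given in the text, so the numbers in the tests are placeholders, and the state names follow the hypothetical naming used for the SPN indicators (M1: "positive", M2: "negative", M3: "Unknown"; D0: "overlap", D1: "mismatch"). The equation does not say whether S0 is clipped to [0, 1], so the sketch does not clip.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def spn_similarity(lex_sim: float, m: str, d: str, alpha: float, beta: float) -> float:
    """SPN-based similarity S0 of Eq. (1).

    m -- MAP value of M: "positive" (M1), "negative" (M2) or "Unknown" (M3)
    d -- MAP value of D: "overlap" (D0) or "mismatch" (D1)
    alpha, beta -- scaling and contribution factors (placeholder values)
    """
    if d == "mismatch":        # M2 = 1, D1 = 1: the pair mismatches
        return 0.0
    if d != "overlap":
        raise ValueError(f"unexpected value for D: {d!r}")
    scaled = alpha * lex_sim   # scaled lexical similarity
    if m == "positive":        # M1 = 1, D0 = 1
        return scaled + beta
    if m == "negative":        # M2 = 1, D0 = 1
        return scaled - beta
    if m == "Unknown":         # M3 = 1, D0 = 1
        return scaled
    raise ValueError(f"unexpected value for M: {m!r}")
```
]]></preformat>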
      </sec>
      <sec id="sec-2-6">
        <title>Using Noisy-Or model to encode probabilistic matching rules</title>
        <p>As listed in Table 3, we use probabilistic matching rules to describe the influences among related pairs across ontologies.</p>
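        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>ID</th><th>Probabilistic matching rules</th></tr>
            </thead>
            <tbody>
              <tr><td>R1</td><td>two classes probably match if their fathers match</td></tr>
              <tr><td>R2</td><td>two classes probably match if their children match</td></tr>
              <tr><td>R3</td><td>two classes probably match if their siblings match</td></tr>
              <tr><td>R4</td><td>two classes about domain probably match if the related object properties match and the ranges of these properties match</td></tr>
              <tr><td>R5</td><td>two classes about range probably match if the related object properties match and the domains of these properties match</td></tr>
              <tr><td>R6</td><td>two classes about domain probably match if the related data properties match and the values of these properties match</td></tr>
            </tbody>
          </table>
        </table-wrap>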
        <p>
          Considering the matching probability of one pair, we observe that the condition of each rule takes one of two values (i.e., T or F), that the matching rules are approximately independent of each other, and that each of them helps increase the matching probability of this pair. Therefore, we use the noisy-or model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] to encode them.
        </p>
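        <p>In the standard noisy-or model [5], each present cause independently fails to produce the effect, so P(match) = 1 − (1 − λ0) ∏ (1 − λi), where the product runs over the rules Ri whose condition is T. The sketch below assumes that the SPN-based similarity S0 plays the role of the leak term λ0 and that each rule Ri has a hypothetical strength λi; how GMap actually parameterizes its network is not fully recoverable from this section.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def noisy_or(s0: float, strengths: Sequence[float], fired: Sequence[bool]) -> float:
    """Matching probability of one pair under a noisy-or model with leak s0.

    s0        -- assumed leak term, here the SPN-based similarity S0 of the pair
    strengths -- hypothetical strength lambda_i of each matching rule R_i
    fired     -- whether the condition of R_i is T for this pair
    """
    if len(strengths) != len(fired):
        raise ValueError("expected one strength per rule")
    p_no_match = 1.0 - s0                # the leak alone fails to cause a match
    for lam, is_true in zip(strengths, fired):
        if is_true:                      # only rules whose condition is T contribute
            p_no_match *= 1.0 - lam      # each active cause fails independently
    return 1.0 - p_no_match
```
]]></preformat>
        <p>Because every factor (1 − λi) is at most 1, firing an additional rule can only raise the matching probability, which reflects the observation that each rule helps increase it; conversely, the model has no way to lower the probability, which is the limitation discussed for the Conference track.</p>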
        <p>Figure: the noisy-or model, in which an OR node combines the inputs S0, R1/S1 and R2/S2.</p>
        <p>There are two kinds of parameters that need to be set. One kind mainly comes from the networks and is set manually based on considerations drawn from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The others, such as the scaling factor and the contribution factor in Eq. (1) and the threshold, are tuned on the I3CON data set<sup>2</sup>. Nevertheless, we do not make any specific adaptation for the OAEI 2015 evaluation campaign, and all parameters are the same for the different tracks.</p>
        <p>In this section, we present the results achieved by GMap on OAEI 2015. Our system mainly focuses on the Benchmark, Anatomy and Conference tracks. In addition, we present the results of the Ontology Alignment for Query Answering (OA4QA) test, which does not follow the classical ontology alignment evaluation on the SEALS platform.</p>
        <p>The goal of the Benchmark track is to evaluate matching systems in scenarios where the input ontologies lack important information. Table 4 summarizes its average results<sup>3</sup>.</p>
        <p>GMap performed well on biblio, ranking third in F-measure, because it makes use of string resources such as identifiers, labels and comments. In particular, in ontologies 201–210 of biblio, the corresponding concepts have the same individuals but different names, so SPN helps improve the alignment quality of GMap.</p>
        <p>2 http://www.atl.external.lmco.com/projects/ontology/i3con.html</p>
        <p>3 The new test set about energy has some problems.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Anatomy</title>
        <p>The Anatomy track consists of finding an alignment between the Adult Mouse Anatomy
(2744 classes) and a part of the NCI Thesaurus (3304 classes) describing the human
anatomy. The results are shown in Table 5.</p>
        <p>GMap ranked fifth in the Anatomy track. Our analysis suggests two causes of its low recall: GMap does not concentrate on language techniques such as handling abbreviations, and it enforces a one-to-one constraint. In addition, the top-ranked systems employ alignment debugging techniques, which help improve alignment quality; the current version of GMap does not employ them.</p>
      </sec>
      <sec id="sec-2-8">
        <title>Conference</title>
        <p>The Conference track contains sixteen ontologies from the conference organization domain.
There are two versions of reference alignment. The original reference alignment is
labeled as RA1, and the new reference alignment, generated as a transitive closure
computed on the original reference alignment, is labeled as RA2. Table 6 shows the results
of our system in this track.</p>
        <p>
          In the Conference track, GMap ranked sixth of the 14 participants. Its recall is higher than that of all other systems except AML, but its precision is lower than theirs. There are two main reasons. One is the lexical similarity, which combines the similarities based on edit distance, external lexicons and TFIDF with the max strategy. The other is that the noisy-or model can hardly describe negative effects on pair matching [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Both of them leave some false positive matches after matching has finished. In particular, for property pairs, GMap cannot capture the negative impact of mismatched domains and ranges. Employing alignment debugging techniques is therefore a comparatively suitable way to deal with this problem.
        </p>
        <p>
          OA4QA aims to investigate the effects of logical violations affecting the computed alignments and to evaluate the effectiveness of the repair strategies employed by the matchers. In OAEI 2015, the ontologies and the reference alignment (RA1) are based on the Conference track. RAR1 is a repaired version of RA1, different from RA2 in the Conference track. Table 7 presents the results for the whole set of queries.
        </p>
        <p>Since GMap does not consider mapping repair techniques, it was able to answer only half of the queries, which affected the precision and recall it obtained.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>General comments</title>
      <sec id="sec-3-1">
        <title>Comments on the results</title>
        <p>
          In its first participation in OAEI, GMap achieved reasonable results and was competitive with other systems in tracks such as Benchmark, Conference and Anatomy. Both of the employed graphical models improve the alignment quality obtained with the defined lexical similarity [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Most improvements are attributed to the noisy-or model, because it exploits the rich relations specified in ontologies, as in the Anatomy track. If the ontologies declare individuals and disjointness axioms, SPN also contributes, as for biblio (201–210) in the Benchmark track. More importantly, combining SPN and the noisy-or model increases precision and recall further.
        </p>
        <p>However, some weaknesses remain. For example, GMap does not yet resolve alignment incoherence, which affects its performance. In addition, we need to consider the efficiency of GMap, such as running time and memory usage, for large-scale matching problems.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Discussions on the way to improve the proposed system</title>
        <p>GMap still has a lot of room for improvement. Employing alignment debugging techniques could resolve alignment incoherence and remove some false positive matches, such as the pair {Conference:has_members, edas:hasMember} in the Conference track. In addition, seeking available data sets to learn the parameters of the sum-product network and the noisy-or model is another direction of our future work.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we have presented GMap and its results in four tracks (i.e., Benchmark, Conference, Anatomy and Ontology Alignment for Query Answering) on OAEI 2015. The results show that, by combining two graphical models (i.e., SPN and the noisy-or model), GMap is competitive with the top-ranked systems in some tracks. We have also discussed possible solutions to the weaknesses it exposed. In the future, we would like to participate in more tracks and hope to efficiently solve the instance matching and large biomedical ontology matching challenges.</p>
      <p>
Acknowledgments. This research was partly supported by the Natural Science
Foundation of China (No. 61232015), the National Key Research and Development Program
of China (Grant No. 2002CB312004), the Knowledge Innovation Program of the
Chinese Academy of Sciences, Key Lab of Management, Decision and Information
Systems of CAS, Institute of Computing Technology of CAS, and the Key Laboratory of
Multimedia and Intelligent Software at Beijing University of Technology.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Albagli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben-Eliyahu-Zohary</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimony</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          :
          <article-title>Markov network based ontology matching</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>78</volume>
          (
          <issue>1</issue>
          ) (
          <year>2012</year>
          )
          <fpage>105</fpage>
          -
          <lpage>118</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Characterizing the semantic web on the web</article-title>
          .
          <source>In: The Semantic Web-ISWC 2006</source>
          . Springer (
          <year>2006</year>
          )
          <fpage>242</fpage>
          -
          <lpage>257</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Ontology Matching</source>
          . Springer Science &amp; Business Media
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning the structure of sum-product networks</article-title>
          .
          <source>In: Proceedings of The 30th International Conference on Machine Learning</source>
          . (
          <year>2013</year>
          )
          <fpage>873</fpage>
          -
          <lpage>880</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Probabilistic graphical models: principles and techniques</article-title>
          . MIT press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Combining sum-product network and noisy-or model for ontology matching</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaiswal</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>Omen: A probabilistic ontology mapping tool</article-title>
          .
          <source>In: The Semantic Web-ISWC 2005</source>
          . Springer (
          <year>2005</year>
          )
          <fpage>537</fpage>
          -
          <lpage>547</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Niepert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>A probabilistic-logical framework for ontology matching</article-title>
          . In: AAAI. Citeseer
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domingos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Sum-product networks: A new deep architecture</article-title>
          .
          <source>In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops)</source>
          . IEEE
          (
          <year>2011</year>
          )
          <fpage>689</fpage>
          -
          <lpage>690</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>