<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IAMA Results for OAEI 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuanzhe Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuepeng Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shizhu He</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kang Liu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Zhao</string-name>
          <email>jzhao@nlpr.ia.ac.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xueqiang Lv</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing Key Laboratory of Internet Culture and Digital Dissemination Research</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Automation, Chinese Academy of Sciences</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the results of IAMA in OAEI 2013. IAMA (Institute of Automation's Matcher) is an ontology matching system capable of dealing with large-scale ontologies. IAMA is designed to find the correspondences between two ontologies by using multiple similarity measures. A candidate filtering technique is adopted when processing large-scale ontologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Presentation of the system</title>
      <sec id="sec-2-1">
        <title>State, purpose, general statement</title>
        <p>
          A large number of ontologies have been published since the semantic web emerged.
However, managing the heterogeneity among various ontologies is still a problem [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. For
example, many ontologies describe the same entity (i.e., class or property) using
different terminologies, while entities that have the same name but belong to different
ontologies may refer to disparate objects. Finding the matching between different
ontologies is still challenging. Ontology matching, as a solution to the aforementioned
problem, has received great interest in recent years.
        </p>
        <p>The principal goal of IAMA is to discover equivalent entities between
different ontologies rapidly. We use efficient terminology matching techniques and do not
rely on any external resource at this stage. IAMA is able to match the classes and properties
of two input ontologies. The system achieves acceptable results even though it neglects
structural information. The matching process takes little time on small
ontologies. When processing large-scale ontologies, IAMA can still, with the help of
candidate filtering, produce the alignment in reasonable time. We aim to build a universal
and extensible system, so that more matching methods can be conveniently incorporated
in the future.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Specific techniques used</title>
        <p>IAMA employs various similarity measures to take advantage of the available
information in the ontologies. The entities in the two ontologies are compared pairwise, and lexical
similarities and structural similarities are calculated for each pair. In the current version,
only 1:1 alignments are considered.</p>
        <p>Let O1 and O2 denote the two input ontologies, and let e1 be an entity in O1. Each
entity e2 in O2 has a similarity with e1, denoted sim(e1, e2). Let ê be the entity in O2 with the
maximum similarity, sim(e1, ê). If sim(e1, ê) is greater than a predetermined threshold
t1, the entity pair (e1, ê) is added to the alignment. The following paragraphs
present the similarity measures used in our system.</p>
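        <p>The selection rule above can be illustrated with a minimal Python sketch. It is not IAMA's actual code: the function names best_match and align are hypothetical, the similarity function sim is supplied by the caller, and the sketch does not resolve the case where two entities of O1 select the same entity of O2, since the text does not describe how such 1:1 conflicts are handled.</p>
        <preformat>
```python
def best_match(e1, candidates, sim, t1=0.9):
    """Return (e_hat, score) for the most similar candidate, or None.

    A pair is kept only if its similarity is strictly greater than t1,
    mirroring the rule stated in the text. All names here are illustrative.
    """
    scored = [(sim(e1, e2), e2) for e2 in candidates]
    if not scored:
        return None
    # max() keeps the first candidate in case of ties.
    score, e_hat = max(scored, key=lambda pair: pair[0])
    return (e_hat, score) if score > t1 else None


def align(o1_entities, o2_entities, sim, t1=0.9):
    """Map each entity of O1 to its best match in O2, if any passes t1."""
    alignment = {}
    for e1 in o1_entities:
        match = best_match(e1, o2_entities, sim, t1)
        if match is not None:
            alignment[e1] = match
    return alignment
```
        </preformat>
        <p>Any function returning a value in [0, 1] can be passed as sim, for instance a combination of the lexical and individual similarities described in the following sections.</p>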
      </sec>
      <sec id="sec-2-3">
        <title>Lexical Similarity</title>
        <p>The system extracts local names, labels, and comments of the entities in the two
input ontologies as lexical features. In most situations, the lexical information is
effective.</p>
        <p>Local Name Similarity measures the similarity between the names of two entities.
We remove spaces and punctuation because an entity name sometimes consists
of multiple words or contains hyphens. All letters are also converted to lower case.
Label Similarity measures the similarity between the labels. Not all
entities have labels, and many entities have a label that is exactly the same as their local name.
Comment Similarity measures the similarity between the comments. A comment of an
entity is usually a brief descriptive sentence, which is helpful when the two ontologies
name their entities in quite different styles. Labels and comments are processed
in the same way as local names, so each is treated as a single word.</p>
        <p>
          IAMA uses the Levenshtein distance [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which was shown to be effective in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], to calculate
lexical similarities. The three lexical similarities mentioned above are not treated
equally: each similarity is assigned a weight chosen intuitively. Local name similarity has
a greater weight than label similarity, while comment similarity has the lowest weight.
        </p>
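        <p>A minimal Python sketch of the lexical similarity is given below. Several details are assumptions rather than facts from the paper: the edit distance is normalized by the length of the longer string, the weight values in WEIGHTS are hypothetical (the paper only gives their order), and the weighted scores are combined by taking the maximum, following the statement that IAMA adopts the maximum of all similarities. The normalization keeps only ASCII letters and digits for simplicity.</p>
        <preformat>
```python
import re


def normalize(text):
    """Lowercase and drop spaces, hyphens and other punctuation."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def string_similarity(a, b):
    """Similarity in [0, 1]; the normalization is an assumption."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical weights: only name > label > comment is stated in the paper.
WEIGHTS = {"name": 1.0, "label": 0.9, "comment": 0.8}


def lexical_similarity(features1, features2, weights=WEIGHTS):
    """Maximum of the weighted similarities over the features both entities have."""
    scores = []
    for key, weight in weights.items():
        v1, v2 = features1.get(key), features2.get(key)
        if v1 and v2:
            scores.append(weight * string_similarity(v1, v2))
    return max(scores, default=0.0)
```
        </preformat>
        <p>With this normalization, the names "Has-Author" and "hasAuthor" both reduce to "hasauthor" and therefore receive a similarity of 1.</p>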
      </sec>
      <sec id="sec-2-4">
        <title>Individual Similarity</title>
        <p>Between classes that have individuals, Individual Similarity is additionally
calculated. The names of the individuals that belong to a class are extracted into a set of strings.
Let S1 and S2 be two such sets; the similarity between them is computed as
follows:
          <disp-formula>
            <label>(1)</label>
            <tex-math>\mathrm{sim}(S_1, S_2) = \frac{2 \cdot \#(S_1 \cap S_2)}{\#S_1 + \#S_2}</tex-math>
          </disp-formula>
For example, let c1 be a class in ontology O1 and c2 a class in ontology O2. The names
of the individuals belonging to c1 form the set of strings i1 = {s1, s2, s3}, and similarly we
get i2 = {s2, s3, s4, s5}. The individual similarity simi(c1, c2) is:
          <disp-formula>
            <tex-math>\mathrm{sim}_i(c_1, c_2) = \frac{2 \cdot \#(i_1 \cap i_2)}{\#i_1 + \#i_2} = \frac{2 \cdot 2}{3 + 4} \approx 0.57</tex-math>
          </disp-formula>
        </p>
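        <p>Equation (1) can be sketched in a few lines of Python. The function name individual_similarity is hypothetical, and returning 0 when both sets are empty is an assumption made here to avoid a division by zero; the paper only computes this similarity for classes that have individuals.</p>
        <preformat>
```python
def individual_similarity(names1, names2):
    """Set overlap of Equation (1): 2 * #(S1 intersect S2) / (#S1 + #S2)."""
    s1, s2 = set(names1), set(names2)
    total = len(s1) + len(s2)
    if total == 0:
        # Assumption: two empty sets are treated as not similar.
        return 0.0
    return 2 * len(s1.intersection(s2)) / total


# The example from the text: two shared names out of 3 + 4 names.
i1 = {"s1", "s2", "s3"}
i2 = {"s2", "s3", "s4", "s5"}
example = individual_similarity(i1, i2)  # equals 4 / 7
```
        </preformat>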
        <p>IAMA adopts the maximum value of all the similarities as the final similarity of
the entity pair. It is worth noting that other similarities, such as superclass similarity,
subclass similarity, domain similarity and range similarity, were also tested in our
earlier attempts, but they contributed little relative to the additional running time. They could be
added easily if needed, which makes IAMA extensible.</p>
        <p>Pairwise comparison is time-consuming. In most cases, calculating similarities for
every entity pair is unnecessary. Candidate Filtering finds a few promising
entity pairs in advance, thus reducing the running time dramatically.</p>
        <p>Assume the two input ontologies are O1 and O2, and O2 has more entities than
O1. For each entity in O1, we attempt to find potential matching entities in O2 to construct
a candidate set. The idea is implemented as follows. First, the lexical information in
the bigger ontology O2, namely name, label and comment, is tokenized and indexed
by Lucene<sup>3</sup>. Second, we construct a search query for each entity in O1. For instance,
suppose the lexical information of an entity in O1 is "Reference", "Reference", "Base class for
all entries". We split it into index tokens, and every single token is searched in the
constructed index, yielding the top-k entities as a candidate set. Last, our system calculates
the final similarity values pairwise between each entity and its candidates.</p>
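        <p>The three steps above can be sketched as follows. This is only an illustration under stated assumptions: a small in-memory inverted index stands in for Lucene, candidates are ranked by the number of query tokens they share (Lucene's own relevance scoring is more elaborate), the hits of all tokens are pooled before taking the top k, and the names tokenize, build_index and candidates, as well as the default k, are hypothetical.</p>
        <preformat>
```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entities):
    """Index the larger ontology.

    entities maps an entity id to its lexical strings (name, label, comment).
    The result maps each token to the set of entity ids that contain it.
    """
    index = defaultdict(set)
    for entity_id, fields in entities.items():
        for field in fields:
            for token in tokenize(field):
                index[token].add(entity_id)
    return index


def candidates(index, fields, k=5):
    """Return up to k entity ids sharing the most tokens with the query."""
    query_tokens = {token for field in fields for token in tokenize(field)}
    hits = Counter()
    for token in query_tokens:
        for entity_id in index.get(token, ()):
            hits[entity_id] += 1
    return [entity_id for entity_id, _ in hits.most_common(k)]
```
        </preformat>
        <p>Only the entities returned by candidates are then compared with the query entity using the full similarity computation, instead of every entity in O2.</p>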
        <p>The time used for indexing and searching is acceptable. For large input ontologies,
candidate filtering improves the matching speed substantially. Taking the anatomy track as an
example, the difference can be seen in Table 1. The experiment is conducted on a
computer with a 4.7GHz Intel i5 CPU (4 cores) and 8GB RAM.</p>
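        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th />
                <th>Precision</th>
                <th>F-Measure</th>
                <th>Recall</th>
                <th>Runtime (ms)</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>IAMA without candidate filtering</td>
                <td>0.994</td>
                <td>0.719</td>
                <td>0.563</td>
                <td>117,503</td>
              </tr>
              <tr>
                <td>IAMA with candidate filtering</td>
                <td>0.995</td>
                <td>0.713</td>
                <td>0.555</td>
                <td>5,376</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>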
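        <p>As a consistency check on these figures, the F-measure can be recomputed from precision and recall. The sketch assumes the standard balanced F-measure (the harmonic mean of precision and recall), which the paper does not state explicitly; f_measure is a hypothetical helper name.</p>
        <preformat>
```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the anatomy track, without and with candidate filtering.
without_filtering = round(f_measure(0.994, 0.563), 3)
with_filtering = round(f_measure(0.995, 0.555), 3)

# Ratio of the two reported runtimes in milliseconds.
speed_up = 117503 / 5376
```
        </preformat>
        <p>The recomputed values are 0.719 and 0.713, matching the reported F-measures, and the runtimes correspond to a speed-up of roughly 22 times.</p>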
        <p>Candidate filtering can still miss some potential entity pairs, although the loss is negligible.
IAMA defines an adjustable trigger threshold t2, which is set to 500 empirically. Candidate filtering is employed only
when both ontologies have more than 500 entities.</p>
        <p>There are two key parameters in IAMA (i.e., t1 and t2). Specifically, if the final
similarity of an entity pair is greater than t1, the pair is added to the alignment, and t2 is the
trigger threshold of the candidate filtering component, as mentioned before. In the version
that participated in OAEI 2013, t1 is set to 0.9 and t2 is set to 500.</p>
        <p><sup>3</sup> http://lucene.apache.org</p>
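        <p>The trigger rule for candidate filtering can be written as a one-line check. This is a minimal sketch; the function name use_candidate_filtering is hypothetical, and the strict comparison follows the wording "more than 500 entities".</p>
        <preformat>
```python
T1 = 0.9   # similarity threshold used in the OAEI 2013 version
T2 = 500   # trigger threshold for candidate filtering


def use_candidate_filtering(size_o1, size_o2, t2=T2):
    """Filtering is enabled only when both ontologies exceed t2 entities."""
    return size_o1 > t2 and size_o2 > t2
```
        </preformat>
        <p>If only one of the two ontologies is large, the system therefore falls back to the full pairwise comparison.</p>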
        <p><bold>benchmark</bold></p>
        <p>
          The goal of the benchmark data set is to provide a stable and detailed picture of each
algorithm [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The benchmark test library consists of several test suites. This year, the test suites
are generated from the usual bibliography ontology, and they are blind to
participants. Table 2 shows the results of the benchmark track. Pt F-m./s denotes the average
F-measure points provided per second.
        </p>
        <p>Our system achieved its best results in this track. In terms of F-measure, IAMA ranked
fourth among the 21 systems. The comparison with other top systems is shown in Table 3.</p>
        <p><bold>anatomy</bold></p>
        <p>The task of the anatomy track is to find the alignment between the Adult Mouse Anatomy
and a part of the NCI Thesaurus. These two ontologies describe the mouse anatomy and
the human anatomy respectively. The results of our system on anatomy are shown in
Table 4.</p>
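        <table-wrap>
          <label>Table 4</label>
          <table>
            <thead>
              <tr>
                <th>Runtime (s)</th>
                <th>Size</th>
                <th>Precision</th>
                <th>F-Measure</th>
                <th>Recall</th>
                <th>Recall+</th>
                <th>Coherent</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>10</td>
                <td>845</td>
                <td>0.996</td>
                <td>0.713</td>
                <td>0.555</td>
                <td>0.014</td>
                <td />
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Since both ontologies have more than 500 entities, candidate filtering
is employed. As a result, IAMA finishes this track in 10 seconds. Only two
systems are faster than IAMA. The simple use of lexical similarity generates mostly trivial
correspondences, leading to the low recall+ measure.</p>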
        <p><bold>conference</bold></p>
        <p>The conference track contains sixteen ontologies from the conference organization domain.
There are two versions of the reference alignment. The original reference alignment is
labeled as ra1, and the new reference alignment, generated as a transitive closure
computed on the original reference alignment, is labeled as ra2. Table 5 shows the results
of our system in this track.</p>
        <p>IAMA finishes the conference track in 53 seconds. Candidate filtering was not
activated.</p>
        <p><bold>multifarm</bold></p>
        <p>The MultiFarm data set contains ontologies in eight different languages. These
ontologies are translated from the conference track. IAMA has no method designed specifically
for multilingual matching and therefore obtained relatively poor results. We attempted to use a language
detection and translation API. Unfortunately, it increased the processing time of our system
and led to other problems. In the next version, IAMA will adopt a specialized method to
deal with multilingual ontologies. The results are presented in Table 6.</p>
        <table-wrap>
          <label>Table 6</label>
          <table>
            <thead>
              <tr>
                <th>Average Precision</th>
                <th>Average F-Measure</th>
                <th>Average Recall</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>0.30</td>
                <td>0.05</td>
                <td>0.03</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p><bold>library</bold></p>
        <p>The task of the library track is to match two real-world thesauri, namely STW and TheSoz.
IAMA does not apply any particular method for this track. The results can be seen
in Table 7.</p>
      </sec>
      <sec id="sec-2-5">
        <title>large biomedical ontologies</title>
        <p>The Large Biomedical track challenges matching tools by offering large-scale ontologies.
The task of this track is to find alignments between the Foundational Model of Anatomy
(FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). IAMA
finishes the task in reasonable time owing to the use of candidate filtering. Table 8 shows the
results.</p>
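        <table-wrap>
          <label>Table 8</label>
          <table>
            <thead>
              <tr>
                <th>Task</th>
                <th>P</th>
                <th>F</th>
                <th>R</th>
                <th>#Mappings</th>
                <th>Runtime (s)</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Task 1: Small FMA and NCI fragments</td>
                <td>0.979</td>
                <td>0.733</td>
                <td>0.585</td>
                <td>1,751</td>
                <td>14</td>
              </tr>
              <tr>
                <td>Task 2: Whole FMA and NCI ontologies</td>
                <td>0.901</td>
                <td>0.708</td>
                <td>0.582</td>
                <td>1,894</td>
                <td>139</td>
              </tr>
              <tr>
                <td>Task 3: Small FMA and SNOMED fragments</td>
                <td>0.962</td>
                <td>0.236</td>
                <td>0.134</td>
                <td>1,250</td>
                <td>27</td>
              </tr>
              <tr>
                <td>Task 4: Whole FMA and SNOMED ontologies</td>
                <td>0.749</td>
                <td>0.227</td>
                <td>0.134</td>
                <td>1,600</td>
                <td>218</td>
              </tr>
              <tr>
                <td>Task 5: Small SNOMED and NCI fragments</td>
                <td>0.965</td>
                <td>0.604</td>
                <td>0.439</td>
                <td>8,406</td>
                <td>99</td>
              </tr>
              <tr>
                <td>Task 6: Whole SNOMED and NCI ontologies</td>
                <td>0.917</td>
                <td>0.593</td>
                <td>0.439</td>
                <td>8,843</td>
                <td>207</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>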
        <p>IAMA is one of the fifteen systems that were able to complete all six tasks, and it
provides the best results in terms of precision in Task 1 and Task 2. Furthermore, our
system finishes all the tasks in 704 seconds, slower only than LogMapLt (371 seconds).
The average results are shown in Table 9.</p>
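        <p>The total of 704 seconds can be checked directly against the per-task runtimes reported for the six tasks. The snippet below is a simple arithmetic sketch; the variable names are illustrative only.</p>
        <preformat>
```python
# Per-task runtimes in seconds, as reported for the large biomedical track.
runtimes_s = {
    "Task 1": 14,
    "Task 2": 139,
    "Task 3": 27,
    "Task 4": 218,
    "Task 5": 99,
    "Task 6": 207,
}

total_runtime_s = sum(runtimes_s.values())  # 704
```
        </preformat>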
      </sec>
    </sec>
    <sec id="sec-3">
      <title>General comments</title>
      <sec id="sec-3-1">
        <title>Comments on the results</title>
        <p>IAMA achieved acceptable results in its first participation in OAEI. The results for the
benchmark, conference, and large biomedical tracks are better. Since the system has no
specific method for handling the MultiFarm and library tracks, the results there are relatively poor. It
is evident that IAMA obtained relatively high precision but low recall. The reason is that the
threshold t1 is fixed to a high value of 0.9. Candidate filtering, as already mentioned,
reduces recall as well.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Discussions on the way to improve the proposed system</title>
        <p>Much remains to be improved in IAMA. First, the system does not take advantage of
structural information, which is beneficial when lexical information is lacking. We tried
to calculate structural similarities such as subclass similarity and superclass similarity, but
did not obtain the expected results. The hierarchy information also remains to be
exploited. Second, predetermining all the parameters reduces flexibility. The influence
of parameter t1 can be seen in Table 10. The experiment is conducted on a computer
with a 4.7GHz Intel i5 CPU (4 cores) and 8GB RAM. A self-adjusting mechanism is to be
employed in the future. Third, the system lacks the ability to match ontologies in
different languages. The next version will support multi-language inputs. We expect the
optimized system to become a capable universal ontology matching system.</p>
        <p>This paper has reported the results of IAMA in OAEI 2013. The results show that
IAMA is able to deal with the majority of ontologies, including large ones. For
the disadvantages exposed, we have discussed possible solutions. By
and large, IAMA achieved reasonable results in its first participation in OAEI, and it
promises to improve considerably in the future.</p>
        <p>This work was supported by the National Natural Science Foundation of China (No.
61070106, 61272332, 61202329) and the Opening Project of Beijing Key Laboratory of
Internet Culture and Digital Dissemination Research (ICDD201201).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jérôme</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Ontology matching: State of the art and future challenges</article-title>
          .
          <source>Knowledge and Data Engineering, IEEE Transactions on</source>
          , (
          <volume>99</volume>
          ),
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Vladimir I</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <article-title>Binary codes capable of correcting deletions, insertions and reversals</article-title>
          . In
          <source>Soviet Physics Doklady</source>
          , volume
          <volume>10</volume>
          , page
          <fpage>707</fpage>
          ,
          <year>1966</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Giorgos</given-names>
            <surname>Stoilos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Giorgos</given-names>
            <surname>Stamou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stefanos</given-names>
            <surname>Kollias</surname>
          </string-name>
          .
          <article-title>A string metric for ontology alignment</article-title>
          . In
          <source>The Semantic Web - ISWC 2005</source>
          , pages
          <fpage>624</fpage>
          -
          <lpage>637</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. José Luis Aguirre, Bernardo Cuenca Grau, Kai Eckert, Jérôme Euzenat, Alfio Ferrara, Robert Willem van Hague,
          <string-name>
            <given-names>Laura</given-names>
            <surname>Hollink</surname>
          </string-name>
          , Ernesto Jimenez-Ruiz,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andriy</given-names>
            <surname>Nikolov</surname>
          </string-name>
          , et al.
          <article-title>Results of the ontology alignment evaluation initiative 2012</article-title>
          .
          <source>In Proc. 7th ISWC workshop on ontology matching (OM)</source>
          , pages
          <fpage>73</fpage>
          -
          <lpage>115</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>