<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>YAM++ - Results for OAEI 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>DuyHoa Ngo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zohra Bellahsene</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University Montpellier 2, LIRMM</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we briefly present the new version of YAM++, YAM++ 2013, and its results in the OAEI 2013 campaign. The most interesting aspect of this version is that it produces high-quality matching results for large scale ontology matching tasks with good runtime performance on an ordinary hardware configuration, such as a PC or laptop, without requiring a powerful server.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Presentation of the system</title>
      <p>YAM++ ((not) Yet Another Matcher) is a flexible and self-configuring ontology matching system for discovering semantic correspondences between entities (i.e., classes, object properties and data properties) of ontologies. The new version, YAM++ 2013, significantly improves on the previous versions [<xref ref-type="bibr" rid="ref6">6</xref>, <xref ref-type="bibr" rid="ref5">5</xref>] in terms of both effectiveness and efficiency, especially for very large scale tasks. The general architecture of YAM++ has not changed much; however, most of its algorithms have been updated or replaced by new, more effective ones. For example, we have implemented a disk-based method for storing temporary information about the input ontologies during the indexing process in order to save main memory. Consequently, YAM++ 2013 has improved both matching quality and runtime performance on large scale ontology matching tasks. This year, YAM++ participates in six tracks: Benchmark, Conference, Multifarm, Library, Anatomy and Large Biomedical Ontologies. However, due to limited time and staff, we have not upgraded YAM++ to participate in the Interactive Matching evaluation and the Instance Matching tasks.</p>
      <p>State, purpose, general statement</p>
      <p>In the YAM++ approach, all useful information about entities, such as terminological, structural,
contextual, semantic and extensional information, is exploited. For each type of extracted
information, a corresponding matching module has been implemented in order to discover
as many candidate mappings as possible.</p>
      <p>
        The major drawback of the previous version, YAM++ 2012 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], despite its good results and high ranking in almost all tracks, was its poor runtime
performance, especially on the Large Biomedical Ontologies track. After carefully studying this
issue, we realized that our algorithms for pre-processing and indexing the input ontologies
had a complexity of O(n<sup>2</sup>), where n is the size of the ontology. Additionally,
the semantic verification component did not work well for very large scale ontology
matching tasks.
      </p>
      <p>In the current version, YAM++ 2013, the flaws mentioned above have been largely
fixed. First, we have revised our algorithms for pre-processing and indexing the input
ontologies, which now run in O(‖n‖ + ‖v‖) time, where ‖n‖ and ‖v‖ denote the numbers of
nodes and edges of a Directed Acyclic Graph (DAG) derived from the ontology; the cost is thus
linear in the size of this graph rather than quadratic in the size of the ontology.
Moreover, we have implemented a disk-based method for storing the temporary
information of the input ontologies during the indexing process. This method saves a
significant amount of main memory, which makes it possible to run YAM++ on very large
scale ontology matching tasks on a personal computer with an ordinary configuration
(using only a 1 GB JVM).</p>
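      <p>The disk-based storage idea can be illustrated with a short sketch. It uses Python's standard shelve module as a stand-in for whatever on-disk store YAM++ actually uses, which the paper does not specify; the function names index_to_disk and lookup and the record layout (entity id mapped to lower-cased label tokens) are hypothetical.</p>
      <preformat>
```python
import os
import shelve
import tempfile


def index_to_disk(entities, path):
    """Write one temporary record per entity to a disk-backed store.

    Hypothetical record layout: entity id -> list of lower-cased label tokens.
    With writeback disabled (the default), each record is written to the
    underlying dbm database when assigned, not kept in a growing dict.
    """
    with shelve.open(path) as store:
        for entity_id, label in entities:
            store[entity_id] = label.lower().split()


def lookup(path, entity_id):
    """Read a single record back without loading the whole store."""
    with shelve.open(path, flag="r") as store:
        return store.get(entity_id)


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "onto_index")
    index_to_disk([("C1", "Heart Valve"), ("C2", "Left Ventricle")], db_path)
    print(lookup(db_path, "C2"))  # ['left', 'ventricle']
```
      </preformat>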
      <p>Second, we have introduced several inconsistent alignment patterns in order to
detect as many conflict sets as possible. Then, a new, fast approximate algorithm has
been implemented to find a near-optimal solution, which corresponds to the
final consistent alignment.</p>
      <p>Specific techniques used</p>
      <p>In this section, we briefly describe the workflow of YAM++ and its main
components, which are shown in Fig. 1.</p>
      <p>Fig. 1. Main components of YAM++: Ontology Loader, Annotation Indexing, Structure Indexing, Context Indexing, Candidate Pre-Filtering, Similarity Computation, Similarity Propagation, Candidate Post-Filtering and Semantic Verification.</p>
    </sec>
    <sec id="sec-11">
      <title>Workflow and components of YAM++</title>
      <p>In the YAM++ approach, the generic workflow for a given ontology matching scenario is
as follows.</p>
      <p>1. The input ontologies are loaded and parsed by the Ontology Loader component.</p>
      <p>2. Information about the entities in the ontologies is indexed by the Annotation Indexing,
Structure Indexing and Context Indexing components.</p>
      <p>3. The Candidates Pre-Filtering component selects, among all possible pairs of entities
from the input ontologies, those whose descriptions are highly similar, and filters out the rest.</p>
      <p>4. The candidate mappings are then passed to the Similarity Computation component,
which includes: (i) the Terminological Matcher, which produces a set of mappings by
comparing the annotations of entities; (ii) the Instance-based Matcher, which supplements
new mappings through instances shared between the ontologies; and (iii) the Contextual
Matcher, which computes the similarity value of a pair of entities by comparing their
context profiles. In YAM++, the matching results of the Terminological Matcher, the
Contextual Matcher and the Instance-based Matcher are combined into a unique set of
mappings, which we call the element level matching result.</p>
      <p>5. The Similarity Propagation component then enhances the element level matching
result by exploiting the structural information of entities. We call the result of this phase
the structure level matching result.</p>
      <p>6. The Candidate Post-Filtering component combines and selects the potential
candidate mappings from the element and structure level results.</p>
      <p>7. Finally, the Semantic Verification component refines those mappings in order to
eliminate inconsistent ones.</p>
      <p>Let us now present the specific features of each component.</p>
      <p>Ontology Loader. To read and parse the input ontologies, YAM++ uses the open-source
OWL API library. In addition, YAM++ makes use of (i) the Pellet<sup>1</sup> OWL 2 reasoner in
order to discover hidden relations between entities in small ontologies and (ii) the ELK<sup>2</sup>
reasoner for large ontologies. In this phase, the whole ontology is loaded into main
memory.</p>
      <p>Annotation Indexing. In this component, all annotation information of entities, such as
IDs, labels and comments, is extracted, and the languages used in the annotations are
identified. When the input ontologies use different languages to describe the annotations
of entities, a multilingual translator (Microsoft Bing) is used to translate those annotations
into English. The annotations are then normalized by tokenizing them into sets of tokens,
removing stop words and stemming. Finally, the resulting tokens are indexed in a table
for later use.</p>
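      <p>The normalization steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the YAM++ implementation: the stop-word list is a tiny hypothetical sample, crude_stem is a naive suffix stripper standing in for a real stemmer such as Porter's, translation is omitted, and the token table is modeled as a dictionary from token to entity ids.</p>
      <preformat>
```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; a real system would use a full list.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "is", "has"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(annotation):
    """Tokenize (splitting camelCase and punctuation), drop stop words, then stem."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", annotation)
    tokens = re.findall(r"[a-z0-9]+", spaced.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_token_table(annotations):
    """Map each normalized token to the set of entity ids whose annotations contain it."""
    table = defaultdict(set)
    for entity_id, texts in annotations.items():
        for text in texts:
            for token in normalize(text):
                table[token].add(entity_id)
    return table


print(normalize("hasAuthorOfPapers"))  # ['author', 'paper']
```
      </preformat>
      <p>Because the table maps tokens to entities, looking up the entities that share a token with a given annotation costs one dictionary access per token.</p>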
      <p>Structure Indexing. In this component, the main structural information of the ontology,
such as the IS-A and PART-OF hierarchies, is stored. In particular, YAM++ assigns compressed
bitset values to every entity of the ontology. Through the bitset values of an entity,
YAM++ can quickly retrieve its ancestors, descendants, etc. This method gives easy access
to the structural information of the ontology while minimizing the memory needed to
store it. After this step, the loaded ontology can be released to save main memory.</p>
      <p>Context Indexing. In this component, we define the context profile of an entity as a set
of three text corpora: (i) the Entity Description, which contains the annotations of the entity
itself; (ii) the Ancestor Description, which comprises the descriptions of its ancestors; and
(iii) the Descendant Description, which comprises the descriptions of its descendants.
These corpora are indexed by the Lucene indexing engine.</p>
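      <p>The bitset idea used by Structure Indexing can be sketched as follows. Python integers serve here as plain, uncompressed bitsets, whereas YAM++ uses compressed bitsets; the function names are hypothetical, and the sketch assumes the IS-A hierarchy is a DAG given as a map from each entity to its direct parents. Entities are processed in topological order (Kahn's algorithm), so every edge is visited once and an ancestor test reduces to a single bit lookup.</p>
      <preformat>
```python
from collections import deque


def ancestor_bitsets(entities, parents):
    """Give each entity a bit index and compute its ancestor bitset.

    entities: list of entity ids; parents: dict entity -> list of direct IS-A parents.
    Bit i of anc[e] is set when entities[i] is an ancestor of e.
    """
    index = {e: i for i, e in enumerate(entities)}
    children = {e: [] for e in entities}
    pending = {e: len(parents.get(e, [])) for e in entities}
    for child, direct_parents in parents.items():
        for p in direct_parents:
            children[p].append(child)
    anc = {e: 0 for e in entities}
    queue = deque(e for e in entities if pending[e] == 0)  # roots first
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # A child inherits the parent's ancestors plus the parent itself.
            anc[child] |= anc[node] | 2 ** index[node]
            pending[child] -= 1
            if pending[child] == 0:  # all parents processed: bitset is complete
                queue.append(child)
    return index, anc


def is_ancestor(index, anc, a, b):
    """True if a is an ancestor of b: a single bit test."""
    return (anc[b] >> index[a]) % 2 == 1


def ancestors(index, anc, e):
    """Decode the bitset of e back into a set of entity ids."""
    return {name for name, i in index.items() if (anc[e] >> i) % 2 == 1}


if __name__ == "__main__":
    idx, anc = ancestor_bitsets(
        ["Thing", "Person", "Author", "Paper"],
        {"Person": ["Thing"], "Author": ["Person"], "Paper": ["Thing"]},
    )
    print(sorted(ancestors(idx, anc, "Author")))  # ['Person', 'Thing']
```
      </preformat>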
      <p>Candidates Pre-Filtering. The aim of this component is to reduce the search
space for a given scenario, especially for large scale ontology matching tasks. In
YAM++, two filters have been designed to make the matching process efficient.
– A Description Filter is a search-based filter, which selects candidate mappings before
the actual similarity values between entity descriptions are computed. First, the
descriptions of all entities in the larger ontology are indexed by the Lucene search engine.
Then, for each entity in the smaller ontology, three multi-term queries, one for each of the
three descriptions in its context profile, are executed. A top-K selection based on the
ranking scores of these queries is used to select the most similar entities.
– A Label Filter is used to quickly detect candidate mappings in which the labels of the two
entities are similar or differ by at most two tokens. The intuition is that if the labels of two
entities differ by more than two tokens, any string-based method will produce a low
similarity score, so these entities are very unlikely to match.</p>
      <p><sup>1</sup> http://clarkparsia.com/pellet/
<sup>2</sup> http://www.cs.ox.ac.uk/isg/tools/ELK/</p>
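      <p>Both filters can be sketched in Python under simplifying assumptions. Lucene is not reproduced: as a stand-in, each of the three queries of a context profile is scored by the number of tokens it shares with the corresponding corpus of a target profile, and the top-K targets are kept with heapq.nlargest. For the Label Filter, we assume that "differ by at most two tokens" means that the symmetric difference of the two label token sets has at most two elements. All names (ContextProfile, description_filter, label_filter, k, max_diff) are hypothetical.</p>
      <preformat>
```python
import heapq
from dataclasses import dataclass, field


@dataclass
class ContextProfile:
    """Context profile of an entity: three token corpora."""
    entity: set = field(default_factory=set)
    ancestors: set = field(default_factory=set)
    descendants: set = field(default_factory=set)

    def corpora(self):
        return (self.entity, self.ancestors, self.descendants)


def description_filter(source, targets, k=2):
    """Return the names of the top-k targets for one source entity.

    Stand-in for Lucene ranking: the score sums, over the three descriptions,
    the number of tokens shared with the corresponding corpus of the target.
    Targets with a zero score are dropped.
    """
    def score(profile):
        return sum(len(q.intersection(c)) for q, c in zip(source.corpora(), profile.corpora()))

    ranked = heapq.nlargest(k, targets.items(), key=lambda item: score(item[1]))
    return [name for name, profile in ranked if score(profile) > 0]


def label_filter(label1, label2, max_diff=2):
    """Keep a pair only if the labels' token sets differ by at most max_diff tokens."""
    tokens1 = set(label1.lower().split())
    tokens2 = set(label2.lower().split())
    return max_diff >= len(tokens1.symmetric_difference(tokens2))


if __name__ == "__main__":
    src = ContextProfile({"author"}, {"person"}, set())
    targets = {
        "Writer": ContextProfile({"writer", "author"}, {"person"}, set()),
        "Paper": ContextProfile({"paper"}, {"document"}, set()),
        "Reviewer": ContextProfile({"reviewer"}, {"person"}, set()),
    }
    print(description_filter(src, targets, k=2))  # ['Writer', 'Reviewer']
    print(label_filter("Conference Paper", "Paper"))  # True
```
      </preformat>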
      <p>
        Similarity Computation. The three matchers in this component are the same as
in the YAM++ 2012 version. For more details, we refer readers to our papers on the
Terminological Matcher [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the Instance-based Matcher [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The only modification concerns the
Contextual Matcher: we use the algorithm described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for small scale ontology matching,
whereas we use the Lucene ranking score for large scale ontology matching.
      </p>
      <p>
        Similarity Propagation. This component is similar to the Structural Matcher
component described in the YAM++ 2012 version. It contains two similarity propagation
methods, namely Similarity Propagation and Confidence Propagation.
      </p>
      <p>
        – The Similarity Propagation method is a graph matching method that
inherits the main features of the well-known Similarity Flooding algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
only difference lies in how an ontology is transformed into a directed labeled graph. This
matcher has not changed since the first version of YAM++.
Therefore, to save space, we refer to the Similarity Flooding section of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for more
details.
– The Confidence Propagation method works as follows. Assume that ⟨a<sub>1</sub>, b<sub>1</sub>, ≡, c<sub>1</sub>⟩
and ⟨a<sub>2</sub>, b<sub>2</sub>, ≡, c<sub>2</sub>⟩ are two initial mappings, discovered for instance by an
element level matcher (i.e., the terminological matcher or the instance-based matcher).
If a<sub>1</sub> and b<sub>1</sub> are ancestors of a<sub>2</sub> and b<sub>2</sub>, respectively, then after running confidence
propagation we have ⟨a<sub>1</sub>, b<sub>1</sub>, ≡, c<sub>1</sub> + c<sub>2</sub>⟩ and ⟨a<sub>2</sub>, b<sub>2</sub>, ≡, c<sub>2</sub> + c<sub>1</sub>⟩. Note that
confidence values are propagated only among the initial mappings.
      </p>
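      <p>The Confidence Propagation rule can be written as a short Python sketch. It applies one round of the rule to every pair of initial mappings and always adds the other mapping's initial confidence, so the result does not depend on processing order. Ancestor tests are passed in as functions (for example, bitset lookups); the names propagate_confidence, is_anc1 and is_anc2 are hypothetical, and the text does not say whether YAM++ applies the rule more than once.</p>
      <preformat>
```python
from itertools import combinations


def propagate_confidence(mappings, is_anc1, is_anc2):
    """One round of confidence propagation over the initial mappings only.

    mappings: list of (a, b, c) triples. is_anc1(x, y) / is_anc2(x, y) return True
    when x is an ancestor of y in ontology O1 / O2. When a1, b1 are ancestors of
    a2, b2 (or vice versa), each mapping receives the other's initial confidence.
    """
    new_conf = [c for _, _, c in mappings]
    for i, j in combinations(range(len(mappings)), 2):
        a1, b1, c1 = mappings[i]
        a2, b2, c2 = mappings[j]
        related = (is_anc1(a1, a2) and is_anc2(b1, b2)) or (is_anc1(a2, a1) and is_anc2(b2, b1))
        if related:
            new_conf[i] += c2
            new_conf[j] += c1
    return [(a, b, c) for (a, b, _), c in zip(mappings, new_conf)]


if __name__ == "__main__":
    anc1 = {("Person", "Author")}  # Person is an ancestor of Author in O1
    anc2 = {("Human", "Writer")}   # Human is an ancestor of Writer in O2
    initial = [("Person", "Human", 0.6), ("Author", "Writer", 0.5), ("Paper", "Article", 0.7)]
    print(propagate_confidence(initial, lambda x, y: (x, y) in anc1, lambda x, y: (x, y) in anc2))
```
      </preformat>
      <p>Here the first two mappings reinforce each other (both end at about 1.1), while the unrelated Paper/Article mapping keeps its confidence of 0.7.</p>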
      <p>
        In YAM++, the aim of the Similarity Propagation method is to discover new
mappings by exploiting the structural information of entities as much as possible. This
method is used for small scale ontology matching tasks, where each ontology contains
fewer than 1000 entities. In contrast, the Confidence Propagation method supports the
Semantic Verification component in eliminating inconsistent mappings; it is mainly used
in large scale ontology matching scenarios.
      </p>
      <p>
        Candidates Post-Filtering. The aim of the Mappings Combination and Selection
component is to produce a unique set of mappings from the matching results obtained by
the terminological, instance-based and structural matchers. In this component, a Dynamic
Weighted Aggregation method has been implemented. Given an ontology matching scenario,
it automatically computes a weight value for each matcher and establishes a threshold value
for selecting the best candidate mappings. The main idea of this method is described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
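      <p>The combination and selection step can be sketched as a weighted average followed by a threshold. This sketch does not reproduce how YAM++ dynamically computes the weights and the threshold (see [6]); here they are simply given as inputs, and the function name aggregate and the example values are hypothetical.</p>
      <preformat>
```python
def aggregate(matcher_results, weights, threshold):
    """Combine matcher scores by weighted average, then keep pairs above a threshold.

    matcher_results: dict matcher name -> dict (entity1, entity2) -> score in [0, 1].
    weights: dict matcher name -> non-negative weight (at least one positive).
    A pair missing from a matcher's result contributes 0 for that matcher.
    """
    total_weight = sum(weights.values())
    pairs = set()
    for result in matcher_results.values():
        pairs.update(result)
    selected = {}
    for pair in pairs:
        score = sum(w * matcher_results.get(m, {}).get(pair, 0.0) for m, w in weights.items())
        score /= total_weight
        if score >= threshold:
            selected[pair] = score
    return selected


if __name__ == "__main__":
    results = {
        "terminological": {("A", "X"): 0.9, ("B", "Y"): 0.4},
        "instance": {("A", "X"): 0.7},
        "structural": {("A", "X"): 0.8, ("B", "Y"): 0.6},
    }
    weights = {"terminological": 0.5, "instance": 0.2, "structural": 0.3}
    # Only ("A", "X") survives: about 0.83 versus about 0.38 for ("B", "Y").
    print(aggregate(results, weights, threshold=0.5))
```
      </preformat>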
      <p>Semantic Verification. After running similarity or confidence propagation over all
candidate mappings, the final similarity values reach a certain stability. Based
on those values, YAM++ is able to remove inconsistent mappings with more certainty.
The Semantic Verification component consists of two main steps: (i)
identifying inconsistent mappings and (ii) eliminating them.</p>
      <p>
        In order to identify inconsistencies, the following semantic conflict patterns have been
designed in YAM++ (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for more details):
– Two mappings ⟨a<sub>1</sub>, b<sub>1</sub>⟩ and ⟨a<sub>2</sub>, b<sub>2</sub>⟩ are in crisscross conflict if a<sub>1</sub> is an ancestor of a<sub>2</sub>
in ontology O<sub>1</sub> and b<sub>2</sub> is an ancestor of b<sub>1</sub> in ontology O<sub>2</sub>.
– Two mappings ⟨a<sub>1</sub>, b<sub>1</sub>⟩ and ⟨a<sub>2</sub>, b<sub>2</sub>⟩ are in disjointness subsumption conflict if a<sub>1</sub> is
an ancestor of a<sub>2</sub> in ontology O<sub>1</sub> and b<sub>2</sub> is disjoint with b<sub>1</sub> in ontology O<sub>2</sub>, and vice
versa.
– A property-property mapping ⟨p<sub>1</sub>, p<sub>2</sub>⟩ is inconsistent with respect to an alignment A
if {Doms(p<sub>1</sub>) × Doms(p<sub>2</sub>)} ∩ A = ∅ and {Rans(p<sub>1</sub>) × Rans(p<sub>2</sub>)} ∩ A = ∅,
where Doms(p) and Rans(p) return the sets of domains and ranges of
property p, respectively.
– Two mappings ⟨a, b<sub>1</sub>⟩ and ⟨a, b<sub>2</sub>⟩ are in duplicated conflict if the matching cardinality
is 1:1 (for small scale ontology matching scenarios) or if the semantic similarity
SemSim(b<sub>1</sub>, b<sub>2</sub>) is below a threshold value (for large scale matching with
cardinality 1:m).
      </p>
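      <p>Two of these patterns, crisscross conflicts and duplicated conflicts under 1:1 cardinality, can be checked with a short Python sketch; the disjointness and property patterns are omitted for brevity. Ancestor tests are passed in as functions, and all names are hypothetical.</p>
      <preformat>
```python
from itertools import combinations


def crisscross(m1, m2, is_anc1, is_anc2):
    """(a1, b1) and (a2, b2) cross when a1 is an ancestor of a2 but b2 is an ancestor of b1."""
    (a1, b1), (a2, b2) = m1, m2
    return is_anc1(a1, a2) and is_anc2(b2, b1)


def duplicated_1to1(m1, m2):
    """Under 1:1 cardinality, two distinct mappings sharing a source or target conflict."""
    return m1 != m2 and (m1[0] == m2[0] or m1[1] == m2[1])


def find_conflicts(mappings, is_anc1, is_anc2):
    """Return index pairs of conflicting mappings (crisscross in either order, or duplicated)."""
    conflicts = []
    for i, j in combinations(range(len(mappings)), 2):
        m1, m2 = mappings[i], mappings[j]
        if (crisscross(m1, m2, is_anc1, is_anc2)
                or crisscross(m2, m1, is_anc1, is_anc2)
                or duplicated_1to1(m1, m2)):
            conflicts.append((i, j))
    return conflicts


if __name__ == "__main__":
    anc1 = {("Person", "Author")}  # in O1, Person is above Author
    anc2 = {("Writer", "Human")}   # in O2, Writer is (wrongly) above Human
    ms = [("Person", "Human"), ("Author", "Writer"), ("Paper", "Article"), ("Paper", "Document")]
    print(find_conflicts(ms, lambda x, y: (x, y) in anc1, lambda x, y: (x, y) in anc2))
    # [(0, 1), (2, 3)]
```
      </preformat>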
      <p>
        Two methods, complete and approximate diagnosis, are used to
eliminate inconsistent mappings. For small scale tasks, we use the complete method of Alcomo [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For the approximate method used in large scale tasks, we transform the problem
into a Maximum Weighted Vertex Cover problem, in which each mapping is a vertex weighted by its
confidence and each conflict is an edge, so that a cover is a set of mappings whose removal
resolves all conflicts. We solve it with a modification of Clarkson's greedy algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The idea of this method is to iteratively remove the mapping
with the smallest cost, computed as the ratio of its current confidence value to its
number of conflicts.
      </p>
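      <p>The greedy removal rule can be sketched in Python. Mappings are vertices weighted by their confidence and conflicts are edges; the loop repeatedly drops the mapping with the smallest ratio of confidence to remaining conflicts until no conflict is left. This follows only the ratio rule stated above; it does not reproduce the details of Clarkson's algorithm or of the YAM++ modification, and the function name and inputs are hypothetical.</p>
      <preformat>
```python
def greedy_diagnosis(confidences, conflicts):
    """Approximate diagnosis by greedy weighted vertex cover.

    confidences: dict mapping id -> confidence; conflicts: iterable of (m1, m2) id pairs.
    While conflicts remain, remove the mapping with the smallest
    confidence / (number of its remaining conflicts). Returns the kept mapping ids.
    """
    edges = {frozenset(pair) for pair in conflicts}
    removed = set()
    while edges:
        # Count remaining conflicts (current degree) of every mapping still involved.
        degree = {}
        for edge in edges:
            for m in edge:
                degree[m] = degree.get(m, 0) + 1
        victim = min(degree, key=lambda m: confidences[m] / degree[m])
        removed.add(victim)
        edges = {e for e in edges if victim not in e}
    return set(confidences) - removed


if __name__ == "__main__":
    conf = {"m1": 0.9, "m2": 0.4, "m3": 0.8}
    # m2 conflicts with both others and has the lowest ratio (0.4 / 2), so it goes.
    print(sorted(greedy_diagnosis(conf, [("m1", "m2"), ("m2", "m3")])))  # ['m1', 'm3']
```
      </preformat>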
      <p>Adaptations made for the evaluation</p>
      <p>Before running the matching process, YAM++ analyzes the input ontologies and adapts
itself to the matching task. In particular, if the annotations of entities in the input ontologies
are described in different languages, YAM++ automatically translates them into English.
If the number of entities in the input ontologies is smaller than 1000, YAM++ switches
to the small scale matching regime; otherwise, it runs in the large scale matching regime.
The main difference between the two regimes lies in the Similarity Propagation and
Semantic Verification components, as discussed above.</p>
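      <p>The regime selection can be summarized as a small configuration function. The sketch assumes that the small scale regime requires both ontologies to have fewer than 1000 entities and that translation is triggered when the annotation languages of the two ontologies differ; the names MatchingConfig and choose_regime are hypothetical.</p>
      <preformat>
```python
from dataclasses import dataclass

SMALL_SCALE_LIMIT = 1000  # entity-count threshold given in the text


@dataclass
class MatchingConfig:
    translate_to_english: bool
    propagation: str  # "similarity" (small scale) or "confidence" (large scale)
    diagnosis: str    # "complete" (Alcomo) or "approximate" (greedy vertex cover)


def choose_regime(n_entities1, n_entities2, languages1, languages2):
    """Select the matching regime from simple properties of the two input ontologies."""
    small = SMALL_SCALE_LIMIT > max(n_entities1, n_entities2)
    return MatchingConfig(
        translate_to_english=set(languages1) != set(languages2),
        propagation="similarity" if small else "confidence",
        diagnosis="complete" if small else "approximate",
    )


if __name__ == "__main__":
    # Anatomy track class counts from the Results section: 2744 and 3304.
    print(choose_regime(2744, 3304, ["en"], ["en"]))
```
      </preformat>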
      <p>Link to the system and parameters file</p>
      <p>A SEALS client wrapper for the YAM++ system and the parameter files can be downloaded
at: http://www2.lirmm.fr/~dngo/YAMplusplus2013.zip. See the instructions in the tutorial
of the SEALS platform<sup>3</sup> to test our system.</p>
      <p><sup>3</sup> http://oaei.ontologymatching.org/2013/seals-eval.html</p>
      <p>Link to the set of provided alignments (in align format)</p>
      <p>The results of all tracks can be downloaded at: http://www2.lirmm.fr/~dngo/YAMplusplus2013Results.zip.</p>
      <sec id="sec-11-1">
        <title>Results</title>
        <p>In this section, we present the evaluation results obtained by running YAM++ with the
SEALS client on the Benchmark, Conference, Multifarm, Library, Anatomy and
Large Biomedical Ontologies tracks. All experiments were executed with
SEALS client version 4.1 beta and JDK 1.6 on an Intel 3.0 Pentium PC with 3 GB of RAM
running Windows XP SP3.</p>
        <p>Benchmark</p>
        <p>In OAEI 2013, the Benchmark track includes 5 blind tests for both organizers and
participants. These tests are regenerations of the bibliography test set. Table 1 shows the
average results of YAM++ on the Benchmark dataset.</p>
        <p>Conference</p>
        <p>The Conference track now contains 16 ontologies from the same domain (conference
organization), and each ontology must be matched against every other ontology. Since this
track is open+blind, in Table 2 we can only report our results with respect to the
available reference alignments.</p>
        <p>MultiFarm</p>
        <p>The goal of the MultiFarm track is to evaluate the ability of matching systems to deal
with multilingual ontologies. It is based on the OntoFarm dataset, where the annotations
of entities are represented in different languages: English (en), Chinese (cn),
Czech (cz), Dutch (nl), French (fr), German (de), Portuguese (pt), Russian (ru) and
Spanish (es). The results of YAM++ are shown in Fig. 2.</p>
        <p>Anatomy</p>
        <p>The Anatomy track consists of finding an alignment between the Adult Mouse Anatomy
(2744 classes) and a part of the NCI Thesaurus (3304 classes) describing human
anatomy. Table 3 shows the evaluation results and runtime of YAM++ on this track.</p>
        <p>Library</p>
        <p>The Library track is a real-world task of matching the STW (6575 classes) and the TheSoz
(8376 classes) thesauri. Table 4 shows the evaluation results and runtime of YAM++
against an existing reference alignment on this track.</p>
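        <table-wrap>
          <caption>
            <p>Table 4: YAM++ results on the Library track</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Test set</th>
                <th>Precision</th>
                <th>Recall</th>
                <th>F-measure</th>
                <th>Run time (s)</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Library</td>
                <td>0.692</td>
                <td>0.808</td>
                <td>0.745</td>
                <td>411</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>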
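        <p>As a quick consistency check, the F-measure in Table 4 is the harmonic mean of precision and recall, F = 2PR / (P + R). Recomputing it from the rounded values gives about 0.7455, which differs from the reported 0.745 only in the fourth decimal, as expected when P and R are themselves rounded. The function name f_measure is ours.</p>
        <preformat>
```python
def f_measure(precision, recall):
    """F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(0.692, 0.808), 4))  # 0.7455
```
        </preformat>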
        <p>Large Biomedical Ontologies</p>
        <p>This track consists of finding alignments between the Foundational Model of Anatomy
(FMA), SNOMED CT and the National Cancer Institute Thesaurus (NCI). There are 9
sub-tasks with input ontologies of different sizes, i.e., small fragments, large fragments and
whole ontologies. Table 5 shows the evaluation results and run times of YAM++ on
those sub-tasks.</p>
        <p>Table 5 test sets: Small FMA - NCI, Whole FMA - NCI, Small FMA - SNOMED, Whole FMA - SNOMED, Small SNOMED - NCI, Whole SNOMED - NCI.</p>
      </sec>
      <sec id="sec-11-2">
        <title>General comments</title>
        <p>This is the third time YAM++ participates in the OAEI campaign. We found the SEALS
platform to be a very valuable tool for comparing the performance of our system with
others. We also found that the OAEI tracks cover a wide range of heterogeneity in
ontology matching tasks, which makes them very useful for developers and researchers
building semantic matching systems.</p>
        <p>Comments on the results</p>
        <p>The current version of YAM++ shows a significant improvement in terms of both
matching quality and runtime with respect to the previous version. In particular, the
harmonic mean (H-mean) of the F-measure over all the very large scale datasets (i.e., Library
and Large Biomedical Ontologies) has been improved.</p>
      </sec>
      <sec id="sec-11-3">
        <title>Conclusion</title>
        <p>In this paper, we have presented our ontology matching system, YAM++, and
its evaluation results on different tracks of the OAEI 2013 campaign. The experimental
results are promising and show that YAM++ is able to work effectively and efficiently
on real-world ontology matching tasks. In the near future, we will continue improving the
matching quality and efficiency of YAM++. Furthermore, we plan to participate in the
Instance Matching track as well.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Kenneth L.</given-names>
            <surname>Clarkson</surname>
          </string-name>
          .
          <article-title>A modification of the greedy algorithm for vertex cover</article-title>
          .
          <source>Information Processing Letters</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>25</lpage>
          ,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Melnik</surname>
          </string-name>
          et al.
          <article-title>Similarity flooding: A versatile graph matching algorithm and its application to schema matching</article-title>
          .
          <source>In ICDE</source>
          , pages
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Meilicke</surname>
          </string-name>
          .
          <article-title>Alignment incoherence in ontology matching</article-title>
          . PhD thesis, University of Mannheim, Chair of Artificial Intelligence
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>DuyHoa</given-names>
            <surname>Ngo</surname>
          </string-name>
          .
          <article-title>Enhancing ontology matching by using machine learning, graph matching and information retrieval techniques</article-title>
          . PhD thesis, University Montpellier II,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>DuyHoa</given-names>
            <surname>Ngo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zohra</given-names>
            <surname>Bellahsene</surname>
          </string-name>
          .
          <article-title>YAM++ results for OAEI 2012</article-title>
          . In
          <source>OM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>DuyHoa</given-names>
            <surname>Ngo</surname>
          </string-name>
          , Zohra Bellahsene, and
          <string-name>
            <given-names>Remi</given-names>
            <surname>Coletta</surname>
          </string-name>
          .
          <article-title>YAM++ results for OAEI 2011</article-title>
          . In
          <source>OM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>DuyHoa</given-names>
            <surname>Ngo</surname>
          </string-name>
          , Zohra Bellahsene, and
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Todorov</surname>
          </string-name>
          .
          <article-title>Opening the black box of ontology matching</article-title>
          .
          <source>In ESWC</source>
          , pages
          <fpage>16</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>DuyHoa</given-names>
            <surname>Ngo</surname>
          </string-name>
          , Zohra Bellahsene, and
          <string-name>
            <given-names>Remi</given-names>
            <surname>Coletta</surname>
          </string-name>
          .
          <article-title>A generic approach for combining linguistic and context profile metrics in ontology matching</article-title>
          .
          <source>In ODBASE Conference</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>