<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qian Zheng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chao Shao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juanzi Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhichun Wang</string-name>
          <email>zcwang@bnu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Linmei Hu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing Normal University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tsinghua University</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the results of RiMOM2013 in the Ontology Alignment Evaluation Initiative (OAEI) 2013. We participated in three tracks: Benchmark, IM@OAEI2013, and Multifarm. We first describe the basic framework of our matching system (RiMOM2013); we then describe its alignment process and alignment strategies, and present the specific techniques used for the different tracks. Finally, we comment on our results and discuss future work on RiMOM2013.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>As shown in Fig. 1, the whole system consists of three layers: the User Interface layer,
the Control layer, and the Component layer. In the User Interface layer, RiMOM2013 provides
an interface for customizing the matching procedure, including selecting preferred
components, setting system parameters, and choosing whether to use a translation tool.
In semi-automatic ontology matching, the Control layer stores the parameters of the
alignment tasks and controls the execution of components in the Component layer.
In the Component layer, we define six groups of executable components: preprocessor,
matcher, aggregator, evaluator, postprocessor, and other utilities. Each group contains
several instantiated components; for a given alignment task, users can select the
appropriate components and execute them in the desired sequence.</p>
      <p>[Fig. 1: the architecture of RiMOM2013. The User Interface layer offers task collection
and method choosing; a Task holds Ontology O1, Ontology O2, Parameters, and References;
the Component layer groups the Preprocess, Matcher, Aggregator, Evaluator, Postprocess,
and Util components.]</p>
      <p>This year we participate in three tracks of the campaign: Benchmark, Multifarm, and
Instance Matching. We describe the specific techniques used in each track below.</p>
    </sec>
    <sec id="sec-3">
      <title>Benchmark</title>
      <p>For the Benchmark track, we use five matching components: the Similarity preprocessor,
the Similarity matcher, the Similarity Flooding preprocessor, the Similarity Flooding matcher,
and the Similarity Aggregator.</p>
      <p>We use the edit distance method and WordNet 2.0 to calculate the similarity between
the labels of entities; for each entity pair, we then combine these two similarities into an
aggregated similarity.</p>
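      <p>A minimal sketch of this step follows. The edit-distance part is standard Levenshtein; the WordNet score and the aggregation weight are placeholders, since the paper does not state the exact values used:</p>

```python
def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def edit_similarity(a, b):
    # Normalize the distance into a [0, 1] similarity score.
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def aggregate(sim_edit, sim_wordnet, w_edit=0.5):
    # Weighted average of the two label similarities; the weight
    # w_edit is an assumed value, not taken from the paper.
    return w_edit * sim_edit + (1.0 - w_edit) * sim_wordnet
```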
      <p>
        Experiments are conducted on five different flooding methods based on similarity
flooding [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: the Property Only Method (POM), Hierarchy Method (HM), Common Relation
Method (CRM), RDFGraphOfSchema Method (RGSM), and Nothing Method (NM). These
five methods are only used to generate the initial graph for the next step. In POM, we
add entity pairs that have a superclass relationship; in HM, we add entity pairs
that have subclass and superproperty relationships; in CRM, we first check the relationship
between every two entities, then add entity pairs that have a domain or a range
relationship. In RGSM, we add the pairs contained in either HM or CRM.
In NM, we add all entity pairs to the initial graph.
      </p>
      <p>In the next two steps, we use the similarity flooding method to propagate similarities
through the graph; because the graph is usually very large, after the flooding process we
apply a threshold filter to prune the pairs whose similarity is smaller than the threshold.</p>
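      <p>One flooding-and-pruning round can be sketched as follows. This is a simplified fixpoint step in the spirit of similarity flooding, with an assumed propagation coefficient and an illustrative propagation graph, not the exact formulation of [3]:</p>

```python
def flood_step(sim, neighbors, alpha=0.5):
    # Propagate each pair's similarity to its neighbor pairs in the
    # propagation graph, then renormalize by the maximum value.
    # alpha is an assumed propagation coefficient.
    new = dict(sim)
    for pair, score in sim.items():
        for nb in neighbors.get(pair, []):
            new[nb] = new.get(nb, 0.0) + alpha * score
    top = max(new.values())
    return {p: s / top for p, s in new.items()}

def prune(sim, threshold):
    # Drop pairs whose similarity falls below the threshold.
    return {p: s for p, s in sim.items() if s >= threshold}
```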
      <p>Next, we use the Aggregator to combine these similarities: the edit-distance similarity,
the WordNet 2.0 similarity, and the similarity flooding result. The experiments show that
running the similarity flooding result alone, without the aggregator and the other
similarities (edit distance and WordNet 2.0), gains the best result.</p>
    </sec>
    <sec id="sec-4">
      <title>Multifarm</title>
      <p>
        The multifarm track is designed to test an alignment system's ability on multilingual
datasets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The multifarm data consist of a set of ontologies translated into seven
different languages and the corresponding alignments between these ontologies. Each
entity in one ontology must be matched with the related entity in an ontology in a
different language.
      </p>
      <p>What makes this task difficult is the restricted information available for each
entity, which usually has only label information such as "writes contribution"; the label
of its range property may be "contribution" and the label of its domain property "author",
which, when translated into the same language, often yield the same or nearly the same
result, such as "autor" in Spanish.</p>
      <p>In the first preprocessing step of the multifarm task, we use the Google Translate tool
to bring the two languages into a common language: for the "en-cn" alignment
task, we translate the Chinese labels into English, and for the "cn-es" alignment
task, we translate the Spanish labels into Chinese. In particular, when either the source
or the target ontology's language is Russian, we translate both into English.</p>
      <p>In the second preprocessing step, we use the Google Translate tool to put both
entities' labels into English so that WordNet 2.0 can be used to calculate the
sentence similarity.</p>
      <p>Next, we use the Aggregator to combine these two similarities for each label pair; the
experiments show that the edit distance contributes more to the combined similarity.</p>
    </sec>
    <sec id="sec-5">
      <title>Instance Matching</title>
      <p>[Fig. 2: flowchart of the Link Flooding Algorithm. After Data Preprocess, subjects are
matched by unique &lt;Predicate, Object&gt; pairs (Unique Subject Matching); aligned instances
drive One-left Object Matching; the two modules iterate while new matching pairs are found.
When no matching pairs are found, Score Matching is run, and the threshold is lowered until
δ &gt; δmin no longer holds, at which point the algorithm ends.]</p>
      <p>
        For the instance matching task, we propose an algorithm called the Link Flooding
Algorithm, inspired by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which includes four main modules: Data Preprocess, Unique
Subject Matching, One-left Object Matching, and Score Matching. Before going into the
details, we define an ontology Ont as a set of RDF triples &lt;s, p, o&gt; (Subject,
Predicate, Object), and an instance Ins as the set of RDF triples sharing the same Subject.
Since an instance's subject can be another instance's object, we consider instance matching in
three situations: subject-subject alignment, subject-object alignment, and
object-object alignment.
      </p>
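      <p>Under these definitions, an ontology can be held as a set of triples, and instances can be recovered by grouping on the subject. A minimal sketch:</p>

```python
from collections import defaultdict

def group_instances(triples):
    # Group RDF triples (s, p, o) by subject: each instance Ins is the
    # set of (predicate, object) pairs sharing one subject.
    inst = defaultdict(set)
    for s, p, o in triples:
        inst[s].add((p, o))
    return dict(inst)
```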
      <p>In the first module, Data Preprocess, we purify the data, including translating the
multilingual datasets uniformly into English. Additionally, we
unify the data format; for example, a date expressed as "august, 01, 2013" or
"August, 01, 2013" is transformed into "08, 01, 2013". We also perform other operations,
such as removing special characters, to clean the data. The second module achieves
instance matching through a unique &lt;p, o&gt; pair shared by the two instances to be aligned.
For example, if in the source ontology only one instance InsX has the &lt;p, o&gt; pair
&lt;birthday, "01, 08, 2013"&gt;, then in the target ontology, the instances containing
&lt;birthday, "01, 08, 2013"&gt; are concluded to be aligned with InsX. Consequently, one
instance in the source ontology can be matched with an arbitrary number of instances in
the target ontology. In the third module, we obtain object-object alignments via all of the
aligned subjects: if two aligned instances share a predicate with m objects on each side, of
which m-1 are aligned, then the "one-left" object pair is also aligned. The last module,
Score Matching, considers two instances aligned if the weighted average score over
their comments, mottos, birthDates, and almaMaters is above a certain threshold;
in this task, we take the edit distance as the score measure of similarity. We illustrate the
algorithm in Fig. 2.</p>
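      <p>Unique Subject Matching, the seed-generating module, can be sketched as follows. This is our simplified reading of the description above, with illustrative data representations:</p>

```python
from collections import defaultdict

def unique_subject_matching(src, tgt):
    # src, tgt map each subject to its set of (predicate, object) pairs.
    # A (p, o) pair held by exactly one source subject identifies that
    # subject; any target subject holding the same pair is aligned to it.
    owners = defaultdict(list)
    for s, pairs in src.items():
        for po in pairs:
            owners[po].append(s)
    matches = set()
    for t, pairs in tgt.items():
        for po in pairs:
            if len(owners.get(po, [])) == 1:
                matches.add((owners[po][0], t))
    return matches
```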
      <p>
        We first input the source and target ontologies into the algorithm; in the figure,
the black circles represent the subjects of the RDF triples, the gray circles
represent the objects, and the white triangles represent the
predicates [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We then clean the dataset with the Data Preprocess module. Next, we generate
some initial instance matching pairs as seeds through Unique Subject Matching. Since, as
mentioned previously, one instance's subject can be another's object, we feed the
seeds to One-left Object Matching to obtain more matching pairs. With the newly
detected matching pairs, we reapply Unique Subject Matching to acquire further
pairs; we iterate these two modules until no new
matching pairs are found. After that, we run the Score Matching module with a high threshold
to obtain new pairs with high confidence, so that we can repeat the previous operation,
iteratively running the Unique Subject Matching and One-left Object Matching modules, with
little error. We then reduce the threshold step by step, and in each step the newfound
pairs are fed back into the iteration to control error propagation. Lastly,
we output all of the matching pairs once the threshold falls below the minimum threshold or
all the instances in the target ontology are aligned.
      </p>
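      <p>This overall control loop can be sketched as follows. The module internals are elided and passed in as functions; the threshold schedule and function names are illustrative, not the paper's exact settings:</p>

```python
def link_flooding(usm, olom, score, delta=0.9, delta_min=0.5, step=0.1):
    # usm/olom take the current alignment set and return candidate
    # pairs; score additionally takes the current threshold delta.
    aligned = set()

    def saturate():
        # Iterate the two link-based modules until no new pairs appear.
        while True:
            new = (usm(aligned) | olom(aligned)) - aligned
            if not new:
                break
            aligned.update(new)

    saturate()
    while delta >= delta_min - 1e-9:
        seeds = score(aligned, delta) - aligned
        if seeds:
            aligned.update(seeds)  # feed high-confidence pairs back in
            saturate()
        delta -= step              # lower the threshold step by step
    return aligned
```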
    </sec>
    <sec id="sec-6">
      <title>Adaptations made for the evaluation</title>
      <p>Deploying the system on the SEALS platform over the network poses three main challenges.
First, the input source cannot be downloaded as a file, so we can hardly inspect its
content and structure. Second, without the input path string, we cannot determine
which task and which dataset size are currently in use. Lastly, when the SEALS platform
calls the interface we provide, some XML reader problems occur and interrupt the
process; we had no choice but to discard the XML read-and-load
component to keep the system executable. In the multifarm task, however, we found
some differences between the results generated on our local PC and on the SEALS
platform; there may be undiscovered problems in turning RiMOM2013 into a
fixed-configuration system.</p>
    </sec>
    <sec id="sec-7">
      <title>Link to the system and parameters file</title>
      <p>The RiMOM2013 system can be found at</p>
      <p>http://keg.cs.tsinghua.edu.cn/project/RiMOM/
2</p>
      <sec id="sec-7-1">
        <title>Results</title>
        <p>As introduced above, RiMOM2013 participates in three tracks in OAEI 2013. In the
following, we present the results and related analysis for the individual OAEI
2013 tracks.</p>
        <p>Benchmark: There are two test sets this year, biblio and finance, and each dataset
contains 94 alignment tasks. We divide these tasks into four groups: 101, 20x, 221-247,
and 248-266. We obtained good results on 221-247, while the results degrade on 248-266.
Compared with the 2010 campaign, the evaluation procedure changed this year, and some
errors occurred during the system docking: when we tried to use an XML loader to
implement circuit customization, an incompatibility problem arose, and because we did
not know the exact version of the tool the SEALS platform calls, we had to write the
imitation program separately, which made it inflexible. As RiMOM2013 is a dynamic
system, these problems more or less affected our implementation.</p>
        <p>Multifarm: There are 36 language pairs in the multifarm dataset, combined from 8
languages: Chinese (cn), Czech (cz), Dutch (nl), French (fr), German (de), Portuguese (pt),
Russian (ru), and Spanish (es), permuted in lexicographical order. Results are
shown in Table 1.</p>
        <p>The result shown in Table 2 is taken from the OAEI 2013 results page. It is notable
that our system achieved the lowest runtime among the multilingual matchers, which is
not shown in the table. Although we ranked third in the multifarm task, we note that
our system is essentially translation-based and our connection to the translation
provider was not good; otherwise, we could have done much better.
We verified this locally without the edas and ekaw ontologies, obtaining an F1 of 0.49.</p>
        <p>The table shows that the worst results all occur in the Chinese tasks. The
basic tool we use throughout multifarm is a translation tool: we use both Google's
and Bing's translators to initialize the label set before calculating the WordNet
similarity, edit-distance similarity, and vector space similarity.</p>
        <p>Because the information in each multifarm task is limited, the results are
inherently bounded; the highest F1 we obtained is 0.605, for the alignment of the Czech
and English ontologies on a local machine.</p>
        <p>Instance matching: The results for Instance Matching 2013 are shown in Table 3.</p>
        <p>As we can see from the table, we achieve high values for all measures in all five
test cases, especially test cases 1 and 3. Furthermore, the official results show that we
won first prize in IM@OAEI2013. We believe our Link Flooding Algorithm is
effective for instance matching. We credit these results to each module of the
algorithm, and explain them more specifically below.</p>
        <p>For testcase1, the Score Matching module uses a weighted average score, thereby
avoiding over-emphasizing any particular piece of instance information. Another reason
we attain the best performance on testcase1 is that the target ontology changes little.
In testcase2, with almost only link information available, we need not employ the last
module, Score Matching; nevertheless, the algorithm achieves comparable performance,
reflecting the power of link information, in other words, of the Link Flooding Algorithm.
Although testcases 3, 4, and 5 have few initial links, we can find new matching pairs
through Score Matching; even when only a few such pairs are found, we can detect many
new pairs by iteratively running Unique Subject Matching and One-left Object Matching.</p>
      </sec>
      <sec id="sec-7-2">
        <title>General comments</title>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Discussions on the way to improve the proposed system</title>
      <p>We implemented no brand-new method for the Benchmark track, and these tasks still
contain much information that we have yet to exploit. We have also not run RiMOM2013
on Anatomy, Conference, Library, etc. For anatomy, since many technical terms appear
as labels in the ontologies, we would need to add a manual labelling step to generate
the reference alignment; the problem is how to determine whether a result pair matches
without any biological knowledge. For multifarm, because the multifarm dataset is
translated from the conference collection, running the experiment on conference before
multifarm may provide credible auxiliary information for each entity pair during the
multifarm experiment.</p>
    </sec>
    <sec id="sec-9">
      <title>Comments on the OAEI 2013 measures</title>
      <p>The results show that in schema-level matching, using description information gains
better matching results; by contrast, in instance-level matching, using linking
information gains better results, because at the instance level the types of relationships
between entities are diverse, whereas at the schema level they are monotonous.</p>
      <sec id="sec-9-1">
        <title>Conclusion</title>
        <p>In this paper, we present the results of RiMOM2013 in the OAEI 2013 campaign. We
participated in three tracks this year: Benchmark, Multifarm, and Instance Matching.
We presented the architecture of the RiMOM2013 framework and described the specific
techniques we used during the campaign. In this project, we designed a new framework
for the ontology alignment task; we focused on the instance matching task and proposed
three new methods for it. The results show that our system can both handle multilingual
ontologies at the schema level and perform well at the instance level, which we hope
will draw attention in the community.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Acknowledgement</title>
        <p>The work is supported by NSFC (No. 61035004), NSFC-ANR(No. 61261130588), 863
High Technology Program (2011AA01A207), FP7-288342, and THU-NUS NExT
CoLab.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>RiMOM: a dynamic multistrategy ontology alignment framework</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          . (
          <year>2009</year>
          )
          <fpage>1218</fpage>
          -
          <lpage>1232</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>RiMOM results for oaei 2010</article-title>
          . In: OM'
          <fpage>10</fpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Melnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <article-title>Similarity Flooding: A versatile graph matching algorithm and its application to schema matching</article-title>
          . In: ICDE'
          <fpage>02</fpage>
          . (
          <year>2002</year>
          )
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
          </string-name>
          , F.,
          <string-name>
            <surname>van Hage</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montiel-Ponsoda</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>de</surname>
            <given-names>Azevedo</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.R.</given-names>
            ,
            <surname>Stuckenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Svb-Zamazal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Svtek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Tamilin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>dos Santos</surname>
          </string-name>
          , C.T.,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Multifarm: A benchmark for multilingual ontology matching</article-title>
          .
          <source>J. Web Sem</source>
          . (
          <year>2012</year>
          )
          <fpage>62</fpage>
          -
          <lpage>68</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
          </string-name>
          , J.:
          <article-title>Cross-lingual knowledge linking across wiki knowledge bases</article-title>
          .
          <source>In: WWW'12</source>
          . (
          <year>2012</year>
          )
          <fpage>459</fpage>
          -
          <lpage>468</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ichise</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>SLINT: a schema-independent linked data interlinking system</article-title>
          .
          <source>In: OM'12</source>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>