<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>RiMOM Results for OAEI 2015</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yan Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juanzi Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tsinghua University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>This paper presents the results of RiMOM in the Ontology Alignment Evaluation Initiative (OAEI) 2015. We participated only in the Instance Matching track of OAEI 2015. We first describe the overall framework of our matching system (RiMOM). We then detail the techniques the framework uses for instance matching. Finally, we analyze our results and discuss future work on RiMOM.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Presentation of the system</title>
      <p>
        In order to address the challenges of large-scale instance matching, we propose RiMOM-2015 (RiMOM-Instance Matching), an instance matching framework based on our earlier ontology matching system RiMOM [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. RiMOM-2015 is designed specifically for large-scale instance matching. It uses a novel multi-strategy method that adapts to different kinds of ontologies, and it employs an inverted index to improve efficiency. Its two main features are:
      </p>
      <p>1. We index the instances of each of the two knowledge bases by their objects, and select the instances that share the same keys as candidate instance pairs. This step limits the number of pairs to be compared, which significantly improves the efficiency of the system.
2. We implement several matchers in our instance matching system. These matchers can be executed in parallel, and their results are then aggregated according to the characteristics of the source ontologies.</p>
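      <p>The parallel execution and aggregation of matchers described in the second feature can be illustrated with a minimal Python sketch. It assumes that each matcher is a function that scores one candidate pair in [0, 1]. As a hypothetical stand-in for aggregating “according to the characteristics of the source ontologies”, it combines the scores with a weighted average. The names <monospace>run_matchers</monospace>, <monospace>matchers</monospace> and <monospace>weights</monospace> are illustrative and are not RiMOM's actual API.</p>
      <preformat>
```python
from concurrent.futures import ThreadPoolExecutor


def run_matchers(matchers, pairs, weights):
    """Run each matcher over all candidate pairs in parallel, then combine
    the per-pair scores with a weighted average.

    matchers: {name: function(pair) -> score in [0, 1]}
    weights:  {name: weight}; a hypothetical aggregation scheme, since the
              paper does not specify how results are aggregated.
    """
    with ThreadPoolExecutor() as pool:
        # One task per matcher; m=m binds the current matcher to the lambda.
        futures = {
            name: pool.submit(lambda m=m: {p: m(p) for p in pairs})
            for name, m in matchers.items()
        }
        results = {name: f.result() for name, f in futures.items()}
    total = sum(weights[name] for name in matchers)
    return {
        p: sum(weights[name] * results[name][p] for name in matchers) / total
        for p in pairs
    }


if __name__ == "__main__":
    pairs = [("a1", "b1"), ("a2", "b2")]
    matchers = {
        "label": lambda p: 1.0 if p[0] == "a1" else 0.2,
        "structure": lambda p: 0.6,
    }
    print(run_matchers(matchers, pairs, {"label": 3, "structure": 1}))
```
      </preformat>
      <p>Threads keep the sketch short. For CPU-bound matchers in CPython, process-based parallelism with picklable, module-level matcher functions would avoid contention on the global interpreter lock.</p>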
    </sec>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>This section describes the overall framework of RiMOM. An overview of the instance matching system is shown in Fig. 1. The system consists of seven modules: Preprocess, Predicate Alignment, Matcher Choosing, Candidate Pair Generation, Matching Score Calculation, Instance Alignment and Validation. The order in which they run is shown in Fig. 1. We describe each step as follows.
1. Preprocess: The system begins with Preprocess, which loads the ontologies and parameters into the system. At the same time, the preprocessor collects metadata about the two ontologies, which is used by the later steps Predicate Alignment and Matcher Choosing.
2. Predicate Alignment: In this step, we obtain the alignments between the predicates of the two ontologies. In our current system, this step is semi-automatic.
3. Matcher Choosing: The system chooses the one or more matchers best suited to the two ontologies, based on their metadata.
4. Candidate Pair Generation: Two instances form a candidate pair when they have the same literal object on some discriminative predicate.
5. Matching Score Calculation: After generating the candidate set, we calculate a more accurate similarity using the matcher(s) chosen in step 3. In this year's tasks, a vector distance similarity was calculated for each candidate pair.
6. Instance Alignment: Based on the similarities calculated in step 5, we obtain the final instance alignment.
7. Validation: If a validation data set is available, we evaluate the alignment in terms of Precision, Recall and F1-measure.</p>
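      <p>The Validation step amounts to the standard precision, recall and F1 computation over sets of matched pairs. The following Python sketch assumes that an alignment is represented as a set of (source, target) identifier pairs; the function name <monospace>evaluate</monospace> is illustrative and not part of RiMOM.</p>
      <preformat>
```python
def evaluate(found, reference):
    """Precision, recall and F1 of an alignment against a reference alignment.

    Both arguments are sets of (source_id, target_id) pairs.
    """
    correct = len(found.intersection(reference))
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
    found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
    print(evaluate(found, reference))
```
      </preformat>
      <p>When an alignment contains exactly as many pairs as the reference, precision and recall share the same denominator, so precision, recall and F1 coincide. This is why several rows in the result tables report identical values.</p>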
    </sec>
    <sec id="sec-3">
      <title>Specific techniques used</title>
      <p>This year we participated only in the Instance Matching track. This section describes the specific techniques used in this track.</p>
      <p>Data Preprocessing: First, we remove stop words such as “a”, “of” and “the”. We then calculate the TF-IDF values of the words in each knowledge base. We also collect statistics on each predicate in order to identify the important predicates.</p>
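      <p>A minimal Python sketch of this preprocessing is shown below. It assumes that the literal objects of each instance are concatenated into one document, that tokens are lower-cased alphanumeric runs, and that TF-IDF is computed as raw term frequency times log(N/df). The stop-word list beyond “a”, “of” and “the” and all function names are hypothetical.</p>
      <preformat>
```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper only names "a", "of", "the" as examples.
STOP_WORDS = {"a", "an", "of", "the", "and", "in", "on", "for"}


def tokenize(text):
    """Lower-case, split on non-alphanumeric characters, drop stop words."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t and t not in STOP_WORDS]


def tfidf(documents):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return vectors


if __name__ == "__main__":
    docs = ["The anatomy of a search engine", "A survey of ontology matching", "Ontology matching in practice"]
    for vec in tfidf(docs):
        print(vec)
```
      </preformat>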
      <p>Predicate Alignment: The predicates must be aligned before we can calculate the similarity of instances. Predicates can express rich semantics, and the relationships between the predicates of two ontologies may be one-to-one, one-to-many or many-to-many. We find some of the one-to-one relationships by calculating the Jaccard similarity of two predicates:</p>
      <disp-formula>
        <tex-math>\mathrm{sim}(p_i, p_j) = \frac{|O_{p_i} \cap O_{p_j}|}{|O_{p_i} \cup O_{p_j}|}</tex-math>
      </disp-formula>
      <p>where <italic>p</italic><sub>i</sub> and <italic>p</italic><sub>j</sub> are predicates of the two ontologies, respectively, and <italic>O</italic><sub>p<sub>j</sub></sub> is the range of the predicate <italic>p</italic><sub>j</sub>, i.e., the set of objects it takes.</p>
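      <p>The Jaccard similarity of two predicates can be sketched in Python as follows, with each predicate represented by the set of its objects. The greedy best-match pairing and the 0.5 threshold in <monospace>one_to_one_candidates</monospace> are assumptions added for illustration; the text only states that Jaccard similarity is used to find some one-to-one relationships.</p>
      <preformat>
```python
def jaccard(objects_i, objects_j):
    """Jaccard similarity of two predicates, given the sets of objects they take."""
    union = objects_i.union(objects_j)
    if not union:
        return 0.0
    return len(objects_i.intersection(objects_j)) / len(union)


def one_to_one_candidates(preds_a, preds_b, threshold=0.5):
    """Pair each predicate of ontology A with its most similar predicate of B.

    preds_a / preds_b map a predicate name to the set of its objects.
    The threshold is a hypothetical parameter, not a value from the paper.
    """
    alignment = {}
    for pa, objs_a in preds_a.items():
        best, best_sim = None, 0.0
        for pb, objs_b in preds_b.items():
            s = jaccard(objs_a, objs_b)
            if s > best_sim:
                best, best_sim = pb, s
        if best is not None and best_sim >= threshold:
            alignment[pa] = (best, best_sim)
    return alignment


if __name__ == "__main__":
    a = {"hasName": {"Alice", "Bob", "Carol"}, "year": {"2001", "2005"}}
    b = {"name": {"Alice", "Bob", "Dave"}, "published": {"2001", "2005", "2010"}}
    print(one_to_one_candidates(a, b))
```
      </preformat>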
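      <p>There are also some one-to-many relationships, which we align with manually defined rules. For a predicate <italic>p</italic><sub>i</sub> of one ontology that corresponds to predicates <italic>p</italic><sub>1</sub>, …, <italic>p</italic><sub>n</sub> of the other, the rules take forms such as:</p>
      <disp-formula>
        <tex-math>\mathrm{object}(p_i) = \sum_{j=1}^{n} \mathrm{object}(p_j), \quad \mathrm{object}(p_i) = \max_{j=1,\dots,n} \mathrm{object}(p_j), \quad \mathrm{object}(p_i) = \min_{j=1,\dots,n} \mathrm{object}(p_j)</tex-math>
      </disp-formula>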
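      <p>The manual rules can be represented as a small table that maps each source predicate to an aggregation function and the list of target predicates it covers. The sketch below is hypothetical: the predicate names (e.g. <monospace>journalPapers</monospace>) are invented for illustration, and only the sum, max and min rules come from the text.</p>
      <preformat>
```python
# Hypothetical manual rules: how the values of several target predicates
# p_1..p_n are combined into one value comparable with source predicate p_i.
RULES = {"sum": sum, "max": max, "min": min}


def combine(values, rule):
    """Apply one manual one-to-many rule to the numeric objects of p_1..p_n."""
    if rule not in RULES:
        raise ValueError(f"unknown rule: {rule}")
    return RULES[rule](values)


def apply_rules(target_objects, rules):
    """target_objects: {predicate: number}; rules: {source_pred: (rule, [target preds])}."""
    return {
        source_pred: combine([target_objects[p] for p in preds], rule)
        for source_pred, (rule, preds) in rules.items()
    }


if __name__ == "__main__":
    # Hypothetical publication report split across several predicates.
    report = {
        "journalPapers": 12, "conferencePapers": 30,
        "firstYearJournal": 2005, "firstYearConference": 2003,
        "lastYearJournal": 2015, "lastYearConference": 2014,
    }
    rules = {
        "numberOfPublications": ("sum", ["journalPapers", "conferencePapers"]),
        "firstActiveYear": ("min", ["firstYearJournal", "firstYearConference"]),
        "lastActiveYear": ("max", ["lastYearJournal", "lastYearConference"]),
    }
    print(apply_rules(report, rules))
    # prints {'numberOfPublications': 42, 'firstActiveYear': 2003, 'lastActiveYear': 2015}
```
      </preformat>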
      <p>Candidate Pair Generation: This step aims to select a relatively small set of candidate pairs from all possible pairs. Because the knowledge bases are large, it is infeasible to calculate matching scores for all instance pairs. In our method, we first build an inverted index on the objects; instance pairs are added to the candidate set when they share common objects. This method may reduce recall, but it also reduces the amount of computation significantly.</p>
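      <p>Candidate generation with an inverted index can be sketched in Python as follows. The sketch assumes that objects are compared by exact string equality (after whatever normalization preprocessing applies) and, for brevity, indexes all objects, whereas the pipeline description restricts this to discriminative predicates. The function names are illustrative.</p>
      <preformat>
```python
from collections import defaultdict
from itertools import product


def build_index(kb):
    """Inverted index: object value -> set of instance ids that have it.

    kb maps an instance id to the list of its (literal) objects.
    """
    index = defaultdict(set)
    for instance, objects in kb.items():
        for obj in objects:
            index[obj].add(instance)
    return index


def candidate_pairs(kb_a, kb_b):
    """Pairs (a, b) whose instances share at least one object."""
    index_a, index_b = build_index(kb_a), build_index(kb_b)
    pairs = set()
    for obj, instances_a in index_a.items():
        instances_b = index_b.get(obj)  # .get does not insert into the defaultdict
        if instances_b:
            pairs.update(product(instances_a, instances_b))
    return pairs


if __name__ == "__main__":
    kb_a = {"a1": ["Ontology matching", "2013"], "a2": ["Entity linking"]}
    kb_b = {"b1": ["Ontology matching"], "b2": ["Graph mining"], "b3": ["2013"]}
    print(sorted(candidate_pairs(kb_a, kb_b)))
    # prints [('a1', 'b1'), ('a1', 'b3')]
```
      </preformat>
      <p>In the example, only 2 of the 6 possible pairs reach the similarity computation; a true match whose instances share no object would be missed, which is the recall cost mentioned above.</p>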
      <p>Multi-Strategy: We implement several matchers in our system, e.g., a label-based approach and a structure-based approach. In the preprocessing step, we compare the schemas of the two ontologies. If the ranges of their predicates are similar, the label-based approach plays the key role in the matching process. Otherwise, the literal properties are not similar (e.g., when the two ontologies are defined in different languages), and the label-based approach will not be effective. In this case, we obtain supplementary information (e.g., machine translation, WordNet) or use the structure-based approach.</p>
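      <p>One way to make the matcher choice concrete is to measure how much the predicate ranges of the two ontologies overlap. The sketch below is a hypothetical heuristic, not RiMOM's actual rule: it averages, over the predicates of one ontology, the best Jaccard overlap with any predicate of the other, and picks the label-based matcher when that average reaches an assumed threshold of 0.3.</p>
      <preformat>
```python
def range_overlap(preds_a, preds_b):
    """Average best Jaccard overlap of predicate ranges (a crude schema-similarity signal).

    preds_a / preds_b map a predicate name to the set of its objects.
    """
    def jaccard(x, y):
        union = x.union(y)
        return len(x.intersection(y)) / len(union) if union else 0.0

    if not preds_a or not preds_b:
        return 0.0
    best = [max(jaccard(oa, ob) for ob in preds_b.values()) for oa in preds_a.values()]
    return sum(best) / len(best)


def choose_matcher(preds_a, preds_b, threshold=0.3):
    """Return 'label' when predicate ranges overlap enough, else 'structure'.

    Both the overlap measure and the threshold are hypothetical.
    """
    return "label" if range_overlap(preds_a, preds_b) >= threshold else "structure"


if __name__ == "__main__":
    english = {"title": {"ontology matching", "entity linking"}}
    print(choose_matcher(english, {"name": {"ontology matching", "graph mining"}}))
    # prints label
```
      </preformat>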
      <p>Similarity Calculation: In the OAEI 2015 instance matching track, all ontologies are defined in the same language, English. In the tasks we took part in, author-dis and author-rec, the schemas of the two ontologies tend to be similar, so we chose the label-based vector distance matcher to calculate the similarity of instances. It is defined as follows:</p>
      <disp-formula>
        <tex-math>\mathrm{Sim}(I_a, I_b) = \mathrm{Sim}(L_a, L_b) = \frac{1}{|L_a|} \sum_{O_a \in L_a} \max\{\mathrm{Sim}(O_a, O_b) \mid O_b \in L_b\}, \qquad L_a = \mathrm{Objects}(I_a)</tex-math>
      </disp-formula>
      <p>where <italic>I</italic><sub>a</sub> is an instance, <italic>L</italic><sub>a</sub> is the list of all objects of <italic>I</italic><sub>a</sub>, and <italic>O</italic><sub>a</sub> is one of the objects in <italic>L</italic><sub>a</sub>. That is, the similarity of two instances equals the similarity of their object lists: for each <italic>O</italic><sub>a</sub> in <italic>L</italic><sub>a</sub>, we find the most similar object <italic>O</italic><sub>b</sub> in <italic>L</italic><sub>b</sub>, and we average these best scores over <italic>L</italic><sub>a</sub>. The object similarity depends on the data type of the object. For dates, for example, we use an indicator function, which is 1 when the dates are the same and 0 otherwise. For literal properties such as “title”, we compute the cosine similarity of their TF-IDF vectors.</p>
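      <p>The following Python sketch implements the list-similarity formula under stated assumptions: each object is a hypothetical (type, value) tuple, text values are already TF-IDF vectors stored as dictionaries, objects of different types score 0, and only the date indicator and the cosine similarity mentioned in the text are covered.</p>
      <preformat>
```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse TF-IDF vectors given as {word: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def object_sim(oa, ob):
    """Type-dependent object similarity: indicator for dates, cosine for text.

    Objects are hypothetical (type, value) tuples, e.g. ("date", "2015-10-11")
    or ("text", {word: tfidf_weight}).
    """
    type_a, value_a = oa
    type_b, value_b = ob
    if type_a != type_b:
        return 0.0  # assumption: objects of different types never match
    if type_a == "date":
        return 1.0 if value_a == value_b else 0.0
    return cosine(value_a, value_b)


def instance_sim(objects_a, objects_b):
    """Sim(L_a, L_b): average over O_a in L_a of the best Sim(O_a, O_b) in L_b."""
    if not objects_a or not objects_b:
        return 0.0
    total = sum(max(object_sim(oa, ob) for ob in objects_b) for oa in objects_a)
    return total / len(objects_a)


if __name__ == "__main__":
    la = [("date", "2015"), ("text", {"ontology": 1.0, "matching": 1.0})]
    lb = [("date", "2015"), ("text", {"ontology": 1.0, "alignment": 1.0})]
    print(round(instance_sim(la, lb), 3))
```
      </preformat>
      <p>Because the average is taken over the objects of the first instance only, the measure is asymmetric: Sim(I<sub>a</sub>, I<sub>b</sub>) and Sim(I<sub>b</sub>, I<sub>a</sub>) can differ.</p>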
      <p>Instance Alignment: After computing the similarities, for each instance in the source ontology we choose the instance in the target ontology with the best score. We then filter the results with a threshold to obtain the final instance alignment.</p>
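      <p>A minimal sketch of this step, assuming the candidate scores are stored in a dictionary keyed by (source, target) pairs. The threshold of 0.5 is a placeholder, since the text does not report the value used, and the function name <monospace>align</monospace> is illustrative.</p>
      <preformat>
```python
def align(scores, threshold=0.5):
    """Keep, for each source instance, its best-scoring target if the score passes the threshold.

    scores maps (source_id, target_id) candidate pairs to similarity values;
    the default threshold is hypothetical.
    """
    best = {}
    for (src, tgt), s in scores.items():
        if src not in best or s > best[src][1]:
            best[src] = (tgt, s)
    return {(src, tgt) for src, (tgt, s) in best.items() if s >= threshold}


if __name__ == "__main__":
    scores = {("a1", "b1"): 0.9, ("a1", "b2"): 0.4, ("a2", "b2"): 0.3, ("a3", "b3"): 0.6}
    print(sorted(align(scores)))
    # prints [('a1', 'b1'), ('a3', 'b3')]
```
      </preformat>
      <p>On ties, the sketch keeps the first target encountered; the text does not say how ties are resolved.</p>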
    </sec>
    <sec id="sec-4">
      <title>Link to the system and parameters file</title>
      <p>The RiMOM system (2015 version) can be found at https://www.dropbox.com/s/6bx4pb46ytvddvy/RiMOM.zip?oref=e.</p>
      <sec id="sec-4-1">
        <title>Results</title>
        <p>The Instance Matching track contains five subtasks. In the following subsections, we present the results and related analysis for the two subtasks we took part in, author-disambiguation and author-recognition.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Author Disambiguation sub-task</title>
      <p>The goal of the author-dis task is to link OWL instances referring to the same person (i.e., author) based on their publications. We can use the sandbox (a small-scale data set) to tune our parameters. The class ‘author’ has only one literal property, ‘name’, so we must rely on alignments of the class ‘publication’. In the end, we obtain 854 pairs for the sandbox task and 8428 pairs for the mainbox task.</p>
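      <table-wrap>
        <caption>
          <p>Table 1. The results for the author-dis sandbox task</p>
        </caption>
        <table>
          <thead>
            <tr><th>System</th><th>Expected mappings</th><th>Retrieved mappings</th><th>Precision</th><th>Recall</th><th>F-measure</th></tr>
          </thead>
          <tbody>
            <tr><td>EXONA</td><td>854</td><td>854</td><td>0.941</td><td>0.941</td><td>0.941</td></tr>
            <tr><td>InsMT+</td><td>854</td><td>722</td><td>0.834</td><td>0.705</td><td>0.764</td></tr>
            <tr><td>Lily</td><td>854</td><td>854</td><td>0.981</td><td>0.981</td><td>0.981</td></tr>
            <tr><td>LogMap</td><td>854</td><td>779</td><td>0.994</td><td>0.906</td><td>0.948</td></tr>
            <tr><td>RiMOM</td><td>854</td><td>854</td><td>0.929</td><td>0.929</td><td>0.929</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-2">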
        <p>Since the reference alignments for the sandbox are provided by the organizers, we focus mainly on the mainbox. As shown in Table 2, our results for the author-dis mainbox task are Precision 0.911, Recall 0.911 and F-measure 0.911, slightly lower than on the sandbox. Afterwards, we found that the property ‘title’ plays a key role for publications, so we expect a better result from a deeper treatment of this property.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Author Recognition sub-task</title>
      <p>The goal of the author-rec task is to associate a person (i.e., author) with the corresponding publication report, which contains aggregated information about the person's publication activity, such as number of publications, h-index, years of activity and number of citations. The final goal is similar to that of the author-dis task, but the schema of the ontology changes. Most notably, there are one-to-many relationships between the properties, so we add some manual rules to handle them.</p>
      <p>As shown in Table 4, RiMOM achieves an excellent result on the author-rec task. Our results for the author-rec mainbox task are Precision 0.999, Recall 0.999 and F-measure 0.999, which indicates that our algorithm is well suited to this task.</p>
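      <sec id="sec-6-1">
        <table-wrap>
          <caption>
            <p>Table 3. The results for the author-rec sandbox task</p>
          </caption>
          <table>
            <thead>
              <tr><th>System</th><th>Expected mappings</th><th>Retrieved mappings</th><th>Precision</th><th>Recall</th><th>F-measure</th></tr>
            </thead>
            <tbody>
              <tr><td>EXONA</td><td>854</td><td>854</td><td>0.518</td><td>0.518</td><td>0.518</td></tr>
              <tr><td>InsMT+</td><td>854</td><td>90</td><td>0.556</td><td>0.059</td><td>0.106</td></tr>
              <tr><td>Lily</td><td>854</td><td>854</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>
              <tr><td>LogMap</td><td>854</td><td>854</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>
              <tr><td>RiMOM</td><td>854</td><td>854</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Our system needs to align the predicates before instance matching, and this process requires scanning all of the instances in the ontology, which can be time-consuming. In addition, the Predicate Alignment process is semi-automatic: we have to add manual rules to deal with the one-to-many relationships.</p>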
        <p>In future work, we hope to improve our system with an algorithm that aligns the predicates automatically and iteratively. First, we use the values of the predicates to align the instances; in turn, the aligned instances help us update the similarities between predicates. By repeating these steps, we gradually obtain the final alignment result.</p>
        <p>These two tasks are instance matching tasks on publication data sets. We used the reference alignments of the sandbox to tune the parameters, and the results show that our approach is effective. We also found that the inverted index not only improves efficiency but also reduces mistakes and increases precision. There are also some aspects we are not satisfied with. Due to time constraints, we did not take part in the other three tasks. We look forward to making further progress in the next OAEI campaign.</p>
        <sec id="sec-6-1-1">
          <title>Conclusion and future work</title>
          <p>In this paper, we presented the RiMOM system and its results in the OAEI 2015 campaign. This year we participated in the instance matching track, and we described the specific techniques we used for its tasks. We designed a new framework for the instance matching task, and the results show that our method is effective and efficient.</p>
          <p>In the future, we will develop an iterative algorithm to align the predicates
automatically.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>DBpedia - A crystallization point for the web of data</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>7</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hoffart</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berberich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia</article-title>
          .
          <source>Artif. Intell</source>
          .
          <volume>194</volume>
          (
          <year>2013</year>
          )
          <fpage>28</fpage>
          -
          <lpage>61</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>XLore: A large-scale English-Chinese bilingual knowledge graph</article-title>
          .
          <source>In: Proceedings of the ISWC 2013 Posters &amp; Demonstrations Track</source>
          , Sydney, Australia, October 23, 2013. (
          <year>2013</year>
          )
          <fpage>121</fpage>
          -
          <lpage>124</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Ontology matching: State of the art and future challenges</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>25</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
          <fpage>158</fpage>
          -
          <lpage>176</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>RiMOM: A dynamic multistrategy ontology alignment framework</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>21</volume>
          (
          <issue>8</issue>
          ) (
          <year>2009</year>
          )
          <fpage>1218</fpage>
          -
          <lpage>1232</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>