<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CroMatcher - Results for OAEI 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marko Gulić</string-name>
          <email>marko.gulic@pfri.hr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boris Vrdoljak</string-name>
          <email>boris.vrdoljak@fer.hr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Electrical Engineering and Computing</institution>
          ,
          <addr-line>Zagreb</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Maritime Studies</institution>
          ,
          <addr-line>Rijeka</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>CroMatcher is an ontology matching system based on terminological and structural matchers. The most important part of the system is the automated weighted aggregation of correspondences produced by different basic ontology matchers. This is the first year CroMatcher has participated in the OAEI campaign. The results obtained this year will help us find and resolve shortcomings of the system before the next campaign.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>CroMatcher is a system that executes several basic matchers and then aggregates
their results. The system does not use any external resources. First, the terminological
basic matchers are executed, followed by automatic weighted aggregation: the results of
each terminological basic matcher are included in the common results according to the
importance of that matcher, which is determined automatically within the weighted
aggregation. Next, several iterative structural matchers are executed (e.g. if the child
entities are similar, the parent entities are similar too). The structural matchers take
the common results of the terminological matchers as their input. After the structural
basic matchers have been executed, automatic weighted aggregation is applied again. At
the end of the matching process, weighted aggregation is applied to the terminological
and structural common results. Finally, the final alignment method (choosing the
relevant correspondences between entities of the two ontologies) is executed. This
method iteratively adds the best correspondences between pairs of entities to the final
alignment, so that each entity is related to at most one entity of the other ontology.</p>
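      <p>The control flow described above can be outlined as follows. This is a hypothetical sketch, not CroMatcher's code: each matcher result is modelled as a dictionary from entity pairs to correspondence values, and mean_aggregate is only a placeholder for the automatic weighted aggregation (a plain average instead of automatically determined importance weights). The names run_pipeline, mean_aggregate, Pair and Matrix are illustrative.</p>
      <preformat>
```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]       # (entity of ontology 1, entity of ontology 2)
Matrix = Dict[Pair, float]   # correspondence values in [0, 1]


def mean_aggregate(matrices: List[Matrix]) -> Matrix:
    """Placeholder for autoweighted aggregation: a plain, unweighted mean."""
    keys = set().union(*matrices)
    return {k: sum(m.get(k, 0.0) for m in matrices) / len(matrices) for k in keys}


def run_pipeline(
    terminological: List[Callable[[], Matrix]],
    structural: List[Callable[[Matrix], Matrix]],
    aggregate: Callable[[List[Matrix]], Matrix],
    select: Callable[[Matrix], List[Pair]],
) -> List[Pair]:
    # 1. Terminological matchers run independently; their results are aggregated.
    term = aggregate([matcher() for matcher in terminological])
    # 2. Structural matchers start from the aggregated terminological results.
    struct = aggregate([matcher(term) for matcher in structural])
    # 3. Final aggregation of the terminological and structural common results.
    combined = aggregate([term, struct])
    # 4. Final alignment: choose the relevant correspondences.
    return select(combined)
```
      </preformat>
      <p>Passing a different aggregate function (for example an autoweighted one) or a different select function (for example a mutual-best selection with a threshold) changes the corresponding step without touching the overall control flow.</p>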
    </sec>
    <sec id="sec-3">
      <title>Specific techniques used</title>
      <p>In this section, the main components of CroMatcher are described in detail.
The workflow and the main components of the system are shown in Fig. 1.
CroMatcher consists of the following components:</p>
      <p>1. Data extraction from ontologies: the information about every entity is extracted
from the given ontologies. After all the data about an entity have been extracted, all
textual data are normalized by tokenizing them into a set of tokens and removing stop
words.</p>
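      <p>A minimal sketch of the normalization step follows. The paper does not specify the tokenizer or the stop-word list, so both are assumptions here: text is split on case changes (camelCase IDs such as hasAuthor), punctuation and whitespace, the tokens are lowercased, and the members of a small hypothetical STOP_WORDS set are removed.</p>
      <preformat>
```python
import re
from typing import Set

# Hypothetical stop-word list; the paper does not say which list CroMatcher uses.
STOP_WORDS = {"a", "an", "and", "by", "for", "has", "in", "is", "of", "on", "the", "to", "with"}


def normalize(text: str) -> Set[str]:
    """Tokenize an ID, label or annotation into a set of tokens without stop words."""
    # Acronyms (FTP in FTPServer), capitalized or lowercase words, and numbers (assumption).
    raw = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    return {tok.lower() for tok in raw if tok.lower() not in STOP_WORDS}
```
      </preformat>
      <p>For example, normalize("hasAuthorOf the Paper") yields the tokens author and paper, and normalize("FTPServer") yields ftp and server.</p>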
      <p>[Fig. 1. Workflow of CroMatcher: data extraction from ontologies; parallel composition of terminological matchers, followed by autoweighted aggregation (aggregated correspondences of terminological matchers); parallel composition of structural matchers, followed by autoweighted aggregation (aggregated correspondences of structural matchers); autoweighted aggregation (final aggregation); final alignment.]</p>
    </sec>
    <sec id="sec-4">
      <title>2. Terminological matchers</title>
      <p>Matcher that compares the IDs and annotation texts of two entities (classes or
properties) with the bi(tri)gram matcher, which counts how many bigrams (trigrams),
i.e. substrings of length 2 (3), the two names have in common; e.g. FTP and FTPServer
share 2 bigrams, FT and TP [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      <p>Matcher that compares only the labels of two entities (classes or properties), or
their IDs if an entity has no label, with the bi(tri)gram matcher.</p>
      <p>Matcher that compares the textual profiles of two entities with TF/IDF [2] and
cosine similarity [3]. The profile of a class entity contains the annotations of the class
itself (and of all its subclasses) and the annotations of every property whose domain is
that class. The profile of a property entity contains the annotations of the property
itself and of all its sub-properties.</p>
      <p>Matcher that compares the individuals of two entities with TF/IDF and cosine
similarity. For a class entity, the compared document contains the individual values of
the class itself and of all its subclasses; for a property entity, it contains the
individual values of its range classes.</p>
      <p>Matcher that compares the extra individuals of two entities with TF/IDF and cosine
similarity. For a class entity, the extra individuals are the individual values of the
first superclass of the class; for a property entity, they are the individual values of
its domain and range classes.</p>
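      <p>The two comparison techniques used by the matchers above can be sketched as follows. This is an illustrative sketch, not CroMatcher's implementation: the paper only says that shared bi(tri)grams are counted, so the Dice-style normalization (twice the number of shared n-grams divided by the total number of n-grams, counted as sets) is an assumption, as is the raw-count TF times log(N/df) IDF weighting. Note that IDF is only informative when computed over many entity documents (e.g. all documents of both ontologies); with only two documents, every shared term gets an IDF of zero.</p>
      <preformat>
```python
import math
from typing import Dict, List, Set


def ngrams(name: str, n: int = 2) -> Set[str]:
    """Set of lowercase substrings of length n (bigrams by default)."""
    s = name.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice-style overlap of shared n-grams (the normalization is an assumption)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga.intersection(gb)) / (len(ga) + len(gb))


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF/IDF vectors: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    df: Dict[str, int] = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        tf: Dict[str, int] = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```
      </preformat>
      <p>For example, ngram_similarity("FTP", "FTPServer") finds the two shared bigrams ft and tp from the example above and returns 4/9 (about 0.44); passing n=3 gives the trigram variant.</p>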
      <p>Matcher that compares general data about the entities. The general data of a class
entity comprise the number of object (data) properties, the number of restrictions and
the number of sub (super) class entities. The general data of a property entity comprise
the number of sub (super) property entities and the number of domain class entities.
The more similar the general data, the greater the correspondence between the
entities.</p>
    </sec>
    <sec>
      <title>3. Structural matchers</title>
      <p>Matcher that compares the similarity between the super entities (classes or
properties) of the currently compared entities: if the super entities are similar, the
compared entities are similar too. The matcher is executed iteratively and stops when
the correspondence value of the compared entities stops changing. In each step, the new
correspondence value of the compared entities is calculated as the sum of 50% of the
previous similarity value and 50% of the similarity value between the super entities.</p>
      <p>Matcher that compares the similarity between the sub entities (classes or
properties) of the currently compared entities: if the sub entities are similar, the
compared entities are similar too. The matcher is executed iteratively and stops when
the correspondence value of the compared entities stops changing. In each step, the new
correspondence value of the compared entities is calculated as the sum of 50% of the
previous similarity value and 50% of the similarity value between the sub entities.</p>
      <p>Matcher that compares the similarity between the properties (and their range
classes) that have the currently compared classes as their domain. When computing the
similarity between properties, this matcher also compares their domain classes.</p>
      <p>Matcher that compares the similarity between the range classes of the currently
compared properties.</p>
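      <p>The iterative 50/50 update used by the first two structural matchers can be sketched as follows, shown for super entities; passing child maps instead of parent maps gives the sub-entity variant. Assumptions not stated in the paper: the similarity between super entities is taken as the best current value over all pairs of super entities, entities without super entities keep their value, and "stops changing" is interpreted as the largest change falling below a small tolerance tol, with max_iter as a safety cap. The function name propagate_from_parents is hypothetical.</p>
      <preformat>
```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def propagate_from_parents(
    initial: Dict[Pair, float],
    parents1: Dict[str, List[str]],
    parents2: Dict[str, List[str]],
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Dict[Pair, float]:
    """Repeatedly set each value to 50% of its previous value plus 50% of its parents' value."""
    sim = dict(initial)
    if not sim:
        return sim
    for _ in range(max_iter):
        new_sim: Dict[Pair, float] = {}
        for (a, b), value in sim.items():
            # Similarity of the super entities: best current value over parent pairs (assumption).
            parent_scores = [
                sim.get((pa, pb), 0.0)
                for pa in parents1.get(a, [])
                for pb in parents2.get(b, [])
            ]
            if parent_scores:
                new_sim[(a, b)] = 0.5 * value + 0.5 * max(parent_scores)
            else:
                new_sim[(a, b)] = value  # no super entities: nothing to propagate
        max_delta = max(abs(new_sim[p] - sim[p]) for p in sim)
        sim = new_sim
        if tol > max_delta:  # values have (practically) stopped changing
            break
    return sim
```
      </preformat>
      <p>In a toy example where Author and Writer start at 0.3 but their parents Person and Human are matched at 0.9, the value of (Author, Writer) moves to 0.6, then 0.75, and converges towards 0.9.</p>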
    </sec>
    <sec id="sec-5">
      <title>4. Autoweighted aggregation for parallel composition of basic matchers</title>
      <p>After the terminological and structural matchers have been executed, their results
have to be aggregated. Our system integrates multiple matchers through parallel
composition. The main problem in parallel composition is how to aggregate the results
obtained by the individual basic matchers. Weighted aggregation is one of the methods
for aggregating matchers [4]. This method computes a weighted sum of the similarity
values of the basic matchers and needs relative weights that correspond to the expected
importance of the basic matchers. The problem is how to determine the importance of
each basic matcher. Our Autoweight method, proposed in [5], automatically determines
the importance of the basic matchers in order to improve the overall performance of the
matching system. In this method, the importance of a basic matcher is specified by
determining the importance of the individual best correspondences within its results
(the greatest correspondences between two entities in both mapping directions, as these
correspondences are the most relevant). A correspondence found within the results of a
basic matcher is more important when the same correspondence is found by fewer other
basic matchers: a matcher that finds the same correspondences as all other matchers
does not provide any significant new information to the matching process.</p>
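      <p>The following is a minimal sketch of this idea, not the exact Autoweight formula from [5]. Assumptions: a matcher result is a dictionary from entity pairs to values; a best correspondence is a pair whose value is the maximum of both its row and its column; a best correspondence found by k matchers contributes 1/k to each of them (a hypothetical scoring, chosen only because it decreases as more matchers agree); the scores are normalized to sum to 1 and used in an ordinary weighted sum. The function names are hypothetical.</p>
      <preformat>
```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]
Matrix = Dict[Pair, float]


def best_correspondences(m: Matrix) -> Set[Pair]:
    """Pairs whose value is the greatest in both mapping directions (row and column)."""
    row_max: Dict[str, float] = {}
    col_max: Dict[str, float] = {}
    for (a, b), v in m.items():
        row_max[a] = max(row_max.get(a, v), v)
        col_max[b] = max(col_max.get(b, v), v)
    return {(a, b) for (a, b), v in m.items() if v > 0 and v == row_max[a] and v == col_max[b]}


def autoweights(results: Dict[str, Matrix]) -> Dict[str, float]:
    """Weight each matcher by how rarely its best correspondences are found by others."""
    best = {name: best_correspondences(m) for name, m in results.items()}
    raw: Dict[str, float] = {}
    for name, pairs in best.items():
        score = 0.0
        for pair in pairs:
            # k = number of matchers (including this one) with the same best correspondence
            k = sum(1 for other in best.values() if pair in other)
            score += 1.0 / k
        raw[name] = score
    total = sum(raw.values())
    if total == 0:
        return {name: 1.0 / len(results) for name in results}  # fall back to equal weights
    return {name: s / total for name, s in raw.items()}


def aggregate(results: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted sum of the basic matchers' similarity values."""
    out: Matrix = {}
    for name, m in results.items():
        for pair, v in m.items():
            out[pair] = out.get(pair, 0.0) + weights[name] * v
    return out
```
      </preformat>
      <p>For instance, if matcher A has two best correspondences and shares only one of them with matcher B, whose only best correspondence is that shared one, A receives weight 0.75 and B weight 0.25.</p>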
    </sec>
    <sec id="sec-6">
      <title>5. Process of final alignment</title>
      <p>Finally, the relevant correspondences are selected iteratively for inclusion in the
final alignment. The final alignment includes only the greatest correspondences between
entity1<sub>i</sub> (first ontology) and entity2<sub>j</sub> (second ontology). A
correspondence between entity1<sub>i</sub> and entity2<sub>j</sub> is a greatest
correspondence only if it has the greatest value among all correspondences that include
entity1<sub>i</sub> and also among all correspondences that include entity2<sub>j</sub>.
The threshold for these greatest correspondences is set to 0.15. We consider this
threshold sufficient because the final alignment includes only correspondences that are
the greatest for both compared entities.</p>
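      <p>A minimal sketch of the selection step follows, assuming the aggregated correspondences are stored as a dictionary from entity pairs to values. We assume that "iteratively" means: in each round, every correspondence that is the greatest for both of its entities (and at least 0.15) is accepted, the aligned entities are removed, and the greatest values are recomputed among the remaining correspondences. Ties are resolved so that no entity is aligned twice. The function name final_alignment is hypothetical.</p>
      <preformat>
```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def final_alignment(sim: Dict[Pair, float], threshold: float = 0.15) -> List[Tuple[str, str, float]]:
    """Iteratively keep correspondences that are the greatest for both of their entities."""
    remaining = {p: v for p, v in sim.items() if v >= threshold}
    alignment: List[Tuple[str, str, float]] = []
    while remaining:
        # Greatest value per entity among the still-unaligned correspondences.
        best1: Dict[str, float] = {}
        best2: Dict[str, float] = {}
        for (a, b), v in remaining.items():
            best1[a] = max(best1.get(a, v), v)
            best2[b] = max(best2.get(b, v), v)
        chosen = [
            (a, b, v) for (a, b), v in remaining.items()
            if v == best1[a] and v == best2[b]
        ]
        used1: Set[str] = set()
        used2: Set[str] = set()
        for a, b, v in sorted(chosen, key=lambda t: t[2], reverse=True):
            if a in used1 or b in used2:
                continue  # tie: entity already aligned in this round
            alignment.append((a, b, v))
            used1.add(a)
            used2.add(b)
        # Each entity may appear in at most one correspondence of the final alignment.
        remaining = {
            (a, b): v for (a, b), v in remaining.items()
            if a not in used1 and b not in used2
        }
    return alignment
```
      </preformat>
      <p>With values (a1, b1) = 0.9, (a1, b2) = 0.8, (a2, b1) = 0.85, (a2, b2) = 0.5 and (a3, b3) = 0.1, the first round accepts (a1, b1), the second round accepts (a2, b2), and (a3, b3) is discarded by the threshold. Every round accepts at least the globally greatest remaining correspondence, so the loop always terminates.</p>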
    </sec>
    <sec id="sec-7">
      <title>Link to the system and parameters file</title>
      <p>The system can be downloaded from http://www.seals-project.eu (tool identifier:
e0fe95d5-943e-4652-bc53-5b36b712c9cb, version: 1.0).</p>
    </sec>
    <sec id="sec-7-1">
      <title>Results</title>
      <p>In this section, the evaluation results of the CroMatcher matching system executed
on the SEALS platform are presented.</p>
    </sec>
    <sec id="sec-8">
      <title>Benchmark</title>
      <p>In OAEI 2013, the benchmark track includes one blind test (biblio). Table 1 shows
the results obtained by running CroMatcher on this test.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Results of CroMatcher on the benchmark track</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Test set</th>
              <th>Recall</th>
              <th>Precision</th>
              <th>F-Measure</th>
              <th>Time (s)</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Benchmark</td>
              <td>0.82</td>
              <td>0.95</td>
              <td>0.88</td>
              <td>1114</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>This is the first year CroMatcher has participated in the OAEI campaign, and our
focus was on the benchmark track. Due to lack of time, we did not manage to test the
system on the other tracks before the evaluation, so the system had problems with them.
This year, matching systems had to finish matching the anatomy ontologies within 10
hours; our system had not finished even after 30 hours, so we need to speed it up before
the next OAEI campaign. In the conference track, our system was only partially
evaluated because it could not process several ontologies. In the multifarm, library and
large biomedical ontologies tracks, our system threw an “OutOfMemory” exception,
which we also need to solve before the next evaluation. We did not participate in the
instance matching track.</p>
    </sec>
    <sec id="sec-14">
      <title>General comments</title>
      <p>As stated before, this is the first time CroMatcher has participated in the OAEI
campaign. We are very pleased that our ontology matching system was evaluated on the
SEALS platform, because this allowed us to compare our system with existing systems.
There are many different test cases, and we think they will help us improve our system
in the future.</p>
    </sec>
    <sec id="sec-15">
      <title>Comments on the results</title>
      <p>Our system achieved good results in the benchmark track (precision 0.95, recall
0.82). Considering that the benchmark track contains the largest number of ontologies,
in which different parts are missing, we can conclude that our system performs well, but
only when matching small ontologies like those in the benchmark track. When matching
large ontologies (thousands of entities), our system is quite slow and cannot handle them
yet.</p>
    </sec>
    <sec id="sec-16">
      <title>Discussions on the way to improve the proposed system</title>
      <p>We will have to find a faster measure than TF/IDF for comparing the textual
documents of entities. We will also have to store the data about the entities in a
separate file instead of in Java objects in order to reduce the memory usage of the
system.</p>
    </sec>
    <sec id="sec-16-1">
      <title>Conclusion</title>
      <p>This paper presented the CroMatcher ontology matching system and its evaluation
results on different OAEI tracks. The evaluation results show that CroMatcher
successfully matches small ontologies but has problems dealing with ontologies that have
a large number of entities. We will try to solve this problem and prepare the system to
be competitive in all OAEI tracks next year.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <string-name><surname>Euzenat</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Shvaiko</surname>, <given-names>P.</given-names></string-name>: Ontology Matching. Springer (<year>2007</year>)</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York (1983)</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Boston (1999)</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Do, H., Rahm, E.: COMA - a system for flexible combination of schema matching approaches. In: Proc. 28th International Conference on Very Large Data Bases (VLDB), pp. 610-621 (2002)</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Gulić, M., Magdalenić, I., Vrdoljak, B.: Automatically Specifying Parallel Composition of Matchers in Ontology Matching Process. In: Barriocanal, E.G., Cebeci, Z., Okur, M.C., Öztürk, A. (eds.) MTSR 2011. Communications in Computer and Information Science, vol. 240, pp. 22-33. Springer, Berlin Heidelberg (2011)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>