<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An integrated matching system: GeRoMeSuite and SMB - Results for OAEI 2010</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Quix</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Avigdor Gal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomer Sagi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Kensche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RWTH Aachen University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technion - Israel Institute of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the results of an integrated matching system which is the result of a cooperation project between the Israel Institute of Technology (Technion) and the RWTH Aachen University in Germany. We have integrated the GeRoMeSuite system (from RWTH Aachen) and SMB (from Technion). Both tools aim at matching schemas; while GeRoMeSuite offers a variety of matchers, SMB provides the information on how to combine matchers and how to enhance match results. Thus, an integration of the tools is beneficial for both systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Presentation of the system</title>
      <sec id="sec-2-1">
        <title>GeRoMeSuite</title>
        <p>
          As a framework for model management, GeRoMeSuite [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] provides an environment to
simplify the implementation of model management operators. GeRoMeSuite is based on
the generic role-based metamodel GeRoMe [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which represents models from different
modeling languages (such as XML Schema, OWL, SQL) in a generic way. Thereby,
the management of models in a polymorphic fashion is enabled, i.e., the same operator
implementations are used regardless of the original modeling language of the schemas.
In addition to providing a framework for model management, GeRoMeSuite implements
several fundamental operators such as Match [6], Merge [5], and Compose [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          The matching component of GeRoMeSuite has been described in more detail in
[6], where we present and discuss in particular the results for heterogeneous matching
tasks (e.g., matching XML Schema and OWL ontologies). An overview of the complete
GeRoMeSuite system is given in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-1a">
        <title>SMB</title>
        <p>
The Schema Matching Boosting (SMB) Service is a toolkit for enhancing the
performance of schema matchers. SMB operates in three modes: Enhance, Learn, and
Recommend. In the Enhance mode, SMB receives a raw correspondence matrix (with
similarity values for attribute correspondences in the range [0,1]) and analyzes
the results per row and column. Subsequently, SMB uses contrasting and
weakening algorithms to boost the results of “promising” rows and columns and to weaken the results
of “non-promising” rows and columns, respectively. Contrasting is performed using a
modified version of the Weber contrast function. Weakening is inversely proportional to
the row and column average.
        </p>
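        <p>As an illustration only, the Enhance step could be sketched as follows. The source gives neither the modified Weber contrast function nor the exact weakening rule, so everything below is an assumption rather than SMB's implementation: the standard Weber contrast (v - b) / b with the mean of the row and column averages as background b, the test for "promising" rows and columns (the maximum exceeds the mean by a fixed gap), the weakening formula, and all function and parameter names.</p>
        <preformat><![CDATA[
```python
from statistics import mean


def weber_contrast(value, background):
    """Standard Weber contrast: (value - background) / background."""
    if background == 0:
        return 0.0
    return (value - background) / background


def enhance(matrix, promising_gap=0.3, strength=0.5, weaken=0.05):
    """Boost entries of 'promising' rows/columns and weaken the others.

    matrix: list of rows holding similarity values in [0, 1].
    All parameters are hypothetical tuning knobs, not SMB settings.
    """
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]
    row_mean = [mean(r) for r in rows]
    col_mean = [mean(c) for c in cols]
    # Assumed test for "promising": the best value stands out from the mean.
    row_ok = [max(r) - m >= promising_gap for r, m in zip(rows, row_mean)]
    col_ok = [max(c) - m >= promising_gap for c, m in zip(cols, col_mean)]

    enhanced = []
    for i, row in enumerate(rows):
        new_row = []
        for j, v in enumerate(row):
            background = (row_mean[i] + col_mean[j]) / 2
            if row_ok[i] and col_ok[j]:
                # Contrasting: move the value away from the background level.
                v = v * (1 + strength * weber_contrast(v, background))
            else:
                # Weakening: reduction inversely proportional to the average.
                penalty = min(1.0, weaken / background) if background > 0 else 1.0
                v = v * (1 - penalty)
            # Keep the result a valid similarity value.
            new_row.append(max(0.0, min(1.0, v)))
        enhanced.append(new_row)
    return enhanced
```
]]></preformat>
        <p>In this sketch, a value above the background level of a promising row and column is pushed towards 1, a value below it is pushed towards 0, and every entry in a non-promising row or column is scaled down.</p>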
        <p>
          The Learn mode is used to perform off-line training of SMB on the performance
behavior of matchers w.r.t. various matching tasks, which are grouped into classes
according to their a priori features such as schema size. Training is performed using the
SMB algorithm, as introduced in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The Recommend mode classifies a given matching task at run time and
provides the recommended ensemble weights for the various components of the matching
system. The Learn and Recommend modes are a re-implementation of the
system presented in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], in which the run-time complexity has been reduced from O(n!) to
O(n<sup>2</sup>) and generic interfaces have been provided to allow any matching system to use
SMB by command-line invocation.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>State, purpose, general statement</title>
        <p>GeRoMeSuite is a generic system which can match ontologies as well as schemas in
other modeling languages such as XML Schema or SQL. Therefore, it is well suited
for matching tasks across heterogeneous modeling languages, such as matching XML
Schema with OWL. We discussed in [6] that the use of a generic metamodel, which
represents the semantics of the models to be matched in detail, is more advantageous
for such heterogeneous matching tasks than a simple graph representation.</p>
        <p>SMB is also a modeling-language-independent ‘meta’ matching system which mainly
works on the similarity matrices produced by GeRoMeSuite. It sharpens the
similarity values by increasing ‘good’ values and decreasing ‘bad’ values. This
should increase the precision of the match result.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Specific techniques used</title>
        <p>Besides the integration of GeRoMeSuite and SMB, we focused this year on adding
validation methods to the system to improve the precision of the match result. A
component for adding disjointness relationships in an ontology has been added to the
matching framework. The component uses machine learning techniques to identify disjoint
concepts within one ontology. The disjointness relationships can then be used in the
validation of schema matches using logical reasoning.</p>
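        <p>One simple way in which such disjointness relationships could be used to validate a match result is sketched below. This is an illustration under stated assumptions, not the reasoning component of GeRoMeSuite: it only rejects a correspondence if it maps a source concept to a target concept that already has a higher-ranked correspondence from a disjoint source concept. The data structures and the function name are hypothetical.</p>
        <preformat><![CDATA[
```python
def validate(correspondences, disjoint_pairs):
    """Drop correspondences that contradict a disjointness relationship.

    correspondences: list of (source_concept, target_concept, confidence).
    disjoint_pairs:  set of frozensets {a, b} of disjoint source concepts.

    Candidates are processed by descending confidence; a candidate is
    rejected if an already accepted correspondence maps a disjoint source
    concept to the same target concept.
    """
    kept = []
    for cand in sorted(correspondences, key=lambda c: c[2], reverse=True):
        source, target, _ = cand
        clash = any(
            target == kept_target
            and frozenset((source, kept_source)) in disjoint_pairs
            for kept_source, kept_target, _ in kept
        )
        if not clash:
            kept.append(cand)
    return kept


# Hypothetical example data.
alignment = [
    ("Person", "Human", 0.9),
    ("Organization", "Human", 0.6),
    ("Paper", "Article", 0.8),
]
disjoint = {frozenset(("Person", "Organization"))}
validated = validate(alignment, disjoint)
```
]]></preformat>
        <p>Here the correspondence between Organization and Human is rejected, because Person and Organization are declared disjoint and Person already matches Human with a higher confidence.</p>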
        <p>Furthermore, we developed a component which can use a background ontology
to find additional matches in the ontology. The system is able to find an appropriate
background ontology on the web automatically, using Google and Swoogle. Due to the
setup of the OAEI campaign, we did not use this component for OAEI.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Adaptations made for the evaluation</title>
        <p>We evaluated several match configurations, which is easy to do thanks to the adaptable
and extensible matching framework of GeRoMeSuite. As only one configuration can
be used for all matching tasks, we had to find a good compromise between performance
in terms of precision and recall, time performance for larger ontologies (e.g., anatomy),
and selection of appropriate matchers which work well on all tracks. For example, we
also tested configurations which had an f-measure that was about 5% higher than the
configuration which we eventually used, but these configurations did not work well on
all tracks. The identification of good match configurations is a topic for future research.</p>
        <p>Fig. 1 indicates the strategy which we used for the matching tasks in the benchmark
track. All aggregation and filter steps use variable weights and thresholds, which are
based on the statistical values of the input similarities.</p>
        <p>The role matcher is a special matcher which compares the roles of model elements in
our generic role-based metamodel. In principle, this results in matching only elements
of the same type, e.g., classes with classes only and properties with properties only.</p>
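        <p>The role-based restriction and the statistics-dependent thresholds described above could look roughly like the following sketch. The source does not specify the formulas used by GeRoMeSuite, so the threshold rule (mean plus a multiple of the standard deviation of the input similarities), the equal default weights, and all names are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def aggregate(sim_maps, weights=None):
    """Weighted average of the similarity maps of several matchers.

    sim_maps: list of dicts {(source, target): similarity}.
    A pair missing from a matcher's map counts as similarity 0.
    """
    if weights is None:
        weights = [1.0] * len(sim_maps)
    total = sum(weights)
    pairs = set().union(*sim_maps)
    return {
        pair: sum(w * m.get(pair, 0.0) for w, m in zip(weights, sim_maps)) / total
        for pair in pairs
    }


def role_filter(sims, roles):
    """Keep only pairs whose elements play the same role (class, property, ...)."""
    return {(s, t): v for (s, t), v in sims.items() if roles[s] == roles[t]}


def adaptive_threshold(sims, k=0.25):
    """Threshold derived from the input statistics (assumed: mean + k * stdev)."""
    values = list(sims.values())
    return mean(values) + k * pstdev(values)


def select(sims, k=0.25):
    """Keep the pairs whose similarity reaches the adaptive threshold."""
    threshold = adaptive_threshold(sims, k)
    return {pair: v for pair, v in sims.items() if v >= threshold}


# Hypothetical example: two matchers, two classes and two properties.
string_sims = {("Author", "Writer"): 0.4, ("Author", "hasName"): 0.2,
               ("name", "hasName"): 0.8}
structure_sims = {("Author", "Writer"): 0.8, ("name", "hasName"): 0.6}
roles = {"Author": "class", "Writer": "class",
         "name": "property", "hasName": "property"}

combined = aggregate([string_sims, structure_sims])
same_role = role_filter(combined, roles)
selected = select(combined)
```
]]></preformat>
        <p>Deriving the threshold from the distribution of the input similarities, rather than fixing it, lets the same configuration adapt to matching tasks whose similarity values are generally high or generally low.</p>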
        <p>[Fig. 1: matching strategy; figure not reproduced. Component labels in the figure: String Matcher, Instance Matcher, Children Matcher, Parent Matcher, Role Matcher, SMB Enhancer, Aggregation, Match Validation, Final Filter.]</p>
        <p>On a technical level, we implemented a command line interface for the
matching component, as the matching component is normally used from within the GUI of
GeRoMeSuite. The command line interface can work in a batch mode in which several
matching tasks and configurations can be processed and compared. The existence of
this tool also enabled an easy integration with the OAEI web service interface.
The results for the OAEI campaign 2010 are available at http://www.dbis.rwth-aachen.de/gerome/oaei2010/</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Benchmark</title>
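        <p>The following table shows the average results for precision and recall in the benchmark
track.</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Tasks</th><th>Precision</th><th>Recall</th></tr>
            </thead>
            <tbody>
              <tr><td>1xx</td><td>1.00</td><td>1.00</td></tr>
              <tr><td>2xx (xx&lt;48)</td><td>0.96</td><td>0.88</td></tr>
              <tr><td>2xx (xx&gt;47)</td><td>0.89</td><td>0.51</td></tr>
              <tr><td>3xx</td><td>0.79</td><td>0.38</td></tr>
            </tbody>
          </table>
        </table-wrap>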
        <p>The 1xx ontologies provide a first check of whether a match configuration is suitable at all.
A configuration should produce a perfect result for these tasks, which is the case for
the configuration we finally chose.</p>
        <p>For the simpler tasks in the 2xx data set (201-247), our system was able to achieve
a very good result with an f-measure of more than 0.9.</p>
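        <p>For reference, the f-measure is the harmonic mean of precision and recall. The short check below applies it to the averaged precision and recall values reported for the benchmark groups. Note that the f-measure of averaged values only approximates the average of the per-task f-measures, so the numbers are indicative; the function and variable names are our own.</p>
        <preformat><![CDATA[
```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average (precision, recall) per benchmark group, as reported above.
benchmark = {
    "1xx": (1.00, 1.00),
    "2xx (xx<48)": (0.96, 0.88),
    "2xx (xx>47)": (0.89, 0.51),
    "3xx": (0.79, 0.38),
}

for group, (p, r) in benchmark.items():
    print(f"{group}: f-measure = {f_measure(p, r):.2f}")
```
]]></preformat>
        <p>For the group 2xx (xx&lt;48), this gives roughly 0.92, which is consistent with the f-measure of more than 0.9 stated above.</p>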
        <p>For some of the really difficult tasks (248-266), our system was not able to find any
correspondence as there is hardly any information that can be used (e.g., task 265 with
no labels, no comments, no hierarchy, etc.).</p>
        <p>The results for the 3xx tasks were good in general (f-measure of about 0.6 for 301,
302, and 304). However, ontology 303 is difficult for our generic system as the
namespaces are not defined in a standard way. Therefore, we could only find a few
correspondences.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Conference</title>
        <p>The ontologies in the conference track are rather small and the matching tasks are more
difficult as the ontologies have been designed by humans using different terminologies
and having different goals in mind. As this is a more realistic case than the benchmark
track, we have chosen a configuration which produces good results for the conference
track. Using validation rules to check the logical consistency of the identified
correspondences and a final filter step which generates only 1:1 correspondences was beneficial
for the quality of the result.</p>
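        <p>A final filter that keeps only 1:1 correspondences can be realized in several ways, and the source does not say which one is used. The sketch below uses a simple greedy selection by descending similarity. This choice is an assumption; alternatives such as stable marriage or maximum-weight bipartite matching exist. The function name and the example data are hypothetical.</p>
        <preformat><![CDATA[
```python
def one_to_one(sims):
    """Greedily select correspondences so that each source element and
    each target element occurs at most once.

    sims: dict {(source, target): similarity}.
    """
    used_sources, used_targets, result = set(), set(), {}
    ranked = sorted(sims.items(), key=lambda item: item[1], reverse=True)
    for (source, target), value in ranked:
        if source not in used_sources and target not in used_targets:
            result[(source, target)] = value
            used_sources.add(source)
            used_targets.add(target)
    return result


# Hypothetical candidate correspondences between two conference ontologies.
candidates = {
    ("Paper", "Article"): 0.9,
    ("Paper", "Document"): 0.7,
    ("Review", "Article"): 0.6,
    ("Review", "Report"): 0.5,
}
final = one_to_one(candidates)
```
]]></preformat>
        <p>In the example, Paper is matched to Article first; the remaining candidates involving Paper or Article are discarded, so Review can only be matched to Report.</p>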
        <p>At present, we can only report the results with respect to the reference
alignments which are available. For these tasks, we achieve an average f-measure of
about 0.45.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Anatomy</title>
        <p>We participated in this task in the sub-tracks 1 to 3. Probably because of our validation
and filtering methods, we achieved a high precision but low recall in task 1. Therefore,
we used the result of task 1 also for task 2. In task 3, we achieved a high recall with
respect to the partial reference alignment. We have to wait for the results with respect
to the full alignment to make a final statement about the quality for this subtask.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Directory</title>
        <p>We participated only in the single task modality of the directory track. The size of the
input ontologies is similar to the anatomy track, so the same problems of scalability have
to be faced here. We submitted an alignment with about 700 correspondences. Due to
a missing reference alignment for the single task modality, we could not evaluate the
quality of this result.</p>
        <p>The main reason for not participating in the small task modality is that the small
ontologies do not contain enough information for a reasonable matching. Furthermore,
we think that many of the given reference alignments are not correct.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Comments</title>
      <p>This is the third time we have participated in OAEI, and we again see some improvement of
our matcher compared to last year. Thus, a structured evaluation and comparison of
ontology alignment and schema matching components such as OAEI is very useful for the
development of such technologies. We especially appreciate the automatic evaluation
system, although we also had to put in some additional effort to get the interface and our
web service working.</p>
      <p>However, some reference alignments, especially in the directory track, should be
reconsidered as they do not seem to be right. Furthermore, an oriented track as in OAEI
2009 would be useful to evaluate semantic matching techniques.</p>
      <p>We are currently working on a system to generate a matching benchmark which
comes closer to the challenges of real ontologies. We would be happy if we could
contribute the results to OAEI 2011.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>As our tool is neither specialized in ontologies nor limited to the matching task, we did
not expect to deliver the best results. However, we are very satisfied with the overall
results, as we can compete with the special purpose ontology alignment tools.</p>
      <p>We will continue to work on the improvement of our matching system and on the
integration of GeRoMeSuite and SMB. We will especially focus on the problem of
identifying good match configurations automatically. We hope to participate again with
an improved system in the OAEI campaign next year.</p>
      <p>Acknowledgements: This work is supported by the DFG Research Cluster on Ultra
High-Speed Mobile Information and Communication (UMIC, http://www.umic.rwth-aachen.de)
and by the Umbrella Cooperation Programme (http://www.umbrella-coop.org/).</p>
      <p>5. C. Quix, D. Kensche, X. Li. Generic Schema Merging. J. Krogstie, A. Opdahl, G.
Sindre (eds.), Proc. 19th Intl. Conf. on Advanced Information Systems Engineering (CAiSE’07),
LNCS, vol. 4495, pp. 127–141. Springer-Verlag, 2007.</p>
      <p>6. C. Quix, D. Kensche, X. Li. Matching of Ontologies with XML Schemas using a
Generic Metamodel. Proc. Intl. Conf. Ontologies, DataBases, and Applications of
Semantics (ODBASE), pp. 1081–1098. 2007.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sagi</surname>
          </string-name>
          .
          <article-title>Tuning the Ensemble Selection Process of Schema Matchers</article-title>
          .
          <source>Information Systems</source>
          ,
          <volume>35</volume>
          (
          <issue>8</issue>
          ):
          <fpage>845</fpage>
          -
          <lpage>859</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kensche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Chatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          .
          <article-title>GeRoMe: A Generic Role Based Metamodel for Model Management</article-title>
          .
          <source>Journal on Data Semantics</source>
          , VIII:
          <fpage>82</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kensche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
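          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>GeRoMeSuite: A System for Holistic Generic Model Management</article-title>
          .
          <string-name>
            <given-names>C.</given-names>
            <surname>Koch</surname>
          </string-name>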
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Garofalakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aberer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Florescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. Y.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ganti</surname>
          </string-name>
          ,
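          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Kanne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Klas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Neuhold</surname>
          </string-name>
          (eds.),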
          <source>Proceedings 33rd Intl. Conf. on Very Large Data Bases (VLDB)</source>
          , pp.
          <fpage>1322</fpage>
          -
          <lpage>1325</lpage>
          . Vienna, Austria,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kensche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          .
          <article-title>Generic Schema Mappings</article-title>
          .
          <source>Proc. 26th Intl. Conf. on Conceptual Modeling (ER'07)</source>
          , pp.
          <fpage>132</fpage>
          -
          <lpage>148</lpage>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>