<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">An integrated matching system: GeRoMeSuite and SMB - Results for OAEI 2010</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Christoph</forename><surname>Quix</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">RWTH Aachen University</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Avigdor</forename><surname>Gal</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Technion - Israel Institute of Technology</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tomer</forename><surname>Sagi</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Technion - Israel Institute of Technology</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">David</forename><surname>Kensche</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">RWTH Aachen University</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">An integrated matching system: GeRoMeSuite and SMB - Results for OAEI 2010</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">1ED37F1508056F6AAF1BC7AB7CF1D7D2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T05:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present the results of an integrated matching system which is the result of a cooperation project between the Israel Institute of Technology (Technion) and the RWTH Aachen University in Germany. We have integrated the GeRoMeSuite system (from RWTH Aachen) and SMB (from Technion). Both tools aim at matching schemas; while GeRoMeSuite offers a variety of matchers, SMB provides the information on how to combine matchers and how to enhance match results. Thus, an integration of the tools is beneficial for both systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Presentation of the system</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">GeRoMeSuite</head><p>As a framework for model management, GeRoMeSuite <ref type="bibr" target="#b2">[3]</ref> provides an environment to simplify the implementation of model management operators. GeRoMeSuite is based on the generic role based metamodel GeRoMe <ref type="bibr" target="#b1">[2]</ref>, which represents models from different modeling languages (such as XML Schema, OWL, SQL) in a generic way. Thereby, the management of models in a polymorphic fashion is enabled, i.e., the same operator implementations are used regardless of the original modeling language of the schemas. In addition to providing a framework for model management, GeRoMeSuite implements several fundamental operators such as Match <ref type="bibr" target="#b5">[6]</ref>, Merge <ref type="bibr" target="#b4">[5]</ref>, and Compose <ref type="bibr" target="#b3">[4]</ref>.</p><p>The matching component of GeRoMeSuite has been described in more detail in <ref type="bibr" target="#b5">[6]</ref>, where we present and discuss in particular the results for heterogeneous matching tasks (e.g., matching XML Schema and OWL ontologies). An overview of the complete GeRoMeSuite system is given in <ref type="bibr" target="#b2">[3]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">SMB</head><p>The Schema Matching Boosting (SMB) Service is a toolkit for enhancing the performance of schema matchers. SMB operates in three modes: Enhance, Learn, and Recommend. In the Enhance mode, SMB receives a raw correspondence matrix (with similarity values for attribute correspondences in the range [0,1]) and analyzes the results per row and column. Subsequently, SMB uses contrasting and weakening algorithms to boost the results of "promising" rows and columns and to weaken the results of "non-promising" rows and columns, respectively. Contrasting is performed using a modified version of the Weber contrast function. Weakening is inversely proportional to the row and column average.</p><p>The Learn mode is used to perform off-line training of SMB on the performance behavior of matchers w.r.t. various matching tasks, which are grouped into classes according to their a priori features such as schema size. Training is performed using the SMB algorithm, as introduced in <ref type="bibr" target="#b0">[1]</ref>. The Recommend mode classifies a given matching task at run-time and provides the recommended ensemble weights for the matching system's various components. The Learn and Recommend modes are a re-implementation of the system presented in <ref type="bibr" target="#b0">[1]</ref>, in which run-time complexity has been reduced from O(n!) to O(n²) and generic interfaces have been provided to allow any matching system to use SMB by command-line invocation.</p></div>
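<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the Enhance mode more concrete, the following Python sketch applies a contrasting and a weakening step to a small similarity matrix. It is an illustration under stated assumptions, not the SMB implementation: the function name enhance, the parameters promising_cutoff and weaken_scale, the rule that an entry counts as promising when both its row maximum and its column maximum reach the cutoff, and the use of the plain (unmodified) Weber contrast (I - I_b) / I_b with the mean of the row and column averages as background I_b are all hypothetical choices.</p><p><code lang="python"><![CDATA[
```python
from statistics import mean


def enhance(matrix, promising_cutoff=0.5, weaken_scale=0.01):
    """Boost promising entries and weaken the rest (illustrative sketch only).

    All rules and parameters here are assumptions, not the SMB algorithms.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]
    row_avg = [mean(row) for row in matrix]
    col_avg = [mean(col) for col in columns]
    row_max = [max(row) for row in matrix]
    col_max = [max(col) for col in columns]

    enhanced = []
    for i in range(n_rows):
        new_row = []
        for j in range(n_cols):
            value = matrix[i][j]
            # Background level: mean of the row average and the column average.
            background = (row_avg[i] + col_avg[j]) / 2
            if background == 0:
                # Row and column contain only zeros, so there is nothing to adjust.
                new_row.append(0.0)
                continue
            promising = (row_max[i] >= promising_cutoff
                         and col_max[j] >= promising_cutoff)
            if promising:
                # Plain Weber contrast (I - I_b) / I_b; SMB uses a modified version.
                contrast = (value - background) / background
                value = value * (1 + contrast)
            else:
                # Penalty inversely proportional to the row/column average.
                value = value - weaken_scale / background
            # Keep the result inside the similarity range [0, 1].
            new_row.append(min(1.0, max(0.0, value)))
        enhanced.append(new_row)
    return enhanced


raw = [[0.9, 0.2],
       [0.1, 0.3]]
boosted = enhance(raw)
```
]]></code></p><p>In this toy input, only the entry in the first row and first column lies in a row and a column that both contain a high value, so it is the only one that is contrasted; the remaining entries are slightly weakened. The clamping to [0,1] is an additional assumption that keeps the output a valid similarity matrix.</p></div>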
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">State, purpose, general statement</head><p>GeRoMeSuite is a generic system which can match ontologies as well as schemas in other modeling languages such as XML Schema or SQL. Therefore, it is well suited for matching tasks across heterogeneous modeling languages, such as matching XML Schema with OWL. We discussed in <ref type="bibr" target="#b5">[6]</ref> that the use of a generic metamodel, which represents the semantics of the models to be matched in detail, is more advantageous for such heterogeneous matching tasks than a simple graph representation.</p><p>SMB is also a modeling-language-independent 'meta' matching system which mainly works on the similarity matrices produced by GeRoMeSuite. It improves the clarity of the similarity values by increasing 'good' values and decreasing 'bad' values. This should increase the precision of the match result.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4">Specific techniques used</head><p>Besides the integration of GeRoMeSuite and SMB, we focused this year on adding validation methods to the system to improve the precision of the match result. A component for adding disjointness relationships in an ontology has been added to the matching framework. The component uses machine learning techniques to identify disjoint concepts within one ontology. The disjointness relationships can then be used in the validation of schema matches using logical reasoning.</p><p>Furthermore, we developed a component which can use a background ontology to find additional matches. The system is able to find an appropriate background ontology on the web automatically, using Google and Swoogle. Due to the setup of the OAEI campaign, we did not use this component for OAEI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.5">Adaptations made for the evaluation</head><p>We evaluated several match configurations, which is easily possible thanks to the adaptable and extensible matching framework of GeRoMeSuite. As only one configuration can be used for all matching tasks, we had to find a good compromise between performance in terms of precision and recall, time performance for larger ontologies (e.g., anatomy), and the selection of appropriate matchers which work well on all tracks. For example, we also tested configurations which had an f-measure that was about 5% higher than the configuration which we eventually used, but these configurations did not work well on all tracks. The identification of good match configurations is a topic for future research.</p><p>Fig. <ref type="figure" target="#fig_0">1</ref> indicates the strategy which we used for the matching tasks in the benchmark track. All aggregation and filter steps use variable weights and thresholds, which are based on the statistical values of the input similarities.</p><p>The role matcher is a special matcher which compares the roles of model elements in our generic role-based metamodel. In principle, this results in matching only elements of the same type, e.g., classes with classes only and properties with properties only.</p><p>On a technical level, we implemented a command line interface for the matching component, as the matching component is normally used from within the GUI of GeRoMeSuite. The command line interface can work in a batch mode in which several matching tasks and configurations can be processed and compared. The existence of this tool also enabled an easy integration with the OAEI web service interface.</p></div>
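<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of aggregation and filter steps whose weights and thresholds depend on the statistics of the input similarities can be sketched in Python as follows. This is a hedged illustration rather than the configuration used in GeRoMeSuite: the function names, the heuristic of weighting each matcher by the standard deviation of its similarity values, and the threshold of mean plus k standard deviations are assumptions made only for this example.</p><p><code lang="python"><![CDATA[
```python
from statistics import mean, pstdev


def flatten(matrix):
    """Return all similarity values of a matrix as one list."""
    return [value for row in matrix for value in row]


def variable_weights(matrices):
    """Weight each matcher by the spread of its values (assumed heuristic)."""
    weights = [pstdev(flatten(m)) for m in matrices]
    if sum(weights) == 0:
        # No matcher discriminates at all: fall back to equal weights.
        return [1.0] * len(matrices)
    return weights


def aggregate(matrices, weights):
    """Weighted average of equally sized similarity matrices."""
    total = sum(weights)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
             for j in range(n_cols)]
            for i in range(n_rows)]


def threshold_filter(matrix, k=0.5):
    """Keep values at or above mean + k * standard deviation (assumed rule)."""
    values = flatten(matrix)
    threshold = mean(values) + k * pstdev(values)
    return [[v if v >= threshold else 0.0 for v in row] for row in matrix]


# Hypothetical outputs of a name-based and a structure-based matcher.
name_sim = [[0.9, 0.1], [0.2, 0.8]]
struct_sim = [[0.5, 0.5], [0.5, 0.5]]

weights = variable_weights([name_sim, struct_sim])
combined = aggregate([name_sim, struct_sim], weights)
selected = threshold_filter(combined)
```
]]></code></p><p>In this example the structure-based matrix is constant and therefore receives weight zero, so the aggregated matrix follows the name-based matcher, and the filter keeps only the two diagonal entries. Deriving weights and thresholds from the data in this way avoids fixed constants that would have to be tuned per track.</p></div>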
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.6">Link to the system and parameters file</head><p>More information about the system can be found on the homepage of GeRoMeSuite: http://www.dbis.rwth-aachen.de/gerome/oaei2010/</p><p>The page also provides links to the configuration files used for the evaluation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.7">Link to the set of provided alignments (in align format)</head><p>The results for the OAEI campaign 2010 are available at http://www.dbis.rwth-aachen.de/gerome/oaei2010/</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Benchmark</head><p>The following table shows the average results for precision and recall in the benchmark track.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><table><row role="label"><cell>Task</cell><cell>Precision</cell><cell>Recall</cell></row><row><cell>1xx</cell><cell>1.00</cell><cell>1.00</cell></row><row><cell>2xx (xx&lt;48)</cell><cell>0.96</cell><cell>0.88</cell></row><row><cell>2xx (xx&gt;47)</cell><cell>0.89</cell><cell>0.51</cell></row><row><cell>3xx</cell><cell>0.79</cell><cell>0.38</cell></row></table><p>The 1xx ontologies provide a first check of whether a match configuration is suitable at all. A configuration should produce the perfect result for these tasks, which is the case for the configuration that we finally chose.</p><p>For the simpler tasks in the 2xx data set (201-247), our system was able to achieve a very good result with an f-measure of more than 0.9.</p><p>For some of the really difficult tasks (248-266), our system was not able to find any correspondence as there is hardly any information that can be used (e.g., task 265 with no labels, no comments, no hierarchy, etc.).</p><p>The results for the 3xx tasks were in general good (f-measure of about 0.6 for 301, 302, and 304). However, ontology 303 is difficult for our generic system as the namespaces are not defined in a standard way. Therefore, we could only find a few correspondences.</p></div>
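<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked check of the f-measure values mentioned above, the following Python snippet computes the harmonic mean of precision and recall, F = 2PR / (P + R), for the averages reported in the table. Note the assumption: the f-measure of averaged precision and recall is in general not identical to the average of the per-task f-measures, so these numbers are only an approximation of the per-task figures quoted in the text.</p><p><code lang="python"><![CDATA[
```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall per task group, taken from the table above.
benchmark = {
    "1xx": (1.00, 1.00),
    "2xx (xx<48)": (0.96, 0.88),
    "2xx (xx>47)": (0.89, 0.51),
    "3xx": (0.79, 0.38),
}

f_scores = {task: round(f_measure(p, r), 2) for task, (p, r) in benchmark.items()}
# f_scores == {"1xx": 1.0, "2xx (xx<48)": 0.92, "2xx (xx>47)": 0.65, "3xx": 0.51}
```
]]></code></p><p>The value of 0.92 for the tasks 201-247 agrees with the statement that the f-measure exceeds 0.9. The lower average for 3xx is consistent with ontology 303 pulling down the otherwise good results for 301, 302, and 304.</p></div>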
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Conference</head><p>The ontologies in the conference track are rather small and the matching tasks are more difficult as the ontologies have been designed by humans using different terminologies and having different goals in mind. As this is a more realistic case than the benchmark track, we have chosen a configuration which produces good results for the conference track. Using validation rules to check the logical consistency of the identified correspondences and a final filter step which generates only 1:1 correspondences was beneficial for the quality of the result.</p><p>At the current point, we can only report the results with respect to the reference alignments which are available. For these tasks, we achieve an average f-measure of about 0.45.</p></div>
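<div xmlns="http://www.tei-c.org/ns/1.0"><p>A final filter step that produces only 1:1 correspondences can be illustrated with a simple greedy selection in Python. This is a minimal sketch, assuming that correspondences are given as (source, target, similarity) triples and that the filter accepts the highest-scoring pairs first; the function name one_to_one and the example element names are hypothetical, and the actual filter in GeRoMeSuite may work differently.</p><p><code lang="python"><![CDATA[
```python
def one_to_one(correspondences):
    """Greedily keep the best-scoring pairs so that each element occurs once."""
    used_sources, used_targets = set(), set()
    selected = []
    # Visit candidates from the highest to the lowest similarity.
    for source, target, sim in sorted(correspondences,
                                      key=lambda c: c[2], reverse=True):
        if source not in used_sources and target not in used_targets:
            selected.append((source, target, sim))
            used_sources.add(source)
            used_targets.add(target)
    return selected


# Hypothetical candidate correspondences between two conference ontologies.
candidates = [
    ("Paper", "Article", 0.9),
    ("Paper", "Document", 0.7),
    ("Author", "Article", 0.6),
    ("Author", "Writer", 0.8),
]
alignment = one_to_one(candidates)
# alignment == [("Paper", "Article", 0.9), ("Author", "Writer", 0.8)]
```
]]></code></p><p>A greedy selection is simple and fast but not guaranteed to maximize the total similarity; an optimal assignment method would be an alternative at higher cost.</p></div>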
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Anatomy</head><p>We participated in this task in the sub-tracks 1 to 3. Probably because of our validation and filtering methods, we achieved a high precision but low recall in task 1. Therefore, we used the result of task 1 also for task 2. In task 3, we achieved a high recall with respect to the partial reference alignment. We have to wait for the results with respect to the full alignment to make a final statement about the quality for this subtask.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Directory</head><p>We participated only in the single task modality of the directory track. The size of the input ontologies is similar to the anatomy track, so the same problems of scalability have to be faced here. We submitted an alignment with about 700 correspondences. Due to a missing reference alignment for the single task modality, we could not evaluate the quality of this result.</p><p>The main reason for not participating in the small task modality is that the small ontologies do not contain enough information for a reasonable matching. Furthermore, we think that many of the given reference alignments are not correct.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Comments</head><p>This is the third time that we have participated in OAEI, and we again see some improvement of our matcher compared to last year. Thus, a structured evaluation and comparison of ontology alignment and schema matching components, as offered by OAEI, is very useful for the development of such technologies. We especially appreciate the automatic evaluation system, although we also had to put in some additional effort to get the interface and our web service working.</p><p>However, some reference alignments, especially in the directory track, should be reconsidered as they do not seem to be right. Furthermore, an oriented track as in OAEI 2009 would be useful to evaluate semantic matching techniques.</p><p>We are currently working on a system to generate a matching benchmark which comes closer to the challenges of real ontologies. We would be happy if we could contribute the results to OAEI 2011.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>As our tool is neither specialized for ontologies nor limited to the matching task, we did not expect to deliver the best results. However, we are very satisfied with the overall results, as we can compete with the special-purpose ontology alignment tools.</p><p>We will continue to work on the improvement of our matching system and on the integration of GeRoMeSuite and SMB. We will especially focus on the problem of identifying good match configurations automatically. We hope to participate again with an improved system in the OAEI campaign next year.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Matching Strategy for OAEI 2010</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements: This work is supported by the DFG Research Cluster on Ultra High-Speed Mobile Information and Communication (UMIC, http://www.umic.rwth-aachen.de) and by the Umbrella Cooperation Programme (http://www.umbrella-coop.org/).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Tuning the Ensemble Selection Process of Schema Matchers</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sagi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="845" to="859" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">GeRoMe: A Generic Role Based Metamodel for Model Management</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kensche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Chatti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jarke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal on Data Semantics</title>
		<imprint>
			<biblScope unit="volume">VIII</biblScope>
			<biblScope unit="page" from="82" to="117" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">GeRoMeSuite: A System for Holistic Generic Model Management</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kensche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings 33rd Intl. Conf. on Very Large Data Bases (VLDB)</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Koch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Gehrke</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Garofalakis</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Srivastava</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Aberer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Deshpande</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Florescu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><forename type="middle">Y</forename><surname>Chan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ganti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C.-C</forename><surname>Kanne</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Klas</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Neuhold</surname></persName>
		</editor>
		<meeting>33rd Intl. Conf. on Very Large Data Bases (VLDB)<address><addrLine>Vienna, Austria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1322" to="1325" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Generic Schema Mappings</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kensche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jarke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 26th Intl. Conf. on Conceptual Modeling (ER&apos;07)</title>
				<meeting>26th Intl. Conf. on Conceptual Modeling (ER&apos;07)</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="132" to="148" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Generic Schema Merging</title>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kensche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 19th Intl. Conf. on Advanced Information Systems Engineering (CAiSE&apos;07)</title>
		<title level="s">LNCS</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Krogstie</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Opdahl</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><surname>Sindre</surname></persName>
		</editor>
		<meeting>19th Intl. Conf. on Advanced Information Systems Engineering (CAiSE&apos;07)</meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">4495</biblScope>
			<biblScope unit="page" from="127" to="141" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Matching of Ontologies with XML Schemas using a Generic Metamodel</title>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kensche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Ontologies, DataBases, and Applications of Semantics (ODBASE)</title>
				<meeting>Intl. Conf. Ontologies, DataBases, and Applications of Semantics (ODBASE)</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1081" to="1098" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
