<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">COMA++: Results for the Ontology Alignment Contest OAEI 2006</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Sabine</forename><surname>Massmann</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Leipzig</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Daniel</forename><surname>Engmann</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Leipzig</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Erhard</forename><surname>Rahm</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Leipzig</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">COMA++: Results for the Ontology Alignment Contest OAEI 2006</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">70517E5A61BAAB9402AF9FFCFA48D256</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper summarizes the OAEI Contest 2006 results for the matching tool COMA++. The study shows that a generic schema matching system can also effectively solve complex ontology matching tasks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Presentation of the system</head><p>COMA++ is an extension of our previous COMA prototype <ref type="bibr" target="#b0">[1]</ref>. It is a customizable and generic tool for matching both schemas and ontologies specified in languages such as SQL, XML Schema or OWL <ref type="bibr" target="#b1">[2]</ref>. COMA++ offers a GUI and supports the combined use of several match algorithms as well as the reuse of previously confirmed match results <ref type="bibr" target="#b5">[6]</ref>. The COMA++ architecture is shown in Figure <ref type="figure" target="#fig_0">1</ref>. The Repository persistently stores all match-related data, the Model and Mapping Pools manage all schemas, ontologies, and mappings in memory, and the Matching Engine performs the match operations. The GUI provides access to these components and is used to visualize models and to manage the match process and the mappings. The Matching Engine contains libraries that support many match algorithms and match strategies. The similarity results of individual matchers are maintained and aggregated within a similarity matrix per match task <ref type="bibr" target="#b0">[1]</ref>. Match strategies implement workflows to deal with complex match tasks; they enable the reuse of previous results and the decomposition of larger match tasks into smaller ones <ref type="bibr" target="#b2">[3]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">State, purpose, general statement</head><p>COMA and COMA++ have proven to be very effective for matching database and XML schemas <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b5">6]</ref>. Our main reason for participating in this contest was to assess how effective a generic matching tool is when dealing with ontologies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Specific techniques used</head><p>An automatic match process in COMA++ consists of several steps. In the first step, the imported schemas and ontologies are transformed into a generic graph representation. The graph nodes represent schema/ontology components such as classes or properties and have attributes like name and data type. All relationships, e.g. aggregations and specializations, are uniformly represented by edges between nodes. In the next step, graph nodes are matched with each other using a match strategy and matchers. No distinction is made between node types, so that, for example, classes can be matched with properties. The similarity values obtained by the individual matchers are aggregated according to a combination strategy (average, etc.). The match candidates are selected from the aggregated correspondences, e.g. based on a threshold criterion. Finally, the result mapping (RDF alignment) is generated.</p><p>In addition to the schema-based matchers, we used an instance-level matcher that has recently been added to the COMA++ match library.</p></div>
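<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregation and selection steps of Section 1.2 can be illustrated with a minimal Python sketch. It is not the COMA++ implementation: the function names, the dictionary-based similarity matrices, the example node names and the threshold of 0.5 are assumptions made purely for illustration.</p><p>
```python
# Illustrative sketch only; names and values are hypothetical, not COMA++ API.

def aggregate_average(matrices):
    """Average the similarity values of several matchers per node pair.

    Each matrix maps a (source_node, target_node) pair to a similarity
    in [0, 1]; a pair missing from a matrix counts as 0.0.
    """
    pairs = set()
    for matrix in matrices:
        pairs.update(matrix)
    return {
        pair: sum(m.get(pair, 0.0) for m in matrices) / len(matrices)
        for pair in pairs
    }


def select_by_threshold(aggregated, threshold):
    """Keep the pairs whose aggregated similarity reaches the threshold."""
    return {pair: sim for pair, sim in aggregated.items() if sim >= threshold}


# Hypothetical results of two matchers for two candidate pairs.
name_sims = {("Book", "Publication"): 0.5, ("Book", "Volume"): 0.25}
instance_sims = {("Book", "Publication"): 1.0, ("Book", "Volume"): 0.25}

aggregated = aggregate_average([name_sims, instance_sims])
mapping = select_by_threshold(aggregated, threshold=0.5)
print(mapping)
```
</p><p>In this toy input only the pair (Book, Publication) is selected, with an averaged similarity of 0.75; the pair (Book, Volume) averages 0.25 and falls below the assumed threshold.</p></div>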
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Adaptations made for the evaluation</head><p>In addition to the integration of an instance matcher, only a few changes to COMA++ were necessary to deal with the specifics of the contest. As mentioned, the output mapping was translated into the predefined RDF alignment format. Furthermore, the result of a matcher was ignored if it contained the same similarity value for all entities. This minor adaptation was made because the same strategy had to be used for all tests.</p><p>Another change was the splitting of huge ontologies into several smaller ones. The results of the smaller match tasks were then merged, and a further selection step was applied to the merged results to obtain the final result mapping.</p><p>To comply with the contest rules, the prototype did not use synonym and abbreviation lists, which can otherwise be supplied to the system. Creating such lists specifically for the contest was not allowed, although it would have been necessary because of the different domains.</p></div>
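<div xmlns="http://www.tei-c.org/ns/1.0"><p>The adaptation of Section 1.3 that ignores a matcher whose result contains the same similarity value for all entities can be sketched as follows. This is a hedged illustration rather than the actual COMA++ code: the function names, the data layout and the example values are hypothetical, and treating an empty result as uninformative is an additional assumption.</p><p>
```python
# Illustrative sketch only; names and values are hypothetical, not COMA++ API.

def is_uninformative(matrix):
    """Return True if a matcher gave the same similarity to every pair.

    An empty matrix is treated as uninformative as well (an assumption).
    """
    return len(set(matrix.values())) in (0, 1)


def drop_uninformative(results):
    """Remove matchers whose similarity matrix carries no information."""
    return {name: m for name, m in results.items() if not is_uninformative(m)}


# Hypothetical case: the ontologies contain no comments, so the Comment
# matcher returns 0.0 for every pair.
results = {
    "NameType": {("a", "x"): 0.9, ("a", "y"): 0.1},
    "Comment": {("a", "x"): 0.0, ("a", "y"): 0.0},
}
kept = drop_uninformative(results)
print(sorted(kept))
```
</p><p>With average aggregation, keeping the constant Comment result in this example would lower the similarity of the pair (a, x) from 0.9 to 0.45 without helping to distinguish it from (a, y), which shows why such a filter is useful when one strategy has to serve all tests.</p></div>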
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4">Link to the system, parameters file and to the set of provided alignments</head><p>The .zip archives of all contest results, as well as the system with a parameters file, can be downloaded from the following URL:</p><p>http://dbs.uni-leipzig.de/Research/coma_oaei.html</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Results</head><p>The results discussed here have been calculated with five matchers: NameType, Comment, Parents, Children and Instance. To combine the match results, the average value has been computed and a selection has been made using, e.g., a threshold. The best setting has been determined by running different configurations on the benchmark and choosing the one with the highest F-measure. The exact parameters can be found in the appendix.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Benchmark</head><p>The benchmark is a systematic series of 50 tests that can be used to identify the strengths and weaknesses of an algorithm.</p><p>The overall score of COMA++ for this task (except 102) is quite good:</p><table><row role="label"><cell/><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell><cell>Time</cell></row><row><cell>Average</cell><cell>0.96</cell><cell>0.82</cell><cell>0.88</cell><cell>7.0 sec</cell></row></table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1">Tests 101-104</head><p>The results for tests 101, 103 and 104 are perfect because the classes and properties have the same names, comments and instances. The language restriction and generalization have no influence. The alignment for the irrelevant ontology 102 contains a few false matches between entities with similar names, e.g. "year - yearValue". No matches are expected for this test, so precision and recall are automatically 0.0; we therefore left this test out of the average calculation.</p></div>
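<div xmlns="http://www.tei-c.org/ns/1.0"><table><row role="label"><cell/><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell><cell>Time</cell></row><row><cell>Average</cell><cell>1.00</cell><cell>1.00</cell><cell>1.00</cell><cell>15.4 sec</cell></row></table></div>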
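<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with the reported measures, the following Python sketch computes precision, recall and F-measure for a single test. It assumes the standard definitions (F-measure as the harmonic mean of precision and recall) and reports undefined ratios as 0.0; the function name and the example alignments are hypothetical, and the official evaluation tooling of the contest may differ in details.</p><p>
```python
# Illustrative sketch only; the alignments below are hypothetical.

def evaluate(found, expected):
    """Return (precision, recall, f_measure) for two sets of correspondences.

    Undefined ratios (empty sets) are reported as 0.0.
    """
    correct = len(found.intersection(expected))
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


expected = {("a", "x"), ("b", "y"), ("c", "z")}
found = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
print(evaluate(found, expected))

# A reference alignment without any correspondence, as for test 102:
# every found match is a false positive, and all values come out as 0.0.
print(evaluate({("year", "yearValue")}, set()))
```
</p><p>In the first call, two of the four found correspondences are correct, giving a precision of 0.5, a recall of 2/3 and an F-measure of 4/7. The second call mirrors the situation of test 102 described above.</p></div>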
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.2">Tests 201-247</head><p>The results of these tests differ depending on the information given, because the chosen strategy uses names, data types, comments, structure and instances. If one or more of these kinds of information are missing, only the remaining ones can be used.</p><p>For tests 202, 209 and 210, the names and the comments differ, so this information cannot be used and the results have a lower recall.</p><p>For all other tests of this group, the names, the comments or both contain useful information, so the results are quite good.</p><p>Tests 221-247 even have the same names and comments, whereas the structure is different. Instances are similar, but some ontologies do not contain them. The given information is sufficient to reach very good results.</p></div>
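<div xmlns="http://www.tei-c.org/ns/1.0"><table><row role="label"><cell/><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell><cell>Time</cell></row><row><cell>Average</cell><cell>0.98</cell><cell>0.95</cell><cell>0.97</cell><cell>8.1 sec</cell></row></table></div>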
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.3">Tests 248-266</head><p>In these tests the names have been substituted with random strings and there are no comments. The algorithm can thus only use the hierarchy and the instances, if given. Instances do not exist for every class and property, so instance information helps only partially to find corresponding entities. Given these restrictions, the results for these tests are satisfactory.</p></div>
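<div xmlns="http://www.tei-c.org/ns/1.0"><table><row role="label"><cell/><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell><cell>Time</cell></row><row><cell>Average</cell><cell>0.89</cell><cell>0.51</cell><cell>0.65</cell><cell>4.2 sec</cell></row></table></div>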
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.4">Tests 301-304 (Real Ontologies)</head><p>The real-world ontologies have been a more difficult task for COMA++ because they are quite different from the 101 ontology. Three of the four ontologies do not contain instances; only 304 does. Ontologies 302 and 303 do not use comments, the structure is quite different, and the names are often dissimilar. The prototype could not find such correspondences because the contest rules did not allow us to use auxiliary information.</p></div>
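<div xmlns="http://www.tei-c.org/ns/1.0"><table><row role="label"><cell/><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell><cell>Time</cell></row><row><cell>Average</cell><cell>0.84</cell><cell>0.69</cell><cell>0.76</cell><cell>3.6 sec</cell></row></table></div>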
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Anatomy</head><p>For the anatomy task two large ontologies had to be aligned. Because of their huge size, the matching task had to be split by our system into smaller ones. The partial results were merged and a selection step was then applied to the merged result. This selection was necessary because the split matching produces more false matches.</p><p>Another difficulty has been that in the FMA ontology the ids of classes look like "frame_92794" and "frame_51746", and the real information is in the label. The OpenGALEN ontology, in contrast, has meaningful ids and rarely uses labels. These labels or ids are made up of many tokens and sometimes differ only in a few letters, e.g. "fifth" instead of "first". We therefore expect that more false positives will be found than in the benchmark test.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Directory</head><p>For this test we matched 4640 pairs of ontologies.</p><p>To find out more about the quality of our strategy and about this kind of test, we also matched the 2265 ontology pairs of the 2005 contest. We reached a recall of around 0.32, which is as good as that of the best participants. Looking at the missing correspondences, we could not find any similarity between the names, e.g., "7/source.owl#Academic_Departments" and "7/target.owl#United_Kingdom", and no comments or instances existed. We therefore could not identify a way to improve our system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Food</head><p>The food ontologies use a different format, SKOS. We transformed the given SKOS files into OWL format to be able to match them. These ontologies are quite large, so the match process had to be split, as in the anatomy test.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Conference</head><p>This task contains 10 ontologies that deal with conference organisation. Computing the alignments between each pair of them posed no problem because of their smaller size.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">General comments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Comments on the results</head><p>Given that COMA and COMA++ were not specifically designed for matching ontologies and that we invested only a small amount of time in the contest, the overall results are surprisingly good. The new instance matcher proved to be effective, especially for the tests where useful information was only provided by instance values.</p><p>The parameters used were selected for the whole set of tests. For individual match tasks, better results than reported can be obtained by using tailored configuration parameters. In addition, domain-specific abbreviations, synonyms and previous match results could not be utilized, in order to conform to the contest rules.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Discussions on the way to improve the proposed system</head><p>The use of auxiliary information that conforms to the rules, e.g. WordNet or UMLS, could improve the recall results. The addition of ontology-oriented matchers and the distinction between node and relationship types could also be helpful.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Comments on the OAEI 2006 procedure</head><p>This is our first participation in this Ontology Alignment Contest. Since we were not involved in the contest preparation, we had no prior knowledge of most tasks and the regulations. We thus had comparatively little time (about 2 months) to deal with the details of six test series and with technical problems caused by unknown formats and large files. Furthermore, we had to adapt the system to the contest rules and try to find the best strategy and configuration.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. Architecture of COMA++</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>The presented contest results show that COMA++ is not only effective for schema matching but also for ontology matching. This underlines the viability of generic approaches for complex metadata management problems.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix: Raw results</head><p>The benchmark results have been computed with the following parameters:</p><p>• Strategy: NoContext </p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">COMA - A System for Flexible Combination of Schema Matching Approaches</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Very Large Databases (VLDB)</title>
				<meeting>Intl. Conf. Very Large Databases (VLDB)</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Schema and Ontology Matching with COMA++ (Software Demonstration)</title>
		<author>
			<persName><forename type="first">D</forename><surname>Aumüller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Massmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 24. ACM SIGMOD Intl. Conf. Management of Data</title>
				<meeting>24. ACM SIGMOD Intl. Conf. Management of Data</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Matching Large XML Schemas</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Massmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM SIGMOD Record</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">4</biblScope>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Comparison of Schema Matching Evaluations</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Melnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 2. Intl. Workshop Web and Databases</title>
		<title level="s">LNCS</title>
		<meeting>2. Intl. Workshop Web and Databases</meeting>
		<imprint>
			<publisher>Springer Verlag</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">2593</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Survey of Approaches to Automatic Schema Matching</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rahm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Bernstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">VLDB Journal</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">4</biblScope>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Schema Matching and Mapping-based Data Integration</title>
		<author>
			<persName><forename type="first">Hong-Hai</forename><surname>Do</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<pubPlace>Germany</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, Universität Leipzig</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Dissertation</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
