FOAM – Framework for Ontology Alignment and Mapping
Results of the Ontology Alignment Evaluation Initiative

Marc Ehrig, York Sure
Institute AIFB, University of Karlsruhe
76128 Karlsruhe, Germany
ehrig@aifb.uni-karlsruhe.de, sure@aifb.uni-karlsruhe.de

ABSTRACT
This paper briefly introduces the system FOAM and its underlying techniques. We then discuss the results returned from the evaluation. They were very promising and at the same time clarifying. Concisely: labels are very important; structure helps in cases where labels do not work; dictionaries may provide additional evidence; ontology management systems need to deal with OWL-Full. The results of this paper will also be very interesting for other participants, showing specific strengths and weaknesses of our approach.

1. PRESENTATION OF THE SYSTEM

1.1 State, purpose, general statement
In recent years, we have seen a range of research work on methods proposing alignments [1; 2]. When we tried to apply these methods to some of the real-world scenarios we address in other research contributions [3], we found that existing alignment methods did not suit the given requirements:
• high quality results;
• efficiency;
• optional user-interaction;
• flexibility with respect to use cases;
• and easy adjusting and parameterizing.
We wanted to provide the end-user with a tool that meets these requirements, taking ontologies as input and returning alignments (with explanations) as output.

1.2 Specific techniques used
We have observed that alignment methods like QOM [4] or PROMPT [2] may be mapped onto a generic alignment process (Figure 1). Here we will only mention the six major steps to clarify the underlying approach for the FOAM tool. We refer to [4] for a detailed description.
1. Feature Engineering, i.e. select excerpts of the overall ontology definition to describe a specific entity. This includes individual features, e.g. labels, structural features, e.g. subsumption, but also more complex features as used in OWL, e.g. restrictions.
2. Search Step Selection, i.e. choose two entities from the two ontologies to compare (e1, e2).
3. Similarity Assessment, i.e. indicate a similarity for a given description (feature) of two entities (e.g., sim_superConcept(e1, e2) = 1.0).
4. Similarity Aggregation, i.e. aggregate the multiple similarity assessments for one pair of entities into a single measure.
5. Interpretation, i.e. use all aggregated numbers, a threshold and an interpretation strategy to propose the alignment (align(e1) = ‘e2’). This may also include a user validation.
6. Iteration, i.e. since the similarity of one alignment influences the similarity of neighboring entity pairs, the equality is propagated through the ontologies.
Finally, we receive alignments linking the two ontologies.

This general process was extended to meet the mentioned requirements.
• High quality results were achieved through a combination of a rule-based approach and a machine learning approach. Underlying individual rules, such as "if the super-concepts are similar, the entities are similar", have been assigned weights by a machine-learnt decision tree [5]. Especially steps 1, 3 and 4 were adjusted for this. Currently, our approach does not make use of additional background knowledge such as dictionaries here.
• Efficiency was mainly achieved through an intelligent selection of candidate alignments in step 2, the search step selection [4].
• User-interaction allows the user to intervene during the interpretation step. By presenting the doubtful alignments (and only these) to the user, overall quality can be considerably increased. Yet this happens in a minimally invasive manner.
• The system can automatically set its parameters according to a list of given use cases, such as ontology merging, versioning, ontology mapping, etc. The parameters also change according to the ontologies to align, e.g., big ontologies always require the efficient approach, whereas smaller ones do not [6].
• All these parameters may be set manually. This allows using the implementation for very specific tasks as well.
• Finally, FOAM has been implemented in Java and is freely available, thus extensible.

1.3 Adaptations made for the contest
No special adjustments have been made for the contest. However, some elements have been deactivated. Due to the small size of the benchmark and directory ontologies the efficiency optimizations were not used, user-interaction was removed for the initiative, and no specific use case parameters were taken. A general alignment procedure was applied.
The system used for the evaluation is a derivative of the ontology alignment tool used in last year's contests I3Con [7] and EON-OAC [8].

2. RESULTS
All tests were performed on a standard notebook under Windows. FOAM has been implemented in Java with all its advantages and disadvantages.
The individual results of the benchmark ontologies were grouped. Further, one short section each describes the testing of the directory and anatomy ontologies. The concrete results can be found in Section 6.3 of this paper.

Figure 1: Ontology Alignment Process (Input; 1 Feature Engineering; 2 Search Step Selection; 3 Similarity Computation; 4 Similarity Aggregation; 5 Interpretation; Output; 6 Iteration)

2.1.1 Tests 101 to 104
These tests are basic tests for ontology alignment.
As the system assumes that equal URIs mean equal objects, an alignment of an ontology with itself always returns the correct alignments. The alignment with an irrelevant ontology does not return any results. Language generalization or restriction does not affect the results. Our approach is robust enough to cope with these differences. Considering the differences which occur in real world ontology modeling, this is a very desirable feature.

2.1.2 Tests 201 to 210
Tests 201 through 210 focus on labels and comments of ontological entities.
The labels are the most important feature to identify an alignment. In fact, everything else can be neglected if the labels indicate an alignment (e.g. also the comments in Test 203). Vice versa, changed labels do seriously affect the outcomes. As our approach currently does not make use of any dictionaries, this is critical. Small changes as occurring through a different naming convention can be balanced out (Test 204 is only slightly worse than the ideal result). Synonyms or translations, possibly also with removed comments, lower especially recall considerably (to between 0.57 and 0.87). Nevertheless, the structure alignment does find many of the alignments, despite the differing labels. For the mentioned recalls, precision stays between 0.80 and 0.96.

2.1.3 Tests 221 to 247
For all these tests the structure is changed.
However, as the labels remain, alignment is very good. Again, this indicates that labels are the main distinguishing feature. Only smaller irritations result from the differing structures. Specifically, more false positives are identified, resulting in a precision of "only" 0.94 in the worst case. Recall stays at or above 0.97. The processing time also changes according to the amount of structure. Please note that first results are returned almost instantaneously (less than 5 seconds). The times presented in the table represent the total time until the approach stops its search for alignments.

2.1.4 Tests 248 to 266
These tests were the most challenging ones for our approach. Labels and comments had been removed and different structural elements as well.
Precision reaches levels of 0.61 to 0.95. Recall is in the range of 0.18 to 0.55. Unfortunately, the evaluation results did not show a clear tendency of which structural element is most important for our alignment approach. It seems that the structural features can be exchanged to a certain degree. If one feature is missing, evidence is collected from another feature. This is a nice result for our approach, as it indicates that the weighting scheme of the individual features has been assigned correctly. One tendency that could be identified was that with decreasing semantic information the found alignments become sparser. However, most of the identified alignments were correct (see precision).
We will briefly mention one test for which our approach performed surprisingly well. Ontology 262 has practically everything removed: no labels; no comments; no properties; no hierarchies. Nevertheless, some alignments have been identified. The only information that remained was the links between instances and their classes. By checking whether instance sets were the same (at least in terms of numbers, as the instance labels actually differed), some concepts could be correctly aligned.

2.1.5 Tests 301 to 304
Ontologies 301 through 304 represent schemas modeled by other institutions but covering the same domain of bibliographic metadata. From the evaluation perspective, these real world ontologies combine the difficulties of the previous tests. Especially test case 301 differs both in terms of structure and labels. Its labels generally use the term "has", e.g. "hasISBN" instead of "ISBN". This results in a rather low term similarity, as our approach does not split the strings into individual terms. Combined with the differing structure this results in a rather low quality. Also for the other ontologies, both precision and recall do not reach perfect levels. However, the results are satisfactory. In fact, preliminary tests using our semi-automatic approach showed that results could be noticeably increased with very little effort. The question that will partially also be answered by this initiative is what can maximally be reached. We hope to gain these insights by comparing our results to other participants' results.

2.2 Directory Ontologies
The directory ontologies are subsumption hierarchies. They could be easily processed. The evaluation results at the workshop will presumably show the following main effects: Subsumption helps to identify some alignments correctly. Because we do not use dictionaries, some alignments are missed. As this dataset only uses subsumption, we cannot rely on the more complex ontology features which our approach normally also tries to exploit. Thus, results will not be ideal.

2.3 Anatomy Ontologies
We were very interested in running our ontology alignment on the big real world anatomy ontologies. Especially for our efficient approach, this would have been a deep evaluation. Unfortunately, the ontologies were modeled in OWL-Full. Our approach is based on the KAON2-infrastructure^1 that only allows for OWL-DL. As this interaction is very deep, it was not possible to change to an ontology environment capable of OWL-Full for the contest. We could not run these tests. One result, for us, was the realization that ontologies will probably not stay in the clean world of OWL-DL. We will have to draw consequences from this.

^1 http://kaon2.semanticweb.org

3. GENERAL COMMENTS

3.1 Comments on the results
An objective comment on strengths or weaknesses requires the comparison with other participants, which will not be available before the workshop. However, some conclusions can be drawn.
Strengths:
• Labels or identifiers are important and help to align most of the entities.
• The structure helps to identify alignments, if the labels are not expressive.
• A more expressive ontology results in better alignments; an argument in favor of ontologies compared to simple classification structures.
• The generally learnt weights have shown very good results.
Weaknesses:
• The approach cannot deal with consistently changed labels. Especially translations, synonyms, or other conventions make it difficult to identify alignments.
• The system is bound to OWL-DL or lesser ontologies.

3.2 Discussions on the way to improve the proposed system
Possible improvements are directly related to the weaknesses in the previous section.
• Extending the handling of labels (strings) can presumably increase overall effectiveness. Usage of dictionaries is widely applied and will be added to our approach as well.
• The tight interconnection of FOAM with KAON2 restricts the open usage of it. Currently efforts are being made to decouple them by inserting a general ontology management layer.

3.3 Comments on the test cases
The benchmark tests have shown very interesting general results on how the alignment approach behaves. These systematic tests are one good underlying test base. For our approach, the directory tests are less interesting, as they are restricted to subsumption hierarchies, rather than complete ontologies. Many of the specific advantages of our approach cannot be applied. It was very unfortunate that we could not run the anatomy tests. However, we think it is very important to have some real world ontologies, and we hope to test them at a later point in time.
For future work, it might be interesting to add some user-interaction component to the tests. It would also be interesting to not only have real world ontologies, but also see which alignment approach performs how for specific ontology alignment applications.

3.4 Comments on the measures
Precision and recall are without any doubt the most important measures. Some balancing measure needs to be added as well, as we have done with the f-measure. Otherwise, it is very difficult to draw conclusions on which approach worked best on which test set. For future evaluation it would also be interesting to make use of some less strict evaluation measure, as presented in [9].

4. CONCLUSION
In this paper, we have briefly presented an approach and a tool for ontology alignment and mapping - FOAM. This included the general underlying process. Further, we have mentioned how specific requirements are realized with this tool. We then applied FOAM to the test data. The results were carefully analyzed. We also discussed some future steps for both our own approach and the evaluation of alignments in general.
The main conclusions from the experiments were:
• It is possible to create good automatic ontology alignment approaches.
• Labels are most important.
• Structure helps, if the labels are not expressive.
• Due to the importance of labels, our approach needs to be extended with e.g. dictionaries in the background.
• One general conclusion from the real world ontologies was that an ontology system has to be able to also manage OWL-Full, as the real world does not provide the clean ontologies of OWL-DL.
In general, the evaluation has shown us where our specific strengths and weaknesses are, and how we can continue improving. The results of other participants will give us some further guidelines.

5. REFERENCES
[1] Agrawal, R., Srikant, R.: On integrating catalogs. In: Proceedings of the Tenth International Conference on the World Wide Web (WWW-10), ACM Press (2001) 603–612
[2] Noy, N.F., Musen, M.A.: The PROMPT suite: interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies 59 (2003) 983–1024
[3] Ehrig, M., Haase, P., van Harmelen, F., Siebes, R., Staab, S., Stuckenschmidt, H., Studer, R., Tempich, C.: The SWAP data and metadata model for semantics-based peer-to-peer systems. In: Proceedings of MATES-2003. First German Conference on Multiagent Technologies. LNAI, Erfurt, Germany, Springer (2003)
[4] Ehrig, M., Staab, S.: QOM - quick ontology mapping. In: F. van Harmelen, S. McIlraith, and D. Plexousakis, editors, Proceedings of the Third International Semantic Web Conference (ISWC2004), LNCS, pages 683–696, Hiroshima, Japan, 2004. Springer.
[5] Ehrig, M., Staab, S., Sure, Y.: Supervised learning of an ontology alignment process. In: Proceedings of the Workshop on IT Tools for Knowledge Management Systems: Applicability, Usability, and Benefits (KMTOOLS) at 3. Konferenz Professionelles Wissensmanagement, Kaiserslautern, Germany, April 2005.
[6] Ehrig, M., Sure, Y.: Adaptive Semantic Integration. In: Proceedings of the Workshop on Ontologies-based techniques for DataBases and Information Systems at VLDB 2005, Trondheim, Norway, August 2005.
[7] Hughes, T.: Information Interpretation and Integration Conference (I3CON) at PerMIS-2004, Gaithersburg, MD, USA, August 2004.
[8] Sure, Y., Corcho, O., Euzenat, J., Hughes, T. (editors): 3rd International Workshop on Evaluation of Ontology based Tools (EON2004), Volume 128, CEUR-WS Publication. Workshop at the 3rd International Semantic Web Conference (ISWC 2004), 7th-11th November 2004, Hiroshima, Japan
[9] Ehrig, M., Euzenat, J.: Relaxed Precision and Recall for Ontology Alignment. In: Proceedings of the Integrating Ontologies Workshop at K-Cap '05, Banff, Alberta, Canada, October 2005.

6. RAW RESULTS

6.1 Link to the system and parameters file
The FOAM system may be downloaded at http://www.aifb.uni-karlsruhe.de/WBS/meh/foam.
The system is continuously improved, so results may slightly differ from the results provided in this paper. The interested reader is encouraged to download, test, and use the system.

6.2 Link to the set of provided alignments (in align format)
The results are also available through the website: http://www.aifb.uni-karlsruhe.de/WBS/meh/foam/results.zip.

6.3 Matrix of results
The following results were achieved in the evaluation runs. As FOAM only allows identifying equality relations, precision and recall only refer to these.

#    Name                     Prec.  Rec.   F-measure  Time
101  Reference alignment      1.0    1.0    1.0        2.96
102  Irrelevant ontology      -      -      -          207.14
103  Language generalization  1.0    1.0    1.0        180.95
104  Language restriction     1.0    1.0    1.0        177.63
201  No names                 0.90   0.65   0.75       175.99
202  No names, no comments    0.85   0.57   0.68       176.59
203  No comments              1.0    1.0    1.0        174.21
204  Naming conventions       0.96   0.93   0.94       185.09
205  Synonyms                 0.80   0.67   0.73       174.46
206  Translation              0.93   0.76   0.84       172.15
207                           0.95   0.78   0.86       167.89
208                           0.96   0.87   0.92       164.20
209                           0.81   0.57   0.67       168.63
210                           0.92   0.67   0.77       164.31
221  No specialization        1.0    1.0    1.0        172.92
222  Flattened hierarchy      1.0    1.0    1.0        127.63
223  Expanded hierarchy       0.99   1.0    0.99       142.70
224  No instance              1.0    0.99   0.99       42.09
225  No restrictions          1.0    1.0    1.0        171.13
228  No properties            1.0    1.0    1.0        112.60
230  Flattened classes        0.94   1.0    0.97       137.60
232                           1.0    0.99   0.99       45.50
233                           1.0    1.0    1.0        110.57
236                           1.0    1.0    1.0        12.77
237                           1.0    1.0    1.0        87.94
238                           1.0    1.0    1.0        106.29
239                           0.94   1.0    0.97       73.14
240                           0.95   0.97   0.97       84.63
241                           1.0    1.0    1.0        11.15
246                           0.94   1.0    0.97       51.14
247                           0.94   1.0    0.97       70.27
248                           0.85   0.48   0.62       251.65
249                           0.73   0.46   0.57       150.39
250                           0.95   0.55   0.69       114.00
251                           0.88   0.41   0.56       132.39
252                           0.62   0.34   0.44       145.59
253                           0.80   0.44   0.57       83.96
254                           0.75   0.18   0.29       103.56
257                           0.76   0.48   0.59       28.43
258                           0.86   0.39   0.53       133.79
259                           0.75   0.45   0.56       149.39
260                           0.85   0.38   0.52       71.21
261                           0.61   0.33   0.43       82.89
262                           0.78   0.21   0.33       21.70
265                           0.85   0.38   0.52       70.50
266                           0.63   0.36   0.46       81.68
301  BibTeX/MIT               0.78   0.35   0.48       23.43
302  BibTeX/UMBC              0.88   0.74   0.80       21.31
303  Karlsruhe                0.84   0.90   0.87       61.08
304  INRIA                    0.94   0.97   0.95       43.32
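
The F-measure column of the matrix of results (Section 6.3) can be recomputed from the precision and recall columns. The following is a minimal sketch, assuming the column is the standard balanced F-measure, i.e. the harmonic mean of precision and recall; the paper does not spell out the formula. The function name f_measure and the selection of rows are illustrative only and are not part of FOAM.

```python
from typing import Optional


def f_measure(precision: float, recall: float) -> Optional[float]:
    """Balanced F-measure: harmonic mean of precision and recall.

    Returns None when both values are zero, since the measure is
    undefined in that case (assumed handling, not taken from the paper).
    """
    if precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the matrix of results.
rows = {
    "201": (0.90, 0.65),
    "202": (0.85, 0.57),
    "301": (0.78, 0.35),
    "304": (0.94, 0.97),
}

for test_id, (p, r) in rows.items():
    # Round to two decimals to match the precision used in the table.
    print(test_id, round(f_measure(p, r), 2))
```

For the rows shown, the recomputed values agree with the published column. Since the published precision and recall are themselves rounded, a recomputed value can differ from the table in the last digit (test 240 is one such case).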