Exploiting the UMLS Metathesaurus in the Ontology Alignment Evaluation Initiative

Ernesto Jiménez-Ruiz, Bernardo Cuenca Grau, and Ian Horrocks
Department of Computer Science, University of Oxford
{ernesto,berg,ian.horrocks}@cs.ox.ac.uk

Abstract. In this paper we describe how the UMLS Metathesaurus, the most comprehensive effort for integrating medical thesauri and ontologies, is being used within the context of the Ontology Alignment Evaluation Initiative (OAEI). We also present the results obtained in the Large BioMed track of the OAEI 2011.5 campaign, where the reference alignments are based on UMLS. Finally, we propose a new reference alignment based on the harmonisation of the outputs of the systems participating in the OAEI Large BioMed track.

1 Introduction

The Ontology Alignment Evaluation Initiative¹ (OAEI) is an international campaign for the systematic evaluation of ontology matching systems, i.e. software programs capable of finding correspondences (called alignments) between the vocabularies of a given set of input ontologies [22, 7, 9, 23]. The matching problems in the OAEI are organised in several tracks, each involving different kinds of test ontologies [7]. The ontologies in the largest test case in the OAEI 2011 contain only 2,000–3,000 classes; however, ontology matching tools have improved significantly in the last few years, and there is a need for more challenging and realistic matching problems for which suitable reference alignments exist [22, 7].

The UMLS Metathesaurus (UMLS) [1] is currently the most comprehensive effort for integrating medical thesauri and ontologies, including the National Cancer Institute Thesaurus (NCI) [12, 11], the Foundational Model of Anatomy (FMA) [19] and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) [24], which are large-scale and semantically rich ontologies.
NCI, FMA and SNOMED CT are gradually superseding the existing medical classifications and are becoming core platforms for accessing, gathering, and sharing biomedical knowledge and data. Hence, matching such large ontologies represents a very interesting challenge for the OAEI initiative.

In this paper we describe how the UMLS correspondences between NCI, FMA and SNOMED CT have been used as reference alignments for the new Large BioMed track² in the OAEI initiative. Furthermore, we present the results obtained in the OAEI 2011.5 campaign for this track, and we propose a new reference alignment based on the harmonisation of the outputs of the participating ontology matching systems.

¹ http://oaei.ontologymatching.org/
² http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/

Table 1. The notion of "Joint" in the MRCONSO file from the UMLS distribution.

CUI        Language   Source      Entity
C0022417   ENG        FMA         Joint
C0022417   ENG        FMA         Set of joints
C0022417   ENG        SNOMED CT   Joint structure
C0022417   ENG        NCI         Joint
C0022417   ENG        NCI         Articulation

Table 2. UMLS-based alignment between FMA, NCI and SNOMED CT for the notion of "Joint".

Ontology pair      Generated alignments
FMA ∼ NCI          ⟨1, FMA:Joint, NCI:Joint, 1.0, ≡⟩
                   ⟨2, FMA:Joint, NCI:Articulation, 1.0, ≡⟩
                   ⟨3, FMA:Set of joints, NCI:Joint, 1.0, ≡⟩
                   ⟨4, FMA:Set of joints, NCI:Articulation, 1.0, ≡⟩
FMA ∼ SNOMED CT    ⟨5, FMA:Joint, SNOMED:Joint structure, 1.0, ≡⟩
                   ⟨6, FMA:Set of joints, SNOMED:Joint structure, 1.0, ≡⟩
SNOMED CT ∼ NCI    ⟨7, SNOMED:Joint structure, NCI:Joint, 1.0, ≡⟩
                   ⟨8, SNOMED:Joint structure, NCI:Articulation, 1.0, ≡⟩

2 The UMLS-based reference alignments

Ontology alignments are often conceptualised as tuples of the form ⟨id, e1, e2, n, ρ⟩, where id is a unique identifier for the mapping, e1 and e2 are entities in the vocabularies of the integrated ontologies, n is a numeric confidence measure between 0 and 1, and ρ is a relation between e1 and e2, typically subsumption (i.e., e1 is more specific than e2) or equivalence (i.e., e1 and e2 are synonyms) [8].
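As an illustration of this tuple format, and of how the alignments of Table 2 can be derived from the rows of Table 1, the following Python code is a minimal sketch, not the authors' implementation. It assumes that the relevant MRCONSO columns have already been parsed into (CUI, language, source, entity) tuples; the real file has many more columns and uses its own source abbreviations, and its parsing is not shown. The names `Mapping`, `extract_alignments`, `PREFIX` and `RANK` are hypothetical. Entities are grouped by CUI, and every pair of entities coming from different sources yields an equivalence tuple with confidence 1.0.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mapping:
    """An alignment tuple <id, e1, e2, n, rho>."""
    id: int   # unique identifier of the mapping
    e1: str   # entity of the first ontology, e.g. "FMA:Joint"
    e2: str   # entity of the second ontology, e.g. "NCI:Joint"
    n: float  # confidence in [0, 1]
    rho: str  # relation, here always "≡" (equivalence)


# Hypothetical labels: the prefix printed for each source, and the orientation
# of each ontology pair (FMA before SNOMED CT before NCI, as in Table 2).
PREFIX = {"FMA": "FMA", "SNOMED CT": "SNOMED", "NCI": "NCI"}
RANK = {"FMA": 0, "SNOMED CT": 1, "NCI": 2}


def extract_alignments(rows):
    """rows: (cui, language, source, entity) tuples already parsed from MRCONSO."""
    by_cui = defaultdict(list)
    for cui, _language, source, entity in rows:
        if source in RANK:  # keep only the ontologies of interest
            by_cui[cui].append((source, entity))

    mappings = []
    for entries in by_cui.values():
        for (s1, x1), (s2, x2) in combinations(entries, 2):
            if s1 == s2:
                continue  # entities from the same source are not aligned
            if RANK[s1] > RANK[s2]:  # orient the pair as in Table 2
                (s1, x1), (s2, x2) = (s2, x2), (s1, x1)
            mappings.append(Mapping(len(mappings) + 1,
                                    f"{PREFIX[s1]}:{x1}", f"{PREFIX[s2]}:{x2}",
                                    1.0, "≡"))
    return mappings


# The rows of Table 1 (all five entities share CUI C0022417).
TABLE_1 = [
    ("C0022417", "ENG", "FMA", "Joint"),
    ("C0022417", "ENG", "FMA", "Set of joints"),
    ("C0022417", "ENG", "SNOMED CT", "Joint structure"),
    ("C0022417", "ENG", "NCI", "Joint"),
    ("C0022417", "ENG", "NCI", "Articulation"),
]
alignments = extract_alignments(TABLE_1)
print(len(alignments))  # 8, as in Table 2
```

The identifiers produced by this sketch follow the order in which pairs are enumerated, so they need not coincide with the numbering used in Table 2; the set of (e1, e2) pairs is the same.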
The OAEI initiative uses an RDF format³ [6] to represent alignments containing the aforementioned elements. Alternatively, OAEI alignments are also represented as OWL 2 subclass and equivalence axioms, with the mapping identifier (id) and confidence (n) added as OWL 2 annotation axioms [4].

Although the standard UMLS distribution does not directly provide sets of alignments (in the OAEI sense) between the integrated ontologies, it is relatively straightforward to extract alignment sets from the information provided in the distribution files [15]. Concretely, we have processed the MRCONSO⁴ file, which contains every entity in UMLS together with its concept unique identifier (CUI), its source vocabulary (e.g. FMA), its language (e.g. English), and other attributes not relevant for the OAEI. Table 1 shows an excerpt from the MRCONSO file associated with the notion of "Joint". It follows from Table 1 that the notion of "Joint" is shared by FMA, SNOMED CT and NCI. In particular, FMA contains the entities Joint and Set of joints, NCI the entities Articulation and Joint, and SNOMED CT only the entity Joint structure. All these entities have been annotated with the same CUI C0022417 and therefore, according to the intended meaning of UMLS, they are synonyms. Then, for each pair of entities e1 and e2 from different sources and annotated with the same CUI, we have generated the corresponding (equivalence) UMLS-based alignment with a confidence value of 1.0 (see Table 2).

³ http://alignapi.gforge.inria.fr/format.html
⁴ http://www.ncbi.nlm.nih.gov/books/n/nlmumls/ch03/

Table 3. UMLS-based alignments

Ontology pair      Original alignments   Unsatisfiabilities   Refined alignments
FMA ∼ NCI          3,024                 655                  2,898
FMA ∼ SNOMED CT    9,072                 6,179                8,111
SNOMED CT ∼ NCI    19,622                20,944               18,322

Table 4. Results for the Large BioMed track in the OAEI 2011.5 campaign.

                               Refined UMLS          Original UMLS
System      Size    Unsat.    P      R      F       P      R      F       Time (s)
LogMap      2,658   9         0.868  0.796  0.830   0.875  0.769  0.819   126
GOMMAbk     2,983   17,005    0.806  0.830  0.818   0.826  0.815  0.820   1,093
GOMMAnobk   2,665   5,238     0.845  0.777  0.810   0.862  0.759  0.807   960
LogMapLt    3,466   26,429    0.675  0.807  0.735   0.695  0.796  0.742   57
CSA         3,607   >10⁵      0.514  0.640  0.570   0.528  0.629  0.574   14,068
Aroma       4,080   >10⁵      0.467  0.657  0.546   0.480  0.647  0.551   9,503
MapSSS      2,440   33,186    0.426  0.359  0.390   0.438  0.353  0.391   >10⁵

The integration of new resources in UMLS combines expert assessment and sophisticated auditing protocols [1, 3, 10]. However, it has been noticed that UMLS-based alignments lead to a large number of unsatisfiable classes if they are represented as OWL 2 axioms and integrated with the input ontologies [15, 14]. For example, the integration of SNOMED CT and NCI via UMLS-based alignments leads to more than 20,000 unsatisfiable classes. To address this problem, we have presented in [14] a refinement of the (original) UMLS-based alignments that does not lead to (many) unsatisfiable classes (see Table 3). This refinement is based on the alignment repair module of the ontology matching system LogMap [14, 16].

3 Results of the Large BioMed track in the OAEI 2011.5

In this section we briefly present the results obtained in the Large BioMed track of the OAEI 2011.5 campaign.⁵ We have only evaluated the FMA-NCI matching problem, where the versions of FMA and NCI used contain 78,989 and 66,724 classes, respectively. The original and refined UMLS-based alignments (see Table 3) have been used as references to evaluate the quality of the alignments produced by the participating ontology matching systems. Table 4 summarises the results, where systems have been ordered according to their F-measure against the refined UMLS-based reference alignment. LogMapLt, a simple ontology matcher, has been used as a baseline.
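Table 4 reports precision, recall and F-measure. Assuming the usual set-based definitions (the text does not spell them out), they can be computed as in the following minimal Python sketch; the function names and the toy mappings m1 to m6 are hypothetical. The F-measure values in Table 4 agree with the harmonic mean of P and R, e.g. for LogMap against the refined reference alignment.

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def precision_recall_f(system, reference):
    """Set-based P, R and F of a system alignment against a reference alignment.

    Mappings can be any hashable values, e.g. (e1, e2, relation) triples,
    so identifiers and confidence values play no role in the comparison.
    """
    system, reference = set(system), set(reference)
    correct = len(system & reference)  # mappings present in both sets
    p = correct / len(system) if system else 0.0
    r = correct / len(reference) if reference else 0.0
    return p, r, f_measure(p, r)


# Toy example with hypothetical mappings m1..m6.
p, r, f = precision_recall_f({"m1", "m2", "m3", "m4"},
                             {"m1", "m2", "m3", "m5", "m6"})
print(p, r)  # 0.75 0.6

# LogMap against the refined UMLS reference (P and R taken from Table 4).
print(round(f_measure(0.868, 0.796), 3))  # 0.83, reported as 0.830
```

Growing the reference set while keeping a system output fixed can only keep or increase the numerator of precision, while it enlarges the denominator of recall, which is in line with the small variations observed when the original UMLS-based alignment is used instead of the refined one.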
Besides precision (P), recall (R), F-measure (F) and runtimes, we have also evaluated the coherence of the alignments when reasoning together with the input ontologies.⁶ Note that we have evaluated GOMMA [17] with two different configurations: GOMMAbk uses UMLS-based background knowledge, while GOMMAnobk has this feature deactivated.

⁵ http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2011.5/
⁶ We have used the OWL 2 reasoner HermiT [20].

Fig. 1. Harmonised alignments for the FMA-NCI matching problem of the OAEI 2011.5.

GOMMA (in both configurations) and LogMap are clearly ahead in terms of F-measure of Aroma [5], CSA [25] and MapSSS [2], which could not top the results of the baseline LogMapLt. GOMMAbk obtained the best results in terms of recall, while LogMap provided the best results in terms of precision and F-measure.

The use of the original UMLS-based reference alignment did not lead to important variations. Since the original set contains more mappings, precision slightly increases and recall slightly decreases. It is worth mentioning, however, that GOMMAbk improves its results when compared against the original UMLS-based reference alignment and provides the best F-measure.

Regarding mapping coherence, only LogMap generated an 'almost' clean output in all three tasks. Although GOMMAnobk also provides highly precise output correspondences, they lead to a huge number of unsatisfiable classes.

4 Towards a silver standard reference alignment

The original UMLS-based reference alignment, as shown in Section 2, contains errors (i.e. it leads to a large number of unsatisfiable classes when integrated with the input ontologies). On the other hand, the refined UMLS-based reference alignment is based on the (incomplete) alignment repair techniques of the ontology matching system LogMap [14, 16], which may fail to detect and discard the appropriate alignments.
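Why equivalence mappings can make classes unsatisfiable can be illustrated with a toy check. The unsatisfiable-class counts in this paper were computed with HermiT on the full OWL 2 ontologies; the Python sketch below swaps in a much weaker procedure that is only complete for a tiny fragment: atomic subclass axioms, disjointness axioms between atomic classes, and equivalence mappings encoded as two subclass axioms. In that fragment a class is unsatisfiable exactly when two of its (reflexive, transitive) superclasses are declared disjoint. The axioms and class names are invented for illustration and are not taken from FMA or NCI; they merely echo Table 2, where two FMA classes are mapped to the same NCI class.

```python
from collections import defaultdict


def superclasses(cls, edges):
    """Reflexive-transitive closure of the subclass relation starting at cls."""
    seen, stack = {cls}, [cls]
    while stack:
        for sup in edges.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def unsatisfiable_classes(subclass_axioms, disjoint_pairs, equivalences):
    """Classes that have two superclasses declared disjoint (toy fragment only)."""
    edges = defaultdict(set)
    for sub, sup in subclass_axioms:
        edges[sub].add(sup)
    for a, b in equivalences:  # a ≡ b is encoded as a ⊑ b and b ⊑ a
        edges[a].add(b)
        edges[b].add(a)
    classes = set(edges) | {c for sups in edges.values() for c in sups}
    classes |= {c for pair in disjoint_pairs for c in pair}

    unsat = set()
    for c in classes:
        sups = superclasses(c, edges)
        if any(x in sups and y in sups for x, y in disjoint_pairs):
            unsat.add(c)
    return unsat


# Invented axioms of a hypothetical ontology O1 and mappings to O2.
SUBCLASS = [("O1:Joint", "O1:AnatomicalStructure"),
            ("O1:SetOfJoints", "O1:AnatomicalSet")]
DISJOINT = [("O1:AnatomicalStructure", "O1:AnatomicalSet")]
MAPPINGS = [("O1:Joint", "O2:Joint"), ("O1:SetOfJoints", "O2:Joint")]

print(sorted(unsatisfiable_classes(SUBCLASS, DISJOINT, [])))  # []
print(sorted(unsatisfiable_classes(SUBCLASS, DISJOINT, MAPPINGS)))
# ['O1:Joint', 'O1:SetOfJoints', 'O2:Joint']
```

Each mapping is harmless on its own; it is their combination that merges two disjoint branches. An alignment repair step would then discard some of the mappings involved, which this sketch does not attempt.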
Thus, in order to turn the extracted UMLS-based reference alignments into an agreed-upon gold standard, expert assessment would be needed, which is practically infeasible for large alignment sets. We have opted to move towards a silver standard by harmonising the outputs of different matching tools over the relevant ontologies. Similar silver standards have been developed for named entity recognition problems [21, 13].

We have harmonised the outputs of the systems participating in the OAEI 2011.5 FMA-NCI matching problem. Each system has been assigned a weighted vote based on its precision w.r.t. the refined UMLS-based reference alignment (see Table 4). For example, LogMap and MapSSS have been assigned the weights 0.868 and 0.426, respectively. Note that systems participating with two versions (e.g. GOMMA and LogMap) have been considered only once in the voting process.

Figure 1 summarises the evolution of the F-measure, precision and recall of the harmonised alignment depending on the minimum required votes. For example, the harmonised alignment set requiring 4.0 points of weighted votes has a precision of 0.971 and a recall of 0.369 w.r.t. the refined UMLS-based reference alignment. As expected, precision increases and recall decreases as the required votes increase. We have selected the harmonised alignment set with the highest F-measure (0.91) as the "first" silver standard of the FMA-NCI matching problem. This set contains 2,890 alignments that received at least 0.90 points of weighted votes; since no single system has a weight above 0.868, each of these alignments has been voted for by at least two systems. Note that this harmonised alignment has not yet been refined, and it is known to lead to more than 14,000 unsatisfiable classes when integrated with FMA and NCI.

5 Future work

In the OAEI 2012 campaign⁷ we also intend to evaluate the SNOMED-NCI and FMA-SNOMED matching problems using the corresponding UMLS-based reference alignments (see Table 3).
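The weighted-vote harmonisation of Section 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool used to build the silver standard: each system output is treated as a set of mappings, each system's weight is its precision from Table 4, and a mapping is kept when the accumulated weight of the systems that produced it reaches the required number of votes. The function `harmonise` and the system outputs m1 to m5 are hypothetical; only the weights come from Table 4.

```python
from collections import defaultdict


def harmonise(outputs, weights, min_votes):
    """Weighted-vote harmonisation of several system alignments.

    outputs:   system name -> set of mappings (any hashable values)
    weights:   system name -> vote weight (here: precision from Table 4)
    min_votes: minimum accumulated weight a mapping needs to be kept
    """
    votes = defaultdict(float)
    for system, mappings in outputs.items():
        for mapping in mappings:
            votes[mapping] += weights[system]
    return {m for m, v in votes.items() if v >= min_votes}


# Weights: refined-UMLS precision values of Table 4.
# Outputs: made-up mappings m1..m5, for illustration only.
WEIGHTS = {"LogMap": 0.868, "CSA": 0.514, "Aroma": 0.467, "MapSSS": 0.426}
OUTPUTS = {
    "LogMap": {"m1", "m2", "m3"},
    "CSA": {"m1", "m4"},
    "Aroma": {"m2", "m4"},
    "MapSSS": {"m1", "m5"},
}
print(sorted(harmonise(OUTPUTS, WEIGHTS, 0.90)))  # ['m1', 'm2', 'm4']
print(sorted(harmonise(OUTPUTS, WEIGHTS, 1.50)))  # ['m1']
```

With a threshold of 0.90, mapping m3 is discarded although it was found by LogMap, because no single weight reaches 0.90; raising the threshold further shrinks the set, trading recall for precision as in Figure 1.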
We will also create harmonised silver standard alignments and evaluate the participating systems against them. This comparison will be very useful for analysing how different a system is with respect to the others. Finally, we also intend to combine different reasoning and diagnosis tools, such as ALCOMO⁸ [18], to generate error-free refinements of both the UMLS-based reference alignments and the harmonised silver standards.

⁷ http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/2012/
⁸ http://web.informatik.uni-mannheim.de/alcomo/

Acknowledgements

This work was supported by the Royal Society, the EU FP7 project SEALS and the EPSRC projects ConDOR, ExODA, LogMap and Score!. We also thank the organisers of the OAEI evaluation campaigns for providing test data and infrastructure.

References

1. Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32 (2004)
2. Cheatham, M.: MapSSS results for OAEI 2011. In: Proceedings of the 6th Ontology Matching Workshop. pp. 184–189 (2011)
3. Cimino, J.J., Min, H., Perl, Y.: Consistency across the hierarchies of the UMLS semantic network and metathesaurus. Journal of Biomedical Informatics 36(6) (2003)
4. Cuenca Grau, B., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P., Sattler, U.: OWL 2: The next step for OWL. Journal of Web Semantics 6(4), 309–322 (2008)
5. David, J., Guillet, F., Briand, H.: Association Rule Ontology Matching Approach. Journal of Semantic Web Information Systems 3(2), 27–49 (2007)
6. David, J., Euzenat, J., Scharffe, F., dos Santos, C.T.: The Alignment API 4.0. Semantic Web 2(1), 3–10 (2011)
7. Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Ontology Alignment Evaluation Initiative: six years of experience. J. Data Semantics (2011)
8. Euzenat, J.: Semantic precision and recall for ontology alignment evaluation. In: Proc. of the 20th International Joint Conference on Artificial Intelligence (IJCAI). pp. 348–353 (2007)
9. Euzenat, J., Ferrara, A., van Hage, W.R., Hollink, L., Meilicke, C., Nikolov, A., Ritze, D., Scharffe, F., Shvaiko, P., Stuckenschmidt, H., Šváb-Zamazal, O., Trojahn dos Santos, C.: Results of the Ontology Alignment Evaluation Initiative 2011. In: 6th OM Workshop (2011)
10. Geller, J., Perl, Y., Halper, M., Cornet, R.: Special issue on auditing of terminologies. Journal of Biomedical Informatics 42(3), 407–411 (2009)
11. Golbeck, J., Fragoso, G., Hartel, F.W., Hendler, J.A., Oberthaler, J., Parsia, B.: The National Cancer Institute's Thésaurus and Ontology. J. Web Sem. 1(1), 75–80 (2003)
12. Hartel, F.W., de Coronado, S., Dionne, R., Fragoso, G., Golbeck, J.: Modeling a description logic vocabulary for cancer research. Journal of Biomedical Informatics 38(2) (2005)
13. Jiménez-Ruiz, E., Rebholz-Schuhmann, D., Lewin, I.: Exploitation of cross-references between terminological resources within the CALBC context. In: 1st Intl. Workshop on Exploiting Large Knowledge Repositories, DEXA Workshops (2011)
14. Jiménez-Ruiz, E., Cuenca Grau, B.: LogMap: Logic-based and Scalable Ontology Matching. In: Proc. of the 10th International Semantic Web Conference (ISWC). pp. 273–288 (2011)
15. Jiménez-Ruiz, E., Cuenca Grau, B., Horrocks, I., Berlanga, R.: Logic-based assessment of the compatibility of UMLS ontology sources. J. Biomed. Sem. 2 (2011)
16. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive ontology matching: Algorithms and implementation. In: Proc. of ECAI (2012)
17. Kirsten, T., Gross, A., Hartung, M., Rahm, E.: GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution. Journal of Biomedical Semantics 2, 6 (2011)
18. Meilicke, C.: Alignment Incoherence in Ontology Matching. Ph.D. thesis, University of Mannheim, Chair of Artificial Intelligence (2011)
19. Mejino Jr., J.L.V., Rosse, C.: Symbolic modeling of structural relationships in the foundational model of anatomy. In: Proc. of First International Workshop on Formal Biomedical Knowledge Representation (KR-MED 2004). pp. 48–62 (2004)
20. Motik, B., Shearer, R., Horrocks, I.: Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research 36, 165–228 (2009)
21. Rebholz-Schuhmann, D., Jimeno Yepes, A., Van Mulligen, E.M., Kang, N., Kors, J., Milward, D., Corbett, P., Buyko, E., Beisswanger, E., Hahn, U.: CALBC Silver Standard Corpus. J. Bioinform. Comput. Biol. pp. 163–179 (2010)
22. Shvaiko, P., Euzenat, J.: Ten challenges for ontology matching. In: On the Move to Meaningful Internet Systems (OTM Conferences) (2008)
23. Shvaiko, P., Euzenat, J.: Ontology matching: State of the art and future challenges. IEEE Trans. Knowl. Data Eng. 99 (2011)
24. Spackman, K.: SNOMED RT and SNOMED CT. Promise of an international clinical ontology. M.D. Computing 17 (2000)
25. Tran, Q.V., Ichise, R., Ho, B.Q.: Cluster-based similarity aggregation for ontology matching. In: Proc. of the 6th Ontology Matching Workshop. pp. 142–147 (2011)