=Paper=
{{Paper
|id=None
|storemode=property
|title=Using a Reference Ontology with Semantic Similarity in Ontology Alignment
|pdfUrl=https://ceur-ws.org/Vol-897/session4-paper19.pdf
|volume=Vol-897
|dblpUrl=https://dblp.org/rec/conf/icbo/CrossSM12
}}
==Using a Reference Ontology with Semantic Similarity in Ontology Alignment==
Valerie Cross*, Pramit Silwal, and David Morell

Computer Science and Software Engineering, Miami University, Oxford OH USA

<nowiki>*</nowiki> To whom correspondence should be addressed: crossv@muohio.edu

===ABSTRACT===
The current use of semantic similarity with a reference ontology in ontology alignment (OA) systems is reviewed. An extended matcher is described that incorporates semantic similarity with the use of a reference ontology. This matcher has been implemented on the basis of AgreementMaker's mediating matcher. Experiments on the OAEI anatomy track are performed using the Uberon ontology as the reference ontology. The results of these experiments are compared to the OAEI 2011 results for the anatomy track. They show that semantic similarity measures can be useful for discovering mappings missed by the original mediating matcher. The use of semantic similarity with a reference ontology should be further investigated in the effort to improve the OA process.

===1 INTRODUCTION===
Ontology alignment (OA) systems typically produce a set <math>M_{ST}</math> of mapping pairs <math>(s_i, t_i)</math> between a source ontology <math>O_S</math> and a target ontology <math>O_T</math>, with each pair having a similarity degree <math>d_{sim} \in (0, 1]</math>. The mapping indicates that the concept <math>s_i</math> in <math>O_S</math> is similar to the concept <math>t_i</math> in <math>O_T</math> with degree <math>d_{sim}</math>. Most matchers in OA systems rely only on the internal information available within the ontologies to be aligned. External knowledge sources are increasingly being used to improve the alignment process (Shvaiko & Euzenat, 2012). A standard approach has been to create a matcher that uses a reference ontology or creates a lexicon using a thesaurus. The main operation typically is some function of the overlap between the synonym sets found in the reference ontology or the lexicon for the source and target concepts. A problem occurs when no overlap between the two sets exists. Semantic similarity measures can be used to find a possible mapping from a source concept to a target concept based on the similarity between the source's identified concept and the target's identified concept in the reference ontology.

Measuring similarity between a source concept <math>s</math> and a target concept <math>t</math> in the two different ontologies can then be translated into finding corresponding bridge concepts <math>b_S</math> and <math>b_T</math> in the reference ontology and then measuring the degree of similarity between <math>b_S</math> and <math>b_T</math>. Several important issues in using background knowledge sources have been identified (Shvaiko & Euzenat, 2012). For example, the selection of the reference ontology should ensure that it has suitable coverage of the ontologies being aligned. Another important consideration is the means of finding the corresponding entities <math>b_S</math> and <math>b_T</math> in the reference ontology.

The contribution of this research is the use of a reference ontology and semantic similarity measurement within the reference ontology to improve the OA process. Section 2 overviews semantic similarity and its use with background knowledge in existing OA systems. Section 3 first describes a recent experiment that uses different biomedical ontologies as reference ontologies, without semantic similarity, to improve alignment results for the OAEI anatomy track. Section 4 presents the proposed method, which extends the previous approaches with semantic similarity measurement. The experimental results of this method on the OAEI anatomy track are described and compared with those of one of the experiments described in section 3. Finally, conclusions, a summary of the research efforts, and future research plans are presented in section 5.

===2 REFERENCE ONTOLOGY WITH SEMANTIC SIMILARITY===
Much research is being undertaken to use background knowledge sources to aid the ontology alignment process. Many forms of background knowledge have been used, such as partial alignments, existing alignments, domain specific corpora, web pages, linked data, upper ontologies and domain specific ontologies (Shvaiko & Euzenat, 2012). The use of simple background knowledge sources such as thesauri, for example WordNet, has been widespread for some time. More recently, research has examined the use of domain specific ontologies, especially in the medical domain, or a collection of ontologies selected from the Semantic Web. These ontologies have been referred to as reference (Sabou et al., 2008), intermediate (Gross et al., 2011) or mediating ontologies (Cruz et al., 2011).

The outcome of several OAEI competitions has not been consistent when it comes to OA systems using background knowledge (Shvaiko & Euzenat, 2012). For example, in the 2007 and 2008 OAEI competitions, the OA systems utilizing background knowledge were the best performing. The best performing OA system in 2009, however, did not use any background knowledge. In 2011 the best performing systems in the anatomy track made use of domain specific ontologies (Euzenat et al., 2011). For the OA systems actually competing in the OAEI competition, the background knowledge sources are manually selected.

====2.1 Semantic Similarity in Ontologies====
In ontology alignment, numerous similarity measures are used to determine the similarity between concepts in two different ontologies. The purpose is to create a list of concept mappings between the two ontologies. Semantic similarity, however, unlike the similarity measurement typically used within OA, measures the similarity between two concepts within a single ontology. Due to space limitations, only a historical review of such measures is presented. These measures or slight variations of them represent those used in the OA systems described in the next section. A detailed overview of current semantic similarity measures and research can be found in (Yu, 2010) and (Cross and Yu, 2011).

The earliest semantic distance measures were developed for use in semantic networks and were simple path distance measures, i.e., the count of the number of edges or nodes between two concepts (Rada et al., 1989). This simple path-based distance has been used in ontologies viewed as graphs. Wu and Palmer (Wu & Palmer, 1994) improved upon the early path-based semantic distance measures by proposing a semantic similarity measure between two concepts that is the ratio of twice the distance of their lowest common subsumer to the root concept and the sum of the distance of each concept from the root concept.

Another approach to semantic similarity is based on using a measure of information content (IC) for a concept. IC measures how specific a concept is within a given ontology. The more specific a concept is, the higher its IC; the more general, the lower its IC. IC has been determined by either a corpus-based (Resnik, 1995) or an ontology-based method (Seco et al., 2004). The corpus-based IC uses an external resource such as an associated corpus for the problem domain and is determined using the negative log of the probability of the concept with respect to the corpus. The ontology-based IC method simply uses the structure of the ontology itself to determine a concept's IC value. It is a function of the number of descendants of a concept and the total number of concepts in the ontology.

The first IC-based semantic similarity measure is defined as the maximum information content two concepts share (Resnik, 1995). The common ancestor of the two concepts having the maximum IC value must be found, and its IC value is taken as the semantic similarity between the two. An improvement to Resnik's measure was proposed by Lin (1998). It is formulated as the ratio of twice the maximum shared information content between the two concepts and the sum of each concept's individual information content.

====2.2 OA Systems Using Semantic Similarity====
Here a brief survey is presented of only those OA systems that use a background knowledge source (WordNet, UMLS, or both) as a reference ontology with semantic similarity. They apply standard semantic similarity measures or their variations between the concepts within the reference ontology and not between the concepts from the two ontologies being aligned. The systems are presented in chronological order of their references. A complete overview of the state of the art for OA systems can be found in (Euzenat et al., 2011).

2.2.1 OLA (Euzenat and Valtchev, 2003). A modified version of the Wu-Palmer semantic similarity measure (Wu and Palmer, 1994) is used in determining lexical similarity between a pair of identifiers, each of which is first converted into a set of atomic terms. Next, pairs of terms, one from each set, are compared using WordNet. The pair's similarity is calculated as the ratio between the depth of the most specific common hypernym (ancestor in the WordNet hierarchy) and the sum of the depths of the two terms. Then a degree of proximity between the sets of terms is calculated.

2.2.2 iMapper (Su, 2004). The similarity value determined for the mapping between two concepts may be increased using the distance of the two concepts in WordNet. The concepts are found in WordNet using their descriptive labels. A simple path-based semantic distance between two terms x and y found in WordNet is used. If they belong to the same synset in WordNet, then the path distance is 1. Otherwise, the path length is determined by the number of nodes rather than the links in the path, so that the length between sibling nodes is 3. If no path can be found between them (they exist in unconnected WordNet subontologies), then they are unrelated. Their similarity value is, therefore, not strengthened.

2.2.3 ASMOV (Jean-Mary et al., 2009). Semantic similarity measures may be used in determining the lexical similarity between concept labels. If the string labels for the source and target concepts are identical, the lexical similarity is 1.0. If they are not identical and an external ontology such as WordNet or UMLS is available, then various thesaurus relationships are used. If the source label string is in the synonym set of the target label, then their lexical similarity is set to 0.99. If one is an antonym of the other, then their lexical similarity is set to 0.0. If neither of those relationships holds and both string labels exist in the external ontology, their lexical similarity is set to the Lin (1998) semantic similarity measure between the two. Otherwise, the minimum inclusion measure between the two sets of tokens is used.

2.2.4 CIDER (Gracia & Mena, 2008). The alignment process uses a modified version of a sense semantic similarity measure to evaluate similarity between the possible senses of a keyword and their synonyms to perform disambiguation. The techniques used in CIDER are adapted from the PowerMap WordNet-based algorithm (Lopez et al., 2006).

2.2.5 UFOme (Pirro and Talia, 2010). A set of matchers, many of which were developed previously for numerous OA systems, is integrated into UFOme. Its WordNet matcher also uses the Lin semantic similarity measure between WordNet synsets when the concepts do not map to the same synset in WordNet.
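The measures reviewed in section 2.1 can be illustrated on a small example. The following is a minimal sketch, not code from any of the surveyed systems: the taxonomy, its concept names, and all function names are hypothetical, the taxonomy is assumed to be a single-inheritance tree, and depth is counted in nodes so that the root has depth 1. The intrinsic IC follows the form given by Seco et al. (2004), 1 - log(descendants + 1) / log(total concepts).

```python
import math

# Hypothetical toy taxonomy (child -> parent); names are illustrative only.
PARENT = {
    "organ": "anatomical_entity",
    "tissue": "anatomical_entity",
    "bone": "organ",
    "brain": "organ",
    "long_bone": "bone",
    "short_bone": "bone",
    "epithelium": "tissue",
}
ROOT = "anatomical_entity"
CONCEPTS = set(PARENT) | {ROOT}


def ancestors(c):
    """Path from c up to the root, with c itself first."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(c):
    """Number of nodes on the path from the root to c (assumption: root = 1)."""
    return len(ancestors(c))


def lowest_common_subsumer(a, b):
    """Deepest concept subsuming both a and b (valid for a tree)."""
    b_anc = set(ancestors(b))
    return next(x for x in ancestors(a) if x in b_anc)


def wu_palmer(a, b):
    """Twice the depth of the common subsumer over the sum of both depths."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def seco_ic(c):
    """Ontology-based IC: 1 - log(descendants + 1) / log(total concepts)."""
    n_desc = sum(1 for x in CONCEPTS if x != c and c in ancestors(x))
    return 1 - math.log(n_desc + 1) / math.log(len(CONCEPTS))


def resnik(a, b):
    """IC of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(seco_ic(x) for x in common)


def lin(a, b):
    """Twice the shared IC over the sum of the individual ICs."""
    if a == b:
        return 1.0
    denom = seco_ic(a) + seco_ic(b)
    return 2 * resnik(a, b) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    print(wu_palmer("long_bone", "short_bone"))
    print(lin("long_bone", "short_bone"))
    print(lin("long_bone", "epithelium"))
```

In this toy tree the two bone concepts share the fairly specific ancestor `bone`, so both measures give them a high similarity, whereas `long_bone` and `epithelium` share only the root, whose intrinsic IC is zero, so their Lin similarity is zero.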
===3 RECENT EXPERIMENTS WITH REFERENCE ONTOLOGIES===
Two very recent experiments using reference ontologies to improve the alignment mapping process are presented. In (Gross et al., 2011), the reference ontology is called an intermediate ontology and in (Cruz et al., 2011) it is called a mediating ontology. Both follow a very similar approach. They differ in the alignment methods used to produce the mappings from the source and target ontologies to the reference ontology and in the aggregation method used to combine similarity values into the final mapping from a source concept to a target concept through a reference concept. Neither incorporates semantic similarity measurement between concepts within the reference ontology.

====3.1 Composition-Based Matching====
In (Gross et al., 2011) the OA system uses intermediate ontologies <math>O_I</math> to compose mappings <math>M_{SI}</math> from the source <math>O_S</math> to <math>O_I</math> with mappings <math>M_{IT}</math> from <math>O_I</math> to the target <math>O_T</math> to produce a set of mappings <math>M_{ST}</math> from <math>O_S</math> to <math>O_T</math>. More formally, the final alignment result is defined as

<math>M_{ST} = \{(c_S, c_T, \mathit{aggSim}(\mathit{mapSim}_{SI}, \mathit{mapSim}_{IT})) \mid c_S \in O_S,\ c_I \in O_I,\ c_T \in O_T : (c_S, c_I, \mathit{mapSim}_{SI}) \in M_{SI} \wedge (c_I, c_T, \mathit{mapSim}_{IT}) \in M_{IT}\} \qquad (1)</math>

The aggregation operator aggSim combines the mapping similarities for <math>M_{SI}</math> and <math>M_{IT}</math>. Different operators could be used; the authors state that the average was used. They suggest that <math>M_{SI}</math> and <math>M_{IT}</math> could be existing mappings such as those in BioPortal. In their experiments, <math>M_{SI}</math> and <math>M_{IT}</math> were determined using linguistic trigram similarity between concept names and synonyms with a threshold of 0.8. In effect, two simplified ontology alignments were first performed to create the mappings <math>M_{SI}</math> and <math>M_{IT}</math> before the composition-based mapping was done. One point that is not clear is the method used if multiple <math>c_I</math> exist, i.e., if 1-1 mapping is not enforced. The method that produces the intermediate mappings may enforce 1-1 mappings. An optional step tries to find direct mappings from the set of unmapped concepts in <math>O_S</math> to the set of unmapped concepts in <math>O_T</math>. These two sets are matched against each other using a string similarity match algorithm.

They evaluate the proposed composition approach using the Adult Mouse Anatomy ontology (MA) and the anatomical part (human anatomy, HA) of the NCI Thesaurus, i.e., the OAEI anatomy track. The four reference ontologies are FMA, Uberon, RadLex, and UMLS, all late 2010 versions. Separate experiments were done for each of the ontologies. Only F-measures are reported. Uberon produced the best results (F-measure of 88.2%) with a two-step process: 1) produce mappings using Uberon as the intermediate ontology and 2) add direct mappings between the MA and HA. Their paper points out that none of the previous approaches participating in the OAEI 2010 anatomy track exceeded an 87% F-measure.

====3.2 AgreementMaker Mediating Matcher====
For OAEI 2011, AgreementMaker (Cruz et al., 2011) added a new matcher, the mediating matcher (MM). The mediating matcher takes as input the two ontologies to be aligned and a reference ontology and then uses AgreementMaker's BSMlex (base similarity matcher with lexicon) to match the MA and the HA ontologies with the reference ontology. The BSMlex matcher calculates the similarity between two concepts by comparing all the strings associated with those two concepts, that is, the concept name, label, and comments.

AgreementMaker's approach is similar to that in (Gross et al., 2011). Both require an exact match on the bridge concept, i.e., <math>b_S = b_T</math>. It differs in the sophistication of the matcher used to find the bridge concepts for the source and target ontologies in the reference ontology, i.e., the BSMlex algorithm versus linguistic trigram similarity. Based on the success of the Uberon ontology as a reference ontology in (Gross et al., 2011), AgreementMaker also chose to use it as the mediating ontology for the OAEI 2011 anatomy track. The BSMlex also used Uberon to develop its lexicon when matching the MA and HA ontologies to Uberon, to take advantage of the extra synonyms defined in Uberon.

In the reported OAEI 2011 results (Euzenat et al., 2011), AgreementMaker had the best performance with respect to F-measure (91.7%). These results are better than those in (Gross et al., 2011). AgreementMaker used only the one reference ontology, Uberon, while the best results in (Gross et al., 2011) were based on merging results using four different reference ontologies. Another difference is that AgreementMaker's final mappings are determined by a hierarchical arrangement of its Linear Weighted Combination (LWC) matchers. A single combined alignment is produced using mapping quality measures to choose the best mappings from each matcher, of which its MM is only one.

Each matcher produces a similarity matrix between the source concepts and the target concepts. An LWC takes as input the similarity matrices of two or more matchers and produces a weighted aggregation of them. The output is another matrix mapping the source and target concepts. AgreementMaker's OAEI 2011 final matcher used three different LWCs. LWC1 produces a weighted average of the similarity matrices for the LSM (Lexical Similarity Matcher) and the MM. LWC2 produces a weighted average for the PSM (Parametric String-based Matcher) and the VMM (Vector-based Multi-word Matcher). LWC3 determines the final confidence factor for each alignment as a weighted average of the LWC1 and LWC2 similarity matrices.

===4 MEDIATING MATCHER + SEMANTIC SIMILARITY===
The proposed method of combining a reference ontology with semantic similarity builds on the work of the early OA systems described in section 2.2. The recent uses of composition-based mapping and a mediating matcher, described in sections 3.1 and 3.2, respectively, also motivate this work. Neither OA system presented in those two sections, however, makes use of semantic similarity measures with a reference ontology. Our research extends AgreementMaker's mediating matcher and has produced a new mediating matcher that incorporates semantic similarity measurement (MMSS) between the corresponding bridge concepts in the mediating ontology. First the extension is described and then the experimental results are presented.

AgreementMaker's MM is used in a first pass to produce the mappings between the source and target concepts where there is an exact match on the bridge concepts in the mediating ontology, i.e., <math>b_S = b_T</math>. When an exact match occurs, MM produces a mapping between <math>s</math> and <math>t</math> as

<math>M_{ST} = \{(s, t, \mathit{mapSim}_{SI} \cdot \mathit{mapSim}_{TI}) \mid s \in O_S,\ b_S, b_T \in O_I,\ t \in O_T : (s, b_S, \mathit{mapSim}_{SI}) \in M_{SI} \wedge (t, b_T, \mathit{mapSim}_{TI}) \in M_{TI} \wedge b_S = b_T\} \qquad (2)</math>

Here <math>M_{SI}</math> is the mapping from the source <math>O_S</math> to the intermediate <math>O_I</math> using BSMlex. Similarly, <math>M_{TI}</math> is the mapping from the target <math>O_T</math> to the intermediate <math>O_I</math> using BSMlex. The next step is to determine <math>U_S</math> and <math>U_T</math>: all the source concepts <math>s</math> in the mapping set from source to mediating ontology and all the target concepts <math>t</math> in the mapping set from target to mediating ontology, respectively, that did not get selected by the original mediating matcher. These two sets are given as

<math>U_S = \{s \mid s \in O_S : (s, b_S, \mathit{mapSim}_{SI}) \in M_{SI} \wedge \nexists\, t \in O_T : (s, t, \mathit{sim}_{ST}) \in M_{ST}\}</math>

<math>U_T = \{t \mid t \in O_T : (t, b_T, \mathit{mapSim}_{TI}) \in M_{TI} \wedge \nexists\, s \in O_S : (s, t, \mathit{sim}_{ST}) \in M_{ST}\} \qquad (3)</math>

For each pair <math>(s, t)</math> in <math>U_S \times U_T</math>, the semantic similarity between all bridge concepts for <math>s</math> and all bridge concepts for <math>t</math> is calculated, and the maximum is used in determining the enhanced mapping set as

<math>E_{ST} = \{(s, t, \mathit{agg}(\mathit{mapSim}_{SI}, \mathit{mapSim}_{TI}, \mathit{bridgeSim})) \mid s \in U_S,\ b_S, b_T \in O_I,\ t \in U_T : (s, b_S, \mathit{mapSim}_{SI}) \in M_{SI} \wedge (t, b_T, \mathit{mapSim}_{TI}) \in M_{TI} : \mathit{bridgeSim} = \max_{b_S, b_T \in O_I} \mathit{semSim}(b_S, b_T)\} \qquad (4)</math>

<math>M_{ST} \cup E_{ST}</math> is returned as the result of the MMSS and is input to LWC1 in place of simply <math>M_{ST}</math>. Different agg operators may be used. For the experiments reported below, the minimum is used since this aggregator looks for the weakest similarity between the three pairs of concepts. The final mapping between <math>s</math> and <math>t</math> is not considered any stronger than the weakest similarity of the three being aggregated. Different measures can be used for semSim. For the experiments reported below, the standard Lin semantic similarity measure is used with IC as defined in (Seco et al., 2004) since it has frequently been used in current OA systems. An additional threshold value may be set to eliminate mappings in <math>E_{ST}</math> whose aggregated similarity falls below the threshold.

To be consistent with the previous work in section 3, the OAEI anatomy track was used. Its reference alignment contains 1516 mappings. Table 1 shows the results of the experiments, which are divided into two groups. In the first group, only the mappings from the MM are compared to only those from the MMSS with varying thresholds as listed. The results of the first group are listed in the rows before the row labeled OAEI 2011. AgreementMaker's LWC matchers do not affect these results. The second group compares the two different mediating matchers with the full OAEI 2011 AgreementMaker LWC matchers as described at the end of section 3.2. The second group investigates the interaction between the mappings of the MMSS and those produced by the other OAEI 2011 matchers, as well as the effects of the LWC matchers combining the various mapping results.

For the first group, the MMSS with no threshold had the best recall but the worst precision. As the threshold increases, the MMSS is still able to find more correct mappings than the MM and improves its precision. Of the nine additional correct mappings (1152 - 1143) found by the MMSS, four were also found by the OAEI 2011 matcher with the MM. The reason is that the MA concept string name is an exact match or a substring of the HA concept. The MMSS found these four by using semantic similarity within Uberon.

The OAEI 2011 results using the MMSS always produced more mappings than those using the MM. An interesting observation, though, is that the 1350 correct mappings for the MM and for the MMSS with the 0.90 threshold are not the same ones. Each found 3 correct mappings that the other did not. The goal is to study the interaction among the other OAEI 2011 matchers with the MMSS and the MM to try to keep both sets of 3 correct matches instead of replacing one with the other.

{| class="wikitable"
|+ Table 1. Experimental Results on the OAEI Anatomy Track
! Matcher !! Mapped !! Correct !! Precision (%) !! Recall (%) !! F-measure (%)
|-
| MM || 1200 || 1143 || 95.2 || 75.4 || 84.2
|-
| MMSS, 0.0 || 1322 || 1152 || 87.1 || 76.0 || 81.2
|-
| MMSS, 0.65 || 1301 || 1151 || 88.5 || 75.9 || 81.7
|-
| MMSS, 0.85 || 1240 || 1150 || 92.7 || 75.9 || 83.5
|-
| MMSS, 0.90 || 1229 || 1148 || 93.4 || 75.7 || 83.6
|-
! colspan="6" | OAEI 2011
|-
| MM || 1443 || 1350 || 93.6 || 89.1 || 91.2
|-
| MMSS, 0.85 || 1447 || 1348 || 93.2 || 88.9 || 91.0
|-
| MMSS, 0.90 || 1447 || 1350 || 93.3 || 89.1 || 91.1
|}

Table 2 shows the three correct mappings produced with the OAEI 2011 matcher and MMSS and not produced with MM. Table 3 shows the three correct mappings produced by the OAEI 2011 matcher with MM and not produced with MMSS. The MMSS incorrectly mapped the MA sources to the HA concepts matching the Uberon BT column of Table 3, since each of these concepts exists in the HA ontology and was mapped from the HA to the corresponding Uberon concept.
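The two-pass procedure of equations (2) to (4) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not AgreementMaker's implementation: the function names, the concept identifiers, and all similarity values are hypothetical; mappings are represented as lists of (concept, bridge, similarity) triples; when several bridge pairs exist, the pair with the highest semSim is chosen and the minimum is taken over its three similarities; and a second-pass mapping is kept only if its score is positive and not below the threshold.

```python
def mediating_matcher(m_si, m_ti):
    """First pass, equation (2): map s to t when they share a bridge concept."""
    m_st = {}
    for s, b_s, sim_s in m_si:
        for t, b_t, sim_t in m_ti:
            if b_s == b_t:
                key = (s, t)
                m_st[key] = max(m_st.get(key, 0.0), sim_s * sim_t)
    return m_st


def mmss(m_si, m_ti, sem_sim, threshold=0.0):
    """Return M_ST united with E_ST as a dict {(s, t): similarity}."""
    m_st = mediating_matcher(m_si, m_ti)
    # Equation (3): concepts that have a bridge but no first-pass mapping.
    u_s = {s for s, _, _ in m_si} - {s for s, _ in m_st}
    u_t = {t for t, _, _ in m_ti} - {t for _, t in m_st}
    # Equation (4): for each (s, t), keep the bridge pair with maximal semSim.
    best = {}
    for s, b_s, sim_s in m_si:
        if s not in u_s:
            continue
        for t, b_t, sim_t in m_ti:
            if t not in u_t:
                continue
            candidate = (sem_sim(b_s, b_t), sim_s, sim_t)
            if (s, t) not in best or candidate[0] > best[(s, t)][0]:
                best[(s, t)] = candidate
    e_st = {}
    for key, (bridge_sim, sim_s, sim_t) in best.items():
        score = min(sim_s, sim_t, bridge_sim)  # agg = minimum
        if score > 0 and score >= threshold:
            e_st[key] = score
    return {**m_st, **e_st}


# Hypothetical inputs: identifiers and similarity values are made up.
M_SI = [("MA:brain", "UB:brain", 1.0),
        ("MA:limb long bone", "UB:limb long bone", 0.95)]
M_TI = [("HA:Brain", "UB:brain", 0.9),
        ("HA:Long bone", "UB:long bone", 1.0)]
SEM = {("UB:limb long bone", "UB:long bone"): 0.88}


def toy_sem_sim(a, b):
    """Stand-in for a Lin measure over the reference ontology."""
    if a == b:
        return 1.0
    return SEM.get((a, b), SEM.get((b, a), 0.0))


if __name__ == "__main__":
    print(mmss(M_SI, M_TI, toy_sem_sim, threshold=0.85))
```

With these made-up values the first pass maps only the brain concepts; the second pass adds the long bone pair because its bridge concepts are distinct but similar. Raising the threshold above 0.88 would drop that added mapping again, which mirrors the precision and recall trade-off across the threshold rows of Table 1.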
{| class="wikitable"
|+ Table 2. New Mappings, OAEI MMSS but not OAEI MM
! MA Source !! HA MMSS Target !! Uberon BS !! Uberon BT
|-
| gastrointestinal system mesentery || Mesentery || gastrointestinal system mesentery || Mesentery
|-
| Limb long bone || Long bone || Limb long bone || Long bone
|-
| Brain ependyma || Ependyma || Brain ependyma || Ependyma
|}

{| class="wikitable"
|+ Table 3. Lost Mappings, OAEI MM but not OAEI MMSS
! MA Source !! HA MM Target !! Uberon BS !! Uberon BT
|-
| Brain arachnoid matter || Cerebral Arachnoid Membrane || Brain arachnoid mater || leptomeninges
|-
| Iliac circumflex artery || circumflex iliac artery || Iliac circumflex artery || Deep circumflex iliac artery
|-
| Vagina squamous epithelium || Vagina squamous epithelium || Vagina squamous epithelium || Stratified squamous epithelium
|}

For the three new correct mappings found by MMSS, none of the AgreementMaker matchers (PSM, VMM, LSM, and MM) found the third mapping. The PSM found the second mapping, but the VMM instead mapped the "forelimb long bone" to "long bone", incorrectly and with a higher confidence than the PSM had. LWC2, which combines the VMM and PSM, produced the VMM mapping. Only the VMM produced the first mapping. Since the PSM did not, LWC2 did not produce this correct mapping. LWC1 could not produce any of the three mappings since it combines the LSM and MM, neither of which produced any of these mappings.

For the three correct mappings lost with the MMSS, the PSM did produce all three, and the VMM did produce the first two. The MMSS, however, mapped the MA sources to incorrect targets for all three. LWC2 did produce the three correct mappings, but LWC1 using the MMSS and LSM produced the three incorrect mappings. When LWC3 combines the LWC1 and LWC2 results, the LWC1 results had higher confidence values, so the second and third incorrect MMSS mappings were selected. The first incorrect MMSS mapping is lost in LWC3, probably because its quality evaluation does not satisfy the cutoff threshold.

===5 CONCLUSIONS AND FUTURE WORK===
The MMSS is successful at discovering more correct mappings than AgreementMaker's MM. The drawback, however, is that it suggests more mappings overall, which lowers precision. More experimentation is needed to better understand the interaction between the MMSS and the other matchers in the OAEI 2011 configuration so that other possible LWC schemes can be developed to better combine the strengths of the MMSS with those of the other matchers. In addition, different semantic similarity measures need to be investigated with different reference ontologies. Other source and target ontologies with different structures and more varied labeling should also be tested.

===ACKNOWLEDGEMENTS===
The authors would like to thank Dr. Isabel Cruz and Cosmin Stroe for their support in this research effort.

===REFERENCES===
Cruz, I. F., Stroe, C., Caimi, F., Fabiani, A., Pesquita, C., Couto, F. M., Palmonari, M. (2011) Using AgreementMaker to Align Ontologies for OAEI 2011. Ontology Matching Workshop, International Semantic Web Conference.

Cross, V. and Yu, X. (2011) Investigating ontological similarity theoretically with fuzzy set theory, information content, and Tversky similarity and empirically with the Gene Ontology. Proc. of the 5th International Conference on Scalable Uncertainty Management.

Euzenat, J., Valtchev, P. (2003) An integrative proximity measure for ontology alignment. In: Proc. ISWC-2003 Workshop on Semantic Information Integration, Sanibel Island (FL US), 33–38.

Euzenat, J. et al. (2011) The Results of the Ontology Alignment Evaluation Initiative 2011. Ontology Matching Workshop, International Semantic Web Conference.

Jean-Mary, Y. R., Shironoshita, E. P., and Kabuka, M. R. (2009) Ontology matching with semantic verification. Journal of Web Semantics, vol. 7, no. 3, 235–251.

Gross, A., Hartung, M., Kirsten, T., and Rahm, E. (2011) Mapping Composition for Matching Large Life Science Ontologies. In: Proc. of the International Conference on Biomedical Ontology, 109–116.

Lin, D. (1998) An Information-theoretic Definition of Similarity. In: Proc. of the 15th International Conference on Machine Learning. Morgan Kaufmann, 296–304.

Rada, R., Mili, H., Bicknell, E., Blettner, M. (1989) Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, 17–30.

Pirro, G., Talia, D. (2010) UFOme: an ontology mapping system with strategy prediction capabilities. Data Knowl. Eng. 69.5, 444–471.

Resnik, P. (1995) Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In: Proc. of the 14th International Joint Conference on Artificial Intelligence, 448–453.

Sabou, M., d'Aquin, M. and Motta, E. (2008) Exploring the semantic web as background knowledge for ontology matching. Journal on Data Semantics, vol. XI, 156–190.

Seco, N., Veale, T., Hayes, J. (2004) An Intrinsic Information Content Metric for Semantic Similarity in WordNet. In: ECAI, 1089–1090.

Shvaiko, P. and Euzenat, J. (2012) Ontology Matching: State of the Art and Future Challenges. IEEE Transactions on Data and Knowledge Engineering, to appear in print.

Wu, Z., Palmer, M. S. (1994) Verb Semantics and Lexical Selection. In: Proc. of the 32nd Annual Meeting of the Association for Computational Linguistics, 133–138.

Yu, X. (2010) A Mathematical and Experimental Investigation of Ontological Similarity Measures and their Use in Biomedical Domains. Master's Thesis, Computer Science and Software Engineering, Miami University, Oxford OH.