Combining Ontology Mapping Methods Using Bayesian Networks

Ondřej Šváb, Vojtěch Svátek
University of Economics, Prague, Dep. Information and Knowledge Engineering, Winston Churchill Sq. 4, 130 67 Praha 3, Czech Republic
svabo@vse.cz, svatek@vse.cz

Abstract. Bayesian networks (BNs) can capture interdependencies among ontology mapping methods and thus possibly improve the way they are combined. Experiments on ontologies from the OAEI collection are shown, and the possibility of modelling explicit mapping patterns in combination with methods is discussed.

1 Introduction

Most existing systems for ontology mapping combine various methods to achieve higher performance in terms of recall and precision. Our approach relies on Bayesian networks (BNs) as a well-known formal technique that can capture interdependencies among random variables. A Bayesian network (BN) [3] is a directed acyclic graph with attached local probability distributions. Nodes in the graph represent random variables with mutually exclusive and exhaustive sets of values (states). Edges in the graph represent direct dependences between two random variables. We believe that this approach can bring additional benefits compared to an ad hoc combination of methods, mainly resulting from better adaptability (training from data within a well-established formal framework).

Two approaches that use BNs for ontology mapping have recently been reported. The first is OMEN [4], which mainly serves for enhancing existing mappings. Its input consists of the results of another mapping tool, while its output consists of more precise versions of those mappings as well as new mappings. Nodes in the BN represent pairs of concepts that can potentially be mapped. Edges follow the taxonomy given in the original ontologies. The network structure thus mimics that of the ontologies themselves, though heuristics for graph pruning are employed in this transformation. The conditional probability table (CPT) of each node is constructed using meta-rules such as: "if two nodes match and so do two arrows coming out of these nodes, then the probability that the nodes at the other end of the arrows match is increased". The second project, BayesOWL [5], is a framework for ontology mapping rather than a mapping method per se. The probabilistic ontological information is assumed to be learnt (in the form of probabilistic constraints) from web data using a text-classification-based learner; this information is translated into BNs. Mappings among concepts from two different ontologies can then be discovered using so-called evidential reasoning across the two BNs.

Fig. 1. Example of mapping pattern across two ontologies

2 Modelling Dependencies among Mapping Methods

Our approach differs from these prior approaches in that we do not apply BN modelling to the ontologies or their mappings themselves but rather to the different mapping methods. The BNs are assumed to contain nodes (or sub-networks) representing the results of individual methods plus one node representing the final output. This allows us not only to combine the methods (in the probabilistic framework) but also to reason about conditionally (in)dependent methods, a minimal required subset of methods and the like. The mapping methods can have varying degrees of granularity; we focus on low-level methods, understood as mapping justifications. Moreover, in the work-in-progress part of our research, we account for mapping patterns encompassing small structural fragments of ontologies.
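To make this setup concrete, the following is a minimal sketch of such a network in Python using the open-source pgmpy library (our choice for illustration; the experiments in Section 3 actually used Hugin). The two method nodes, the naive structure and all probability values are purely illustrative assumptions; in our approach the CPTs, and possibly the structure, are learnt from labelled concept pairs.

```python
# Minimal sketch of combining two low-level methods via a BN
# (illustrative only; node names, structure and numbers are assumptions).
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# One node per mapping justification plus a final 'match' node; here the
# structure is fixed to naive Bayes (methods conditionally independent
# given the final decision). States: 0 = false, 1 = true.
model = BayesianNetwork([("match", "levenshtein"), ("match", "jaccard")])

model.add_cpds(
    TabularCPD("match", 2, [[0.8], [0.2]]),  # prior: P(false), P(true)
    TabularCPD("levenshtein", 2,
               [[0.9, 0.3],    # P(justification = false | match state)
                [0.1, 0.7]],   # P(justification = true  | match state)
               evidence=["match"], evidence_card=[2]),
    TabularCPD("jaccard", 2,
               [[0.85, 0.25],
                [0.15, 0.75]],
               evidence=["match"], evidence_card=[2]),
)
assert model.check_model()

# For an unseen concept pair, insert the methods' verdicts as evidence
# and read off the posterior of the final mapping decision.
posterior = VariableElimination(model).query(
    variables=["match"], evidence={"levenshtein": 1, "jaccard": 1})
print(posterior)
```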
The patterns will capture, to some degree, similar information as the OMEN meta-rules; however, we prefer to model them directly within the BN formalism.

We distinguish among families of methods (string-based, linguistic-resource-based, graph-based, logic-based etc.) sharing some generic principle and input resources. Each family encompasses multiple low-level methods; for example, a string-based method can be built upon diverse string distance measures. We dedicate a separate node of the BN to each low-level method, viewed here as a mapping justification. We believe that such methods are a meaningful target for BN modelling, as their statistical dependencies are likely to reflect plausible relationships that are even interpretable by a human.

The notion of mapping pattern is a natural counterpart to that of intra-ontology ('design') pattern [1]. Mapping patterns have been implicitly proposed by Ghidini & Serafini [2], who even consider mappings among different modelling constructs (such as concept-to-relation). A mapping pattern is, essentially, a structure containing some (at least one) constructs from each of the two (or more) ontologies plus some (candidate) mapping among them. The simplest mapping pattern only connects one concept from each of the two ontologies. An example of a slightly more complex mapping pattern is in Figure 1. The left-hand side (class A) is from O1 and the right-hand side (class B and its subclass X) is from O2. We try to map class A simultaneously to class B and to class X.

The input to the process of BN training for ontology mapping consists of positive and negative examples together with the results of individual methods ('mapping justifications'), and possibly also the network structure, unless we want to learn it as well. The positive examples correspond to pairs for which a mapping has previously been established, while the negative ones are (all or a subset of) pairs that have been identified as non-matching. The CPTs, and possibly the structure, are then learnt. In the phase of using the trained BN, the mapping justifications for unseen cases (pairs of concepts) are computed and inserted into the BN as evidence. The result of alignment is calculated via propagation of this evidence.

3 Experiments

For the experiments we chose ontologies from the OntoFarm collection (http://nb.vse.cz/~svabo/oaei2006/), which is currently part of the OAEI 2006 setting. It models the domain of conference organisation; the individual ontologies were designed independently by different people and based on different resources: personal experience with conference organisation, conference web pages, or conference organisation support tools.

We restricted the first experiments to ten string distance measures implemented in the SecondString library (http://secondstring.sourceforge.net/): Levenshtein, Jaro, Jaccard, Char-Jaccard, Smith-Waterman, Monge-Elkan, SLIM, TokenFelligiSunter, UnSmoothedJS and TFIDF. Because of the local nature of string distance measures, capturing context by means of mapping patterns does not seem to bring great benefits; we thus focused only on the combination of low-level methods. We extracted classes from two ontologies (ekaw.owl and ConfOf.owl). Our training data consist of 798 pairs, of which 149 were manually labelled as positive and 649 as negative. They were 'semi-randomly' picked from different parts of the ontologies; the overall number of possible pairs would be about 2500 (the product of the concept counts in the two ontologies).
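For concreteness, one such low-level justification can be computed as follows. This toy Python function is our illustration, not the (Java) SecondString implementation; it yields a token-level Jaccard similarity of two class labels, which is then discretised as described next.

```python
# Illustrative token-level Jaccard similarity between class labels
# (a stand-in for the SecondString Jaccard measure, which is Java code).
def jaccard_similarity(label_a: str, label_b: str) -> float:
    """Return |A n B| / |A u B| over the labels' token sets, in [0, 1]."""
    tokens_a = set(label_a.lower().replace("_", " ").split())
    tokens_b = set(label_b.lower().replace("_", " ").split())
    if not (tokens_a or tokens_b):
        return 1.0  # two empty labels count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Hypothetical pair of classes from ekaw.owl and ConfOf.owl:
print(jaccard_similarity("Conference_Participant", "Participant"))  # 0.5
```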
The results of the measures were transformed from the [0, 1] scale into two categories: 'true' if the value is over 0.5 and 'false' if the value is lower than or equal to 0.5.

To learn the BN we used the Hugin tool (http://www.hugin.com/): the structure was learnt using the NPC method and the CPTs were learnt using the EM algorithm. We learnt two Bayesian networks in this way. The first one was constrained to the naive Bayes structure, which assumes independence of the methods; only its CPTs were learnt from data. For the second network, we also learnt the structure; in this way we could explore interdependencies among the low-level methods. The learnt structure is shown in Figure 2. From the structure and the definition of the so-called Markov blanket [3] we can conclude that if we know the mapping justifications of TFIDF, Smith-Waterman, Jaccard, Jaro, and SLIM, the other methods do not matter. Methods unrelated to any other method (TokenFelligiSunter and UnSmoothedJS) do not appear in the BN at all.

Fig. 2. BN - automatically learnt structure

Fig. 3. Fragment of BN reflecting the mapping pattern from Fig. 1

To evaluate the performance of each proposed Bayesian classifier we used the leave-one-out method. For the naive Bayesian classifier, we got the best result with a probability threshold of 80%: 73% precision, 60% recall (F-measure 0.66) and 88% accuracy. For the Bayesian classifier with learnt structure we got, as the best result, 84% precision, 53% recall (F-measure 0.65) and 89% accuracy, for any threshold between 40% and 70%. Both our classifiers outperform the trivial classifiers that always predict true or false, respectively. Overall, the Bayesian classifier with learnt structure outperformed the naive Bayesian classifier. On the other hand, the best individual method (Jaccard) performed the same as the Bayesian classifier with learnt structure (84% precision, 53% recall and 89% accuracy) with a threshold around 50%. From this result we can say that the combination (using a BN) of string distance measures does not bring a direct benefit. However, the (second) Bayesian classifier is less sensitive to changes of the threshold, while Jaccard moves towards 100% precision but a rather low recall of 23% as soon as the threshold increases to 60%.

4 Conclusions and Future Work

We suggested using low-level methods as 'mapping justifications' in order to train a Bayesian network on a sample of mappings and then produce new mappings with it. The results of preliminary experiments with string distance measures as low-level methods are not entirely convincing in terms of performance, which can be explained by the strong correlation among these methods; this correlation was actually discovered when learning the BN structure. The main role of this initial phase of research was to gain deeper insight into the problems addressed. The possibility of modelling explicit mapping patterns in combination with methods was also studied but not yet reflected in experiments.

In the future, we plan to employ, in the role of mapping justifications, not only string-based (low-level) techniques, but also e.g. graph-based or thesauri-based techniques. A more challenging task, however, is to design BNs reflecting the structure of patterns. Each method (and the final result) will be represented by a set of nodes corresponding to the given pattern. For example, a fragment of a BN reflecting the mapping pattern from Fig. 1 is depicted in Fig. 3. It considers not only the equivalence relation but also the (proper) subsumption relation, and has four nodes that represent the alignment of each pair and each relation (equivalence of A and B, equivalence of A and X, subsumption of A and B, and subsumption of A and X). align1 represents the equivalence mapping between A and B. align1sub represents the subsumption mapping between A and B (B ⊃ A). align2 represents the equivalence mapping between A and X. Finally, align2sub represents the subsumption mapping between A and X (A ⊃ X). Edges should then be learnt automatically for the pairs of nodes align1 and align2sub, and align2 and align1sub, respectively, due to the strict dependencies between them.
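Under the same illustrative assumptions as above (pgmpy as a stand-in tool, hand-picked probabilities), such a fragment could be encoded as follows. The two edges express the strict dependencies: given X ⊂ B in O2, A ≡ B entails A ⊃ X, and A ≡ X entails B ⊃ A.

```python
# Illustrative encoding (not an implemented component) of the Fig. 3
# pattern fragment; priors and non-deterministic CPT entries are
# placeholders that would be learnt from data.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

pattern = BayesianNetwork([("align1", "align2sub"),   # A equiv B -> A ⊃ X
                           ("align2", "align1sub")])  # A equiv X -> B ⊃ A

prior = [[0.9], [0.1]]  # illustrative prior: P(false), P(true)
pattern.add_cpds(
    TabularCPD("align1", 2, prior),
    TabularCPD("align2", 2, prior),
    # Deterministic second column: whenever the parent equivalence
    # holds (state 1), the child subsumption holds with probability 1.
    TabularCPD("align2sub", 2, [[0.7, 0.0], [0.3, 1.0]],
               evidence=["align1"], evidence_card=[2]),
    TabularCPD("align1sub", 2, [[0.7, 0.0], [0.3, 1.0]],
               evidence=["align2"], evidence_card=[2]),
)
assert pattern.check_model()
```

In a trained network of this kind, such edges would be discovered from the data rather than fixed by hand, which is exactly what the strict dependencies should enforce.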
We thank Jiří Vomlel for his assistance with Bayesian networks. The research was partially supported by the IGA VSE grants no. 26/05 "Methods and tools for ontological engineering" and no. 12/06 "Integration of approaches to ontological engineering: design patterns, mapping and mining", and by the Knowledge Web Network of Excellence (IST FP6-507482).

References

1. W3C Semantic Web Best Practices and Deployment Working Group: Ontology Engineering and Patterns Task Force (OEP). Online at http://www.w3.org/2001/sw/BestPractices/OEP/
2. Ghidini C., Serafini L.: Reconciling concepts and relations in heterogeneous ontologies. In: Proc. ESWC 2006, Budva, Montenegro, 2006.
3. Jensen F. V.: Bayesian Networks and Decision Graphs. Springer, 2001.
4. Mitra P., Noy N. F., Jaiswal A. R.: OMEN: A Probabilistic Ontology Mapping Tool. In: Workshop on Meaning Coordination and Negotiation at the Third International Semantic Web Conference (ISWC 2004), Hiroshima, Japan, 2004.
5. Pan R., Ding Z., Yu Y., Peng Y.: A Bayesian Network Approach to Ontology Mapping. In: Proc. ISWC 2005.