Combining Sum-Product Network and Noisy-Or Model for Ontology Matching

Weizhuo Li

liweizhuo@amss.ac.cn
Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, P. R. China

Ontology matching is a key challenge in achieving semantic interoperability when building the Semantic Web. We present an alternative probabilistic scheme, called GMap, which combines the sum-product network and the noisy-or model. More precisely, we employ the sum-product network to encode the similarities based on individuals and disjointness axioms across ontologies and calculate their contributions by maximum a posteriori inference. The noisy-or model is used to encode the probabilistic matching rules, which are independent of each other as well as of the value calculated by the sum-product network. Experiments show that GMap is competitive with many OAEI top-ranked systems. Furthermore, GMap, benefiting from these two graphical models, keeps inference tractable throughout the whole matching process.

Introduction

Ontology matching is the process of finding relationships or correspondences between entities of different ontologies [5]. Many efforts have been made to automate this discovery, e.g., by incorporating more elaborate approaches, including scaling strategies [3, 6], ontology repair techniques to ensure alignment coherence [8], machine learning techniques [4], external resources to increase the available knowledge for matching [2], and probabilistic graphical models to describe the related entities [1, 10, 11].

In this paper, we propose an alternative probabilistic scheme, called GMap, based on two special graphical models: the sum-product network (SPN) and the noisy-or model. An SPN is a directed acyclic graph with variables as leaves, sums and products as internal nodes, and weighted edges [12]. As it keeps inference tractable and can describe context-specific independence [12], we employ it to encode the similarities based on individuals and disjointness axioms and calculate their contributions by maximum a posteriori inference. The noisy-or model is a special kind of Bayesian network [9]. When the factors are independent of each other, it is more suitable than other graphical models, especially in terms of inference efficiency [9]. Hence, we utilize it to encode the probabilistic matching rules. Thanks to the tractable inference of these special graphical models, GMap keeps inference tractable throughout the whole matching process. To evaluate GMap, we adopt the data sets from the OAEI ontology matching campaign. Experimental results indicate that GMap is competitive with many OAEI top-ranked systems.

Methods

In this section, we briefly introduce our approach. Given two ontologies O1 and O2, we calculate the lexical similarity based on edit distance, external lexicons and TF-IDF [5]. Then, we employ SPN to encode the similarities based on individuals and disjointness axioms and calculate their contributions. After that, we utilize the noisy-or model to encode the probabilistic matching rules and the value calculated by SPN. With the one-to-one constraint and the crisscross strategy in the refine module, GMap obtains initial matches. The whole matching procedure is iterative: when an iteration produces no new matches, the matching terminates.
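The iterative procedure described above can be sketched as a fixed-point loop. This is only an illustration under stated assumptions: `score_fn`, `threshold` and `max_rounds` are hypothetical names, the one-to-one constraint is approximated by a greedy best-first selection, and the crisscross strategy is not modelled.

```python
from typing import Callable, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (entity of O1, entity of O2)


def refine_one_to_one(scored: dict, fixed: Set[Pair], threshold: float) -> Set[Pair]:
    """Greedy one-to-one selection (an assumed stand-in for the refine module).

    Pairs already in `fixed` are kept; new pairs are added best-first as long
    as neither entity is already used and the score reaches the threshold.
    """
    chosen = set(fixed)
    used_left = {a for a, _ in fixed}
    used_right = {b for _, b in fixed}
    for (a, b), score in sorted(scored.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break  # remaining pairs score even lower
        if a not in used_left and b not in used_right:
            chosen.add((a, b))
            used_left.add(a)
            used_right.add(b)
    return chosen


def iterate_matching(
    candidates: Iterable[Pair],
    score_fn: Callable[[Pair, Set[Pair]], float],  # hypothetical scoring hook
    threshold: float = 0.5,
    max_rounds: int = 100,
) -> Set[Pair]:
    """Repeat scoring and refinement until a round yields no new matches."""
    candidates = list(candidates)
    matches: Set[Pair] = set()
    for _ in range(max_rounds):
        scored = {p: score_fn(p, matches) for p in candidates if p not in matches}
        updated = refine_one_to_one(scored, matches, threshold)
        if updated == matches:  # no new matches: terminate
            break
        matches = updated
    return matches
```

In this sketch, `score_fn` receives the current matches so that a pair's score can rise once related pairs have been matched, which is what makes a later round able to find matches that the first round missed.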

Using SPN to encode individuals and disjointness axioms

Under the open world assumption, individuals or disjointness axioms are sometimes missing. Therefore, we define a special assignment, "Unknown", for the similarities based on these individuals and disjointness axioms.

For the similarity based on individuals, we use string equality to decide whether two individuals are the same. When we calculate the similarity of concepts based on individuals across ontologies, we regard the individuals of each concept as a set and use the Ochiai coefficient¹ to measure the value. We use a boundary t to divide the value into three assignments (i.e., 1, 0 and Unknown). Assignment 1 (or 0) means that the pair matches (or mismatches). If the value lies between 0 and t, or the individuals of one concept are missing, the assignment is Unknown.
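A minimal sketch of this individual-based similarity follows. The function names are hypothetical, and the exact handling of the boundary is an assumption: a value at or above t is read as a match, a value of exactly 0 as a mismatch, and anything strictly in between as Unknown.

```python
import math
from typing import Set, Union

UNKNOWN = "Unknown"


def ochiai(a: Set[str], b: Set[str]) -> float:
    """Ochiai coefficient |A ∩ B| / sqrt(|A| * |B|) of two non-empty sets.

    Individuals are compared by string equality, which is what set
    intersection on strings does.
    """
    return len(a & b) / math.sqrt(len(a) * len(b))


def individual_assignment(a: Set[str], b: Set[str], t: float) -> Union[int, str]:
    """Map the individual-based similarity of two concepts to 1, 0 or Unknown."""
    if not a or not b:
        return UNKNOWN  # individuals of one concept are missing
    value = ochiai(a, b)
    if value >= t:
        return 1        # assumed: at or above the boundary means a match
    if value == 0.0:
        return 0        # assumed: no shared individual means a mismatch
    return UNKNOWN      # value strictly between 0 and t
```

For example, two concepts with individuals {"a", "b"} and {"b", "c"} share one individual out of two each, so their Ochiai coefficient is 1 / sqrt(2 * 2) = 0.5.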

For the similarity based on disjointness axioms, we utilize these axioms and the subsumption relations within the ontologies, and define some rules to determine its value. For example, let x1 and y1 be concepts from O1 and x2 a concept from O2. If x1 matches x2 and x1 is disjoint with y1, then y1 is disjoint with x2. This similarity also has three assignments. Assignment 1 (or 0) means that the pair mismatches (or overlaps). Otherwise, the similarity based on disjointness axioms is Unknown.
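The example rule can be sketched as follows. This is an illustration only: the function name and data layout are assumptions, and the sketch covers just the single rule stated above, without the subsumption-based rules.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def derive_cross_disjointness(
    matches: Iterable[Pair],      # (x1, x2): x1 from O1 matches x2 from O2
    disjoint_o1: Iterable[Pair],  # (a, b): a and b are declared disjoint in O1
) -> Set[Pair]:
    """Return pairs (y1, x2) that are disjoint across the two ontologies.

    Rule: if x1 matches x2 and x1 is disjoint with y1, then y1 is disjoint
    with x2. Disjointness is symmetric, so both orientations are checked.
    """
    disjoint_o1 = list(disjoint_o1)
    derived: Set[Pair] = set()
    for x1, x2 in matches:
        for a, b in disjoint_o1:
            if a == x1:
                derived.add((b, x2))
            if b == x1:
                derived.add((a, x2))
    return derived
```

Every derived pair would receive assignment 1 (mismatch) for the disjointness-based similarity; pairs about which nothing can be derived stay Unknown.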

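[Figure 1: the sum-product network S. Internal nodes are sums (+) and products (×); the leaves are the indicators I1, I2, I3, D0, D1 and the indicators of M. The two parts of the figure are annotated with the relation between I and M given D0 and given D1, respectively.]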

As shown in Figure 1, we designed a sum-product network S to encode the above similarities and calculate the contributions, where M represents the contributions and the leaves M1, M2, M3 are indicators that comprise the assignments of M. All the indicators are binary-valued. M1 = 1 (or M2 = 1) means that the contributions are positive (or negative). If M3 = 1, the contributions are Unknown. The leaves I1, I2, I3, D0, D1 are also binary-valued indicators that correspond to the assignments of the similarities based on individuals (I) and disjointness axioms (D). The concrete assignment metrics are listed in Tables 1 and 2.

¹ https://en.wikipedia.org/wiki/Cosine_similarity
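To illustrate how indicator leaves, sums and products combine, here is a toy sum-product network with a MAP query. It is not the network S of Figure 1: the structure, the weights and the variable values are all hypothetical, and it covers only D and M (the individual indicators I1, I2, I3 are left out for brevity). The MAP query is answered by evaluating the network once per candidate value of the query variable, which is exact when every other variable is either observed or summed out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, Union


@dataclass
class Leaf:
    """Indicator leaf: 1 if `var` takes `value` (or is unobserved), else 0."""
    var: str
    value: str


@dataclass
class Product:
    children: List["Node"]


@dataclass
class Sum:
    children: List[Tuple[float, "Node"]]  # (edge weight, child)


Node = Union[Leaf, Product, Sum]
Evidence = Dict[str, Optional[str]]


def evaluate(node: Node, evidence: Evidence) -> float:
    """Bottom-up evaluation; unobserved variables are marginalised out."""
    if isinstance(node, Leaf):
        observed = evidence.get(node.var)
        return 1.0 if observed is None or observed == node.value else 0.0
    if isinstance(node, Product):
        result = 1.0
        for child in node.children:
            result *= evaluate(child, evidence)
        return result
    return sum(w * evaluate(child, evidence) for w, child in node.children)


def map_value(root: Node, evidence: Evidence, var: str, values: Sequence[str]) -> str:
    """MAP value of one query variable: evaluate once per candidate value."""
    scores = {v: evaluate(root, {**evidence, var: v}) for v in values}
    return max(scores, key=scores.get)


def m_dist(p_pos: float, p_neg: float, p_unk: float) -> Sum:
    """Weighted sum over the three indicators of M (toy weights)."""
    return Sum([(p_pos, Leaf("M", "positive")),
                (p_neg, Leaf("M", "negative")),
                (p_unk, Leaf("M", "unknown"))])


# Hypothetical structure: the distribution of M depends on the value of D.
toy_spn = Sum([
    (0.5, Product([Leaf("D", "0"), m_dist(0.6, 0.1, 0.3)])),
    (0.5, Product([Leaf("D", "1"), m_dist(0.05, 0.9, 0.05)])),
])

M_VALUES = ["positive", "negative", "unknown"]
```

With these toy weights, `map_value(toy_spn, {"D": "1"}, "M", M_VALUES)` returns `"negative"`, reflecting that a declared mismatch (D = 1) makes a negative contribution the most probable one. Each evaluation visits every edge once, so the cost is linear in the size of the network.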

Probabilistic matching rules
R1: two classes probably match if their fathers match
R2: two classes probably match if their children match
R3: two classes probably match if their siblings match
R4: two classes about domain probably match if the related object properties match and the ranges of these properties match
R5: two classes about range probably match if the related object properties match and the domains of these properties match
R6: two classes about domain probably match if the related data properties match and the values of these properties match

When we calculate the matching probability of one pair, the matching rules are independent of each other and of the value calculated by SPN. Therefore, we utilize the noisy-or model to encode them.

[Figure: the noisy-or model, with nodes R1, S1, R2, S2, ...; the formulas appear in its lower-right corner.]

The matching probability of one pair, P(S = 1 | S0, R1, ..., R6), is calculated according to the formulas in the lower-right corner of the figure. ci is the count of satisfied Ri, and the sigmoid function f(ci) is used to limit the upper bound of the contribution of Ri. As inference in the noisy-or model can be computed in time linear in the number of nodes [9], GMap keeps inference tractable throughout the whole matching process.
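The exact formulas are part of the figure and are not reproduced in the text, so the following is only a sketch of a standard noisy-or combination under stated assumptions. The leak term `leak` (standing in for the contribution of S0), the per-rule weights `weights`, and the way f(ci) scales each weight are all hypothetical choices, not the paper's definition.

```python
import math
from typing import Sequence


def sigmoid_cap(count: int) -> float:
    """f(c) = 1 / (1 + exp(-c)): grows with the count but stays below 1."""
    return 1.0 / (1.0 + math.exp(-count))


def noisy_or(leak: float, weights: Sequence[float], counts: Sequence[int]) -> float:
    """P(S = 1) under a noisy-or with a leak term.

    leak    : assumed probability contributed by S0 alone
    weights : assumed activation probability of each rule Ri
    counts  : ci, the number of times rule Ri is satisfied
    The pair fails to match only if the leak and every satisfied rule all
    fail independently, so the failure probabilities are multiplied.
    """
    p_all_fail = 1.0 - leak
    for w, c in zip(weights, counts):
        if c > 0:  # unsatisfied rules contribute nothing
            p_all_fail *= 1.0 - w * sigmoid_cap(c)
    return 1.0 - p_all_fail
```

The loop makes one pass over the rules, which is where the linear-time inference comes from. Because `sigmoid_cap` is bounded by 1, a rule that is satisfied many times can never contribute more than its weight, which is one way to read the role of f(ci) as an upper bound.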

Evaluation

To evaluate our approach², we adopt three tracks (i.e., Benchmark, Conference and Anatomy) from the OAEI ontology matching campaign in 2014³.

Comparing against the OAEI top-ranked systems

Table 4 shows a comparison of the matching quality of GMap and other OAEI top-ranked systems, which indicates that GMap is competitive with these promising existing systems. For the Anatomy track, GMap does not concentrate on language techniques, and it emphasizes the one-to-one constraint; both may lower the alignment quality. In addition, all the top-ranked systems employ alignment debugging techniques, which help to improve the quality of the alignment. However, we do not employ these techniques in the current version.

We separate SPN and the noisy-or model from GMap and evaluate their contributions respectively. As listed in Table 5, SPN is suited to matching tasks in which the linguistic levels of the ontologies differ and both ontologies use the same individuals to describe the concepts, such as Biblio (201-210) in the Benchmark track. Thanks to the contributions of individuals and disjointness axioms, SPN can improve the precision of GMap. When the structural information across the ontologies is very rich, the noisy-or model is able to discover some hidden matches from the existing matches and improve the recall, as in the Anatomy track. However, if the ontologies do not contain the above features, as in the Conference track, the improvement is not evident. Nevertheless, because the two graphical models are complementary to some extent, combining the sum-product network and the noisy-or model can improve the alignment quality as a whole.

² The software and results are available at https://github.com/liweizhuo001/GMap.
³ http://oaei.ontologymatching.org/2014/

Conclusion and Future Work

We have presented GMap, which is suitable for matching tasks in which many individuals and disjointness axioms are declared or the structural information is very rich. However, it still has a lot of room for improvement. For example, language techniques are essential to improve the quality of the initial matches. In addition, dealing with alignment incoherence is also part of our future work.

Acknowledgments. This work was supported by the Natural Science Foundation of China (No. 61232015). Many thanks to Songmao Zhang, Qilin Sun and Yuanyuan Wang for their helpful discussion on the design and implementation of GMap.

1. Albagli, S., Ben-Eliyahu-Zohary, R., Shimony, S.E.: Markov network based ontology matching. Journal of Computer and System Sciences 78(1), 105-118 (2012)

2. Zhang, S., Bodenreider, O.: Experience in aligning anatomical ontologies. International Journal on Semantic Web and Information Systems 3(2), 1-26 (2007)

3. Djeddi, W.E., Khadir, M.T.: XMAP: a novel structural approach for alignment of OWL-full ontologies. In: Proc. of Machine and Web Intelligence (ICMWI). pp. 368-373 (2010)

4. Doan, A.H., Madhavan, J., Dhamankar, R., et al.: Learning to match ontologies on the semantic web. The VLDB Journal 12(4), 303-319 (2003)

5. Euzenat, J., Shvaiko, P.: Ontology Matching (2nd Edition). Springer (2013)

6. Faria, D., Pesquita, C., Santos, E., et al.: The AgreementMakerLight ontology matching system. In: 2013 OTM Conferences. pp. 527-541 (2013)

7. Gens, R., Domingos, P.: Learning the structure of sum-product networks. In: Proc. of International Conference on Machine Learning (ICML). pp. 873-880 (2013)

8. Jimenez-Ruiz, E., Grau, B.C.: LogMap: Logic-based and scalable ontology matching. In: Proc. of International Semantic Web Conference (ISWC). pp. 273-288 (2011)

9. Koller, D., Friedman, N.: Probabilistic Graphical Models. MIT Press (2009)

10. Mitra, P., Noy, N.F., Jaiswal, A.R.: OMEN: A probabilistic ontology mapping tool. In: Proc. of International Semantic Web Conference (ISWC). pp. 537-547 (2005)

11. Niepert, M., Noessner, J., Meilicke, C., Stuckenschmidt, H.: Probabilistic-logical web data integration. Reasoning Web. pp. 504-533 (2011)

12. Poon, H., Domingos, P.: Sum-product networks: A new deep architecture. In: Proc. of International Conference on Computer Vision Workshops (ICCV Workshops). pp. 689-690 (2011)