=Paper=
{{Paper
|id=Vol-1111/oaei13_paper12
|storemode=property
|title=System for Parallel Heterogeneity Resolution (SPHeRe) results for OAEI 2013
|pdfUrl=https://ceur-ws.org/Vol-1111/oaei13_paper12.pdf
|volume=Vol-1111
|dblpUrl=https://dblp.org/rec/conf/semweb/KhanAKHL13
}}
==System for Parallel Heterogeneity Resolution (SPHeRe) results for OAEI 2013==
Wajahat Ali Khan, Muhammad Bilal Amin, Asad Masood Khattak, Maqbool Hussain, and Sungyoung Lee

Department of Computer Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, Republic of Korea, 446-701

{wajahat.alikhan, mbilalamin, asad.masood, maqbool.hussain, sylee}@oslab.khu.ac.kr

Abstract. SPHeRe is an ontology matching system that utilizes cloud infrastructure for matching large-scale ontologies and focuses on the representation of the alignments stored in the Mediation Bridge Ontology (MBO). The MBO is the mediation ontology that stores all the alignments generated between the matched ontologies and represents them in a manner that provides maximum metadata information. SPHeRe is a new initiative; therefore it participates only in the large biomedical ontologies track of the OAEI 2013 campaign. The objective of SPHeRe's participation in OAEI is to shift the focus of the ontology matching community towards areas such as cloud utilization, effective mapping representation, and flexible and extendible design of matching systems.

===1 Presentation of the system===

Ontology mappings enable accessibility of information by aligning the resources in ontologies belonging to diverse organizations [3]. They also resolve semantic heterogeneities among data sources. Two main steps are required to overcome semantic heterogeneity: matching resources to determine alignments, and interpreting those alignments according to application requirements [5]. We started developing the SPHeRe system in 2013, and it is an ongoing project. Its objectives are performance [2], accuracy, mapping representation, and a flexible and extendible design.

====1.1 State, purpose, general statement====

The SPHeRe system aims to be a complete matching package whose main objectives are accuracy, mapping representation, and a flexible and extendible design.
Its precision is on the higher side in the large biomedical ontologies track, which shows its potential for improving accuracy. It is based on several bridge algorithms: String Matching Bridge, Synonym Bridge, Child Based Structural Bridge (CBSB), Property Based Structural Bridge (PBSB), and Label Bridge. We plan to include further bridge algorithms in the next version of the system by incorporating new matching techniques.

Parallelism has been overlooked by ontology matching systems. SPHeRe takes this opportunity and provides a solution by: (i) creating and caching serialized subsets of candidate ontologies with single-step parallel loading; (ii) using lightweight, matcher-based, redundancy-free subsets that result in smaller memory footprints and faster load times; and (iii) implementing data-parallel distribution over subsets of candidate ontologies by exploiting the multicore distributed hardware of the cloud platform for parallel ontology matching and execution [2].

Mapping representation is another aspect of the SPHeRe system, but it is not covered in this paper. We have followed the OAEI alignment representation format, but we consider mapping representation an important dimension to be worked on by the ontology matching research community. The more expressive the alignments are, the easier their expert verification becomes and the higher the confidence in the transformation process.

====1.2 Specific techniques used====

The SPHeRe system is based on bridge algorithms that run in a parallel execution environment to generate alignments, which are stored in the MBO as shown in Fig. 1. The Matcher Library component stores all the bridge algorithms to be run in the parallel execution environment, which is represented by the Parallel Matching Framework. Communication between these two components is regulated by the SPHeRe Execution Control module, which behaves as a controller.
The alignments stored in the MBO are generated by the bridge algorithms in the Matcher Library, which are run by the Parallel Matching Framework.

Fig. 1. SPHeRe System Working Model. Components shown: Matcher Library (String Matching Bridge, Synonym Bridge, Label Bridge, CBSB, PBSB), SPHeRe Execution Control, Parallel Matching Framework (Distributor, Aggregator), Parallel Hardware Interface, and Mediation Bridge Ontology (MBO).

String Matching Bridge provides matching results by finding similar concepts in the ontologies being matched, based on string matching techniques. The algorithm mainly applies the edit distance technique [4] of string matching. For any two concepts <math>C_i</math> and <math>C_j</math> of the ontologies <math>O_i</math> and <math>O_j</math> respectively, edit distance is applied to find the matching value, <math>\mathit{SimScore} \leftarrow C_i.\mathrm{EditDistance}(C_j)</math>. A threshold <math>\mathit{Threshold}</math> with value <math>n</math> is set in the String Matching Bridge algorithm to limit the number of impure mappings.

Label Bridge uses the labels of the source and target concepts for matching. Initially, concept labels are normalized, e.g. using stop word elimination; then the list of source concept labels is matched with the list of target concept labels. The source and target label lists <math>\mathit{LabelList}_i</math> and <math>\mathit{LabelList}_j</math> are matched using <math>(\mathit{LabelList}_i \cap \mathit{LabelList}_j) \neq \emptyset</math>. If any label in the lists matches, the source and target concepts are stored in the MBO as a mapping.

Synonym Bridge is based on finding the similarity between concepts using WordNet [1]. The relationship is identified by matching the synonyms of the concepts, which are accessed using WordNet. Initially, the synonyms of the source concept, <math>\mathit{List}_l := C_i.\mathrm{GetSynonymWordnetList}()</math>, and of the target concept, <math>\mathit{List}_m := C_j.\mathrm{GetSynonymWordnetList}()</math>, are extracted using WordNet, where <math>C_i</math> and <math>C_j</math> are the source and target concepts respectively. The number of common synonyms, <math>\mathit{MatchedItems}</math>, is found and used to calculate the matching value <math>\mathit{SimScore}</math>. If this value is less than the threshold, the alignment is discarded; otherwise it is stored in the MBO.
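The String Matching Bridge and Label Bridge described above can be illustrated with a minimal Python sketch. It is not the SPHeRe implementation: the function names, the stop word list, and the default threshold are hypothetical, and the sketch assumes the threshold acts as a maximum allowed edit distance, which the paper does not state.

```python
# Hypothetical stop word list; the paper does not specify one.
STOP_WORDS = frozenset({"of", "the", "and", "a", "an"})


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_bridge(concept_i, concept_j, threshold=2):
    """Accept a mapping when the edit distance is within the threshold.

    Assumption: the threshold is a maximum distance; the default of 2 is
    arbitrary and chosen only for this illustration.
    """
    return edit_distance(concept_i.lower(), concept_j.lower()) <= threshold


def normalize(label):
    """Lower-case a label and drop stop words."""
    tokens = label.lower().replace("_", " ").split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def label_bridge(labels_i, labels_j):
    """Accept a mapping when the normalized label sets intersect."""
    normalized_i = {normalize(label) for label in labels_i}
    normalized_j = {normalize(label) for label in labels_j}
    return bool(normalized_i & normalized_j)
```

For example, `label_bridge(["Tumor of the Lung"], ["tumor lung"])` accepts the pair because both labels normalize to the same string, while `string_bridge("Myocardium", "myocardum")` accepts it because the two names differ by a single deletion.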
Child Based Structural Bridge (CBSB) generates mappings between the source and target ontologies by matching the children of the concepts. Initially, the children of the source concept <math>C_i</math> and the target concept <math>C_j</math> are accessed as lists <math>\mathit{ChildList}_i</math> and <math>\mathit{ChildList}_j</math> respectively. The number of common children in the lists is identified as <math>\mathit{MatchedChildren}</math>. Finally, the matching value <math>\mathit{SimScore}</math> is calculated and compared with the threshold <math>\mathit{Threshold}</math>, which is assigned the value <math>n</math>. The matching value is calculated using <math>\mathit{SimScore} \leftarrow \mathit{MatchedChildren} / \mathrm{Average}(\mathit{ChildList}_i, \mathit{ChildList}_j)</math>.

Property Based Structural Bridge (PBSB) uses String Matching Bridge techniques to match the properties of the source and target concepts and find similar properties. This information is then used, as in CBSB, to match the concepts of the source and target ontologies based on their properties.

These bridge algorithms are run in a parallel execution environment for better performance of the system. The multiphase design of the SPHeRe system is represented in Fig. 2 (taken from [2]), which describes how parallelism is included in the ontology matching process for better performance. The first phase of the system is ontology loading and management, in which the source and target ontologies are loaded in parallel by the multithreaded ontology load interface (OLI). The main tasks of the OLI include parallel loading of the source and target ontologies, parsing for object model creation, and finally ontology model serialization and de-serialization. This is an important phase because it enables data parallelism over multithreaded execution in the second phase, distribution and matching [2].

Serialized subsets of the source and target ontologies are loaded in parallel by the multithreaded ontology distribution interface (ODI). The ODI is responsible for distributing the ontology matching tasks over parallel threads (matcher threads). The ODI currently implements a size-based distribution scheme to assign partitions of the candidate ontologies to be matched by matcher threads.
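As an illustration of the CBSB score and of size-based distribution over matcher threads, the following Python sketch partitions the source concepts into near-equal chunks and matches each chunk in its own thread. It is a sketch under stated assumptions rather than the SPHeRe code: all names are hypothetical, concepts are modelled as plain `(name, children)` tuples, `Average` is read as the mean of the two child-list sizes, and the default threshold of 0.5 is arbitrary. Python threads also share the interpreter lock, so the sketch shows the structure of the distribution, not the speed-up reported in [2].

```python
import os
from concurrent.futures import ThreadPoolExecutor


def cbsb_score(children_i, children_j):
    """SimScore = MatchedChildren / Average(ChildList_i, ChildList_j).

    Assumption: Average is the mean of the two child-list sizes.
    """
    set_i, set_j = set(children_i), set(children_j)
    if not set_i or not set_j:
        return 0.0
    matched = len(set_i & set_j)
    return matched / ((len(set_i) + len(set_j)) / 2)


def partition(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def match_partition(source_chunk, target, threshold):
    """Work done by one matcher thread: compare its chunk with the target."""
    found = []
    for name_i, children_i in source_chunk:
        for name_j, children_j in target:
            score = cbsb_score(children_i, children_j)
            if score >= threshold:
                found.append((name_i, name_j, score))
    return found


def parallel_match(source, target, threshold=0.5, workers=None):
    """Distribute source chunks over threads and collect all mappings."""
    workers = workers or os.cpu_count() or 1
    chunks = [chunk for chunk in partition(source, workers) if chunk]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda chunk: match_partition(chunk, target, threshold), chunks))
    # Flatten per-thread results; map() preserves the chunk order.
    return [mapping for part in results for mapping in part]
```

Calling `parallel_match(source, target, workers=2)` with two source concepts hands one concept to each thread. Defaulting `workers` to `os.cpu_count()` mirrors the idea of one matcher thread per available core.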
In a single node, the number of matcher threads corresponds to the number of available cores for the running instance. In a multi-node deployment, each node performs its own parallel loading, and internode control messages are used to communicate regarding the ontology distribution and matching algorithms. Matched results provided by the matcher threads are submitted to the accumulation and delivery phase for MBO creation and delivery [2].

Fig. 2. Performance of SPHeRe System [2]

The Ontology Aggregation Interface (OAI) accumulates the matched results provided by the matcher threads. The OAI is responsible for MBO creation by combining matched results as mappings and for delivering the MBO via a cloud storage platform. The OAI provides a thread-safe mechanism for all matcher threads to submit their matched results. After all matcher threads have completed, the OAI invokes the MBO creation process, which accumulates all the matched results in a single MBO instance [2]. In the case of multi-node distribution, the OAI also accumulates results from remote nodes after completion of their local matcher threads. This is a summary of the performance-oriented ontology matching process of the SPHeRe system; [2] provides a detailed version of the overall process.

====1.3 Link to the system and parameters file====

https://sites.google.com/a/bilalamin.com/sphere/results

====1.4 Link to the set of provided alignments (in align format)====

https://sites.google.com/a/bilalamin.com/sphere/results

===2 Results===

SPHeRe is deployed in a multi-node configuration on virtual instances (VMs) over a tri-node private cloud equipped with commodity hardware. Each node is equipped with an Intel(R) Core i7(R) CPU, 8 GB of memory, and the Xen hypervisor. The Jena API is used for its inferencing capabilities. Because SPHeRe uses cloud infrastructure, we have initially targeted only the large biomedical ontologies track.
The results are as follows.

====2.1 Large biomedical ontologies====

SPHeRe is a cloud-based ontology matching system that lets users match large-scale ontologies without changing their hardware specifications. Figure 3 shows the results of our system in the large biomedical ontologies track. Precision is high in all tasks except Task 4 (0.614), while the recall of the system needs to be improved.

{| class="wikitable"
|+ Fig. 3. SPHeRe Large Biomedical Ontologies Track Results
! rowspan="2" | Task
! rowspan="2" | Time (s)
! rowspan="2" | #Mappings
! colspan="3" | Scores
! colspan="2" | Incoherence Analysis
|-
! Precision !! Recall !! F-Measure !! Unsat. !! Degree
|-
| Task 1 || 16 || 2359 || 0.960 || 0.772 || 0.856 || 367 || 3.6%
|-
| Task 2 || 8136 || 2610 || 0.846 || 0.753 || 0.797 || 1054 || 0.7%
|-
| Task 3 || 154 || 1577 || 0.916 || 0.162 || 0.275 || 805 || 3.4%
|-
| Task 4 || 20664 || 2338 || 0.614 || 0.160 || 0.254 || 6523 || 3.2%
|-
| Task 5 || 2486 || 9389 || 0.924 || 0.469 || 0.623 || ≥ 46256 || ≥ 61.6%
|-
| Task 6 || 10584 || 9776 || 0.881 || 0.466 || 0.610 || ≥ 105,418 || ≥ 55.7%
|}

===3 General comments===

====3.1 Comments on the results====

Performance and precision are the strengths of our system. The design of the system also adds to its strength, as it is extendible and reusable. Recall is the main weakness of our system, but adding new matching techniques as bridge algorithms can improve recall and therefore overall accuracy. Extendibility allows new bridge algorithms to be adopted easily into the system.

====3.2 Discussions on the way to improve the proposed system====

Adding new bridge algorithms that incorporate new matching techniques is the next step planned for the system. Object-oriented and ontology alignment design patterns are to be implemented for matching different tracks of the OAEI campaign. We also intend to include instance-based matching and to incorporate change management techniques in the system.

===4 Conclusion===

The SPHeRe system is a new initiative that relies on parallel execution of matcher bridge algorithms to achieve better performance and accuracy.
We are still working on improving the accuracy of the system by incorporating more matcher bridge algorithms to increase its recall. The performance of the proposed system is better compared to other systems, because it matches large biomedical ontologies on a single system in reasonable time.

===References===

1. WordNet: a lexical database for English. http://wordnet.princeton.edu/, last visited in October 2013

2. Amin, M.B., Batool, R., Khan, W.A., Huh, E.N., Lee, S.: Sphere: A performance initiative towards ontology matching by implementing parallelism over cloud platform. In: Journal of Supercomputing. Springer, in press

3. Li, L., Yang, Y.: Agent-based ontology mapping and integration towards interoperability. Expert Systems 25(3), 197–220 (2008)

4. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys (CSUR) 33(1), 31–88 (2001)

5. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering 25(1), 158–176 (2013)