<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>System for Parallel Heterogeneity Resolution (SPHeRe) results for OAEI 2013</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wajahat Ali Khan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Bilal Amin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asad Masood Khattak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maqbool Hussain</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sungyoung Lee</string-name>
          <email>sylee@oslab.khu.ac.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering, Kyung Hee University</institution>
          ,
          <addr-line>Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do, 446-701</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>SPHeRe is an ontology matching system that uses cloud infrastructure to match large-scale ontologies and focuses on how alignments are represented and stored in the Mediation Bridge Ontology (MBO). The MBO is the mediation ontology that stores all alignments generated between the matched ontologies and represents them in a manner that provides maximum metadata. SPHeRe is a new initiative; it therefore participates only in the large biomedical ontologies track of the OAEI 2013 campaign. The objective of SPHeRe's participation in OAEI is to shift the focus of the ontology matching community towards areas such as cloud utilization, effective mapping representation, and flexible and extensible design of matching systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
    </sec>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>The SPHeRe system aims to be a complete matching package whose main objectives are accuracy, mapping representation, and a flexible and extensible design. Its precision in the large biomedical ontologies track is on the higher side, which shows its potential for improving accuracy. It is based on several bridge algorithms: String Matching Bridge, Synonym Bridge, Child Based Structural Bridge (CBSB), Property Based Structural Bridge (PBSB), and Label Bridge. We plan to add further bridge algorithms, incorporating new matching techniques, in the next version of the system.</p>
      <p>
        Parallelism has been largely overlooked by ontology matching systems. SPHeRe exploits this
opportunity by: (i) creating and caching serialized subsets of candidate ontologies with
single-step parallel loading; (ii) using lightweight, matcher-based, redundancy-free subsets
that yield smaller memory footprints and faster load times; and (iii) implementing
data-parallel distribution over subsets of candidate ontologies, exploiting the multicore
distributed hardware of the cloud platform for parallel ontology matching and execution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Mapping representation is another aspect of the SPHeRe system, but it is not covered
in this paper. We have followed the OAEI alignment format, yet we consider mapping
representation an important dimension that the ontology matching research community
should work on. The more expressive the alignments are, the easier their verification by
experts and the higher the confidence in the transformation process.</p>
    </sec>
    <sec id="sec-3">
      <title>Specific techniques used</title>
      <p>The SPHeRe system is based on bridge algorithms that run in a parallel execution
environment to generate alignments, which are stored in the MBO, as shown in Fig. 1. The
Matcher Library component stores all the bridge algorithms. The Parallel Matching
Framework provides the parallel execution environment on which they run. Communication
between these two components is regulated by the SPHeRe Execution Control module, which
acts as a controller. The alignments generated by the bridge algorithms are stored in the MBO.</p>
      <p>[Fig. 1. Components of the SPHeRe system: Matcher Library (String Matching Bridge, Synonym Bridge, CBSB, Label Bridge, PBSB), SPHeRe Execution Control, Parallel Matching Framework (Distributor, Aggregator, Parallel Hardware Interface), and Mediation Bridge Ontology (MBO).]</p>
      <sec id="sec-3-4">
        <title>Mediation Bridge Ontology (MBO)</title>
        <p>
          The String Matching Bridge finds similar concepts in the ontologies being matched by
means of string matching techniques. The algorithm mainly applies the edit distance technique [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For any two concepts C<sub>i</sub> and C<sub>j</sub> of the ontologies O<sub>i</sub> and
O<sub>j</sub>, respectively, edit distance is applied to compute the matching value
SimScore := C<sub>i</sub>.EditDistance(C<sub>j</sub>). A threshold, Threshold, with value n is
set in the String Matching Bridge algorithm to limit the number of impure mappings.
        </p>
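        <p>The String Matching Bridge step can be sketched as follows. This is only an illustration under stated assumptions, not the SPHeRe implementation: the normalisation of the edit distance into a score between 0 and 1, the case folding, the function names, and the threshold value 0.8 (standing in for the unspecified value n) are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def sim_score(label_i: str, label_j: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer label."""
    a, b = label_i.lower(), label_j.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def string_matching_bridge(source, target, threshold=0.8):
    """Return (source, target, score) triples whose score reaches the threshold.

    The default threshold is a hypothetical stand-in for the value n.
    """
    mappings = []
    for ci in source:
        for cj in target:
            score = sim_score(ci, cj)
            if score >= threshold:
                mappings.append((ci, cj, score))
    return mappings
```
]]></preformat>
        <p>Comparing every source concept with every target concept is quadratic in the number of concepts, which is one reason a parallel execution environment is attractive for large ontologies.</p>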
        <p>The Label Bridge uses the labels of the source and target concepts for matching.
First, the concept labels are normalized, e.g. by stop word elimination; then the list of
source concept labels is matched against the list of target concept labels. The label lists
LabelList<sub>i</sub> and LabelList<sub>j</sub> of the source and target concepts are matched by
testing (LabelList<sub>i</sub> ∩ LabelList<sub>j</sub>) ≠ ∅. If any label occurs in both lists,
the source and target concepts are stored in the MBO as a mapping.</p>
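        <p>A minimal sketch of the Label Bridge test is given below. The stop word list, the lower-casing and the treatment of underscores are assumptions made for illustration; the paper only states that labels are normalized, for example by stop word elimination, and that a non-empty intersection of the two label lists yields a mapping.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Illustrative stop word list; the list used by SPHeRe is not given in the paper.
STOP_WORDS = frozenset({"of", "the", "a", "an", "and", "in"})


def normalize(label: str) -> str:
    """Lower-case the label, split on whitespace/underscores, drop stop words."""
    tokens = label.lower().replace("_", " ").split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def normalized_set(labels):
    """Normalize every label and discard labels that become empty."""
    normalized = (normalize(label) for label in labels)
    return {label for label in normalized if label}


def label_bridge(source_labels, target_labels) -> bool:
    """True when the normalized label lists share at least one label."""
    label_list_i = normalized_set(source_labels)
    label_list_j = normalized_set(target_labels)
    return bool(label_list_i & label_list_j)
```
]]></preformat>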
        <p>
          The Synonym Bridge finds the similarity between concepts using WordNet [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The relationship is identified by matching the synonyms of the concepts retrieved
from WordNet. First, the synonyms of the source and target concepts are extracted as
List<sub>l</sub> := C<sub>i</sub>.GetSynonymWordnetList() and
List<sub>m</sub> := C<sub>j</sub>.GetSynonymWordnetList(), where C<sub>i</sub> and C<sub>j</sub>
are the source and target concepts, respectively. The number of common synonyms,
MatchedItems, is then used to calculate the matching value SimScore. If SimScore is less
than the threshold, the alignment is discarded; otherwise it is stored in the MBO.
        </p>
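        <p>The Synonym Bridge logic might look like the sketch below. Two assumptions are made loudly here: the small in-memory table SYNONYMS is a hypothetical stand-in for a WordNet lookup, and SimScore is normalised by the average size of the two synonym lists by analogy with CBSB, since the paper does not give the exact formula. The threshold 0.5 is likewise illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical stand-in for WordNet; a real system would query WordNet instead.
SYNONYMS = {
    "heart": {"heart", "pump", "ticker"},
    "ticker": {"ticker", "heart", "pump", "watch"},
    "lung": {"lung"},
}


def get_synonyms(concept: str) -> set:
    """Return the synonym set of a concept; unknown concepts map to themselves."""
    key = concept.lower()
    return SYNONYMS.get(key, {key})


def synonym_bridge(ci: str, cj: str, threshold: float = 0.5):
    """Return SimScore if it reaches the threshold, otherwise None (discarded)."""
    list_l = get_synonyms(ci)
    list_m = get_synonyms(cj)
    matched_items = len(list_l & list_m)
    # Assumed normalisation: common synonyms over the average list size.
    average_size = (len(list_l) + len(list_m)) / 2
    sim_score = matched_items / average_size
    return sim_score if sim_score >= threshold else None
```
]]></preformat>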
        <p>The Child Based Structural Bridge (CBSB) generates mappings between the source
and target ontologies by matching the children of concepts. First, the children of the
source concept C<sub>i</sub> and the target concept C<sub>j</sub> are retrieved as the lists
ChildList<sub>i</sub> and ChildList<sub>j</sub>, respectively. The number of children common to
both lists is identified as MatchedChildren. Finally, the matching value is calculated as
SimScore := MatchedChildren / Average(ChildList<sub>i</sub>, ChildList<sub>j</sub>) and compared
with the threshold Threshold, which is assigned the value n.</p>
        <p>The Property Based Structural Bridge (PBSB) uses String Matching Bridge techniques to
match the properties of source and target concepts and find similar properties. This
information is then used, as in CBSB, to match the concepts of the source and target
ontologies based on their properties. These bridge algorithms are run in a parallel
execution environment for better system performance.</p>
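        <p>The CBSB score can be illustrated with a short sketch. It assumes that Average denotes the mean of the two child list sizes and that children are compared by lower-cased label; neither detail is spelled out in the paper, and the threshold 0.5 is a hypothetical stand-in for n.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def cbsb_score(child_list_i, child_list_j) -> float:
    """SimScore = MatchedChildren / Average(ChildList_i, ChildList_j)."""
    children_i = {child.lower() for child in child_list_i}
    children_j = {child.lower() for child in child_list_j}
    if not children_i or not children_j:
        return 0.0  # a concept without children gives no structural evidence
    matched_children = len(children_i & children_j)
    # Assumption: Average is the mean of the two child list sizes.
    average = (len(children_i) + len(children_j)) / 2
    return matched_children / average


def cbsb_match(child_list_i, child_list_j, threshold: float = 0.5) -> bool:
    """True when the structural score reaches the (illustrative) threshold."""
    return cbsb_score(child_list_i, child_list_j) >= threshold
```
]]></preformat>
        <p>For example, a source concept with four children and a target concept with three children that share two of them obtain a score of 2 / 3.5, roughly 0.57.</p>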
        <p>
          The multiphase design of the SPHeRe system is shown in Fig. 2 (taken from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]), which describes how parallelism is included in the ontology matching process for
better performance. The first phase is ontology loading and management, in which the
source and target ontologies are loaded in parallel by the multithreaded ontology load
interface (OLI). The main tasks of the OLI are parallel loading of the source and target
ontologies, parsing for object model creation, and ontology model serialization and
de-serialization. This phase prepares the data parallelism that the multithreaded
execution exploits in the second phase, distribution and matching [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>
          Serialized subsets of the source and target ontologies are loaded in parallel by the
multithreaded ontology distribution interface (ODI). The ODI is responsible for distributing
the ontology matching task over parallel threads (matcher threads). It currently
implements a size-based distribution scheme to assign partitions of the candidate
ontologies to the matcher threads. On a single node, the number of matcher threads
corresponds to the number of cores available to the running instance. In a multi-node
deployment, each node performs its own parallel loading, and internode control messages
are used to communicate the ontology distribution and the matching algorithms. The matched
results provided by the matcher threads are submitted to the accumulation and delivery
phase for MBO creation and delivery [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
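        <p>The distribution phase on a single node can be sketched as below. This is an assumption-laden illustration rather than the ODI itself: the size-based scheme is approximated by contiguous partitions of near-equal size, the per-partition matcher is a placeholder that compares lower-cased labels, and all function names are hypothetical. Note also that threads in CPython do not run CPU-bound code in parallel, so the sketch shows the structure of the scheme, not its speed-up.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import os
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def match_partition(source_chunk, target_concepts):
    """Placeholder matcher: pair concepts whose lower-cased labels are equal."""
    by_label = {t.lower(): t for t in target_concepts}
    return [(s, by_label[s.lower()]) for s in source_chunk if s.lower() in by_label]


def distribute_and_match(source_concepts, target_concepts, workers=None):
    """Run one matcher thread per partition and concatenate the results in order."""
    workers = workers or os.cpu_count() or 1  # one matcher thread per core
    chunks = partition(source_concepts, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(match_partition, chunks,
                                [target_concepts] * len(chunks)))
    return [pair for chunk_result in results for pair in chunk_result]
```
]]></preformat>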
        <p>
          The Ontology Aggregation Interface (OAI) accumulates the matched results provided by
the matcher threads. The OAI is responsible for creating the MBO by combining matched
results as mappings and for delivering the MBO via the cloud storage platform. It provides
a thread-safe mechanism through which all matcher threads submit their matched results.
After all matcher threads have completed, the OAI invokes the MBO creation process, which
accumulates all matched results in a single MBO instance [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In a multi-node distribution, the OAI also accumulates the results from remote nodes
after their local matcher threads have completed. This is a summary of the
performance-oriented ontology matching process of the SPHeRe system, extracted from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which provides a detailed description of the overall process.
        </p>
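        <p>A thread-safe accumulation step of the kind described for the OAI could be sketched as follows. The class and method names are hypothetical, and the MBO is represented here by a plain sorted list of unique mappings, which is a simplification: the actual MBO is an ontology.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import threading


class Aggregator:
    """Collects matched results from several matcher threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = []

    def submit(self, matched):
        """Called by a matcher thread; the lock serialises concurrent submissions."""
        with self._lock:
            self._results.extend(matched)

    def build_mbo(self):
        """Combine all submitted results into one duplicate-free, ordered list."""
        with self._lock:
            return sorted(set(self._results))


def run_matchers(partition_results):
    """Submit each partition's results from its own thread, then build the MBO."""
    aggregator = Aggregator()
    threads = [threading.Thread(target=aggregator.submit, args=(matched,))
               for matched in partition_results]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # MBO creation starts only after all threads have completed
    return aggregator.build_mbo()
```
]]></preformat>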
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Link to the system and parameters file</title>
    </sec>
    <sec id="sec-5">
      <title>Link to the set of provided alignments (in align format)</title>
      <p>https://sites.google.com/a/bilalamin.com/sphere/results</p>
      <sec id="sec-5-1">
        <title>Results</title>
        <p>SPHeRe is deployed in a multi-node configuration on virtual machine (VM) instances over a
tri-node private cloud built from commodity hardware. Each node is equipped with an
Intel Core i7 CPU and 8 GB of memory, and runs the Xen hypervisor. The Jena API is used for
its inferencing capabilities. Because SPHeRe relies on cloud infrastructure, we have
initially targeted only the large biomedical ontologies track. The results are as follows:</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Large biomedical ontologies</title>
      <p>SPHeRe is a cloud-based ontology matching system that lets users match large-scale
ontologies without changing their hardware specifications. Figure 3 shows the results of
the proposed system in the large biomedical ontologies track. Precision is high in all
tasks except Task 4, while the recall of the system needs to be improved.</p>
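      <table-wrap>
        <label>Figure 3</label>
        <caption>
          <p>Results of SPHeRe in the large biomedical ontologies track.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th rowspan="2">Task</th>
              <th rowspan="2">Time (s)</th>
              <th rowspan="2">#Mappings</th>
              <th colspan="3">Scores</th>
              <th colspan="2">Incoherence Analysis</th>
            </tr>
            <tr>
              <th>Precision</th>
              <th>Recall</th>
              <th>F-Measure</th>
              <th>Unsat.</th>
              <th>Degree</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Task 1</td><td>16</td><td>2359</td><td>0.960</td><td>0.772</td><td>0.856</td><td>367</td><td>3.6%</td></tr>
            <tr><td>Task 2</td><td>8136</td><td>2610</td><td>0.846</td><td>0.753</td><td>0.797</td><td>1054</td><td>0.7%</td></tr>
            <tr><td>Task 3</td><td>154</td><td>1577</td><td>0.916</td><td>0.162</td><td>0.275</td><td>805</td><td>3.4%</td></tr>
            <tr><td>Task 4</td><td>20664</td><td>2338</td><td>0.614</td><td>0.160</td><td>0.254</td><td>6523</td><td>3.2%</td></tr>
            <tr><td>Task 5</td><td>2486</td><td>9389</td><td>0.924</td><td>0.469</td><td>0.623</td><td>≥ 46256</td><td>≥ 61.6%</td></tr>
            <tr><td>Task 6</td><td>10584</td><td>9776</td><td>0.881</td><td>0.466</td><td>0.610</td><td>≥ 105418</td><td>≥ 55.7%</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Performance and precision are the strengths of our system. Its design is a further
strength, as the system is extensible and reusable. Recall is the main weakness of our
system; adding new matching techniques as bridge algorithms can improve it and, with it,
the overall accuracy. Extensibility allows new bridge algorithms to be adopted easily into
the proposed system.</p>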
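      <p>The reported precision, recall and F-measure values are consistent with the usual definition of the F-measure as the harmonic mean of precision and recall. The short sketch below, whose function name is only illustrative, recomputes it from the reported precision and recall; small deviations in the third decimal place can appear because the published inputs are themselves rounded.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for Task 1 and Task 4.
task_1 = round(f_measure(0.960, 0.772), 3)
task_4 = round(f_measure(0.614, 0.160), 3)
```
]]></preformat>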
    </sec>
    <sec id="sec-7">
      <title>Discussions on the way to improve the proposed system</title>
      <p>Adding new bridge algorithms that incorporate new matching techniques is the next step
planned for the proposed system. Object-oriented and ontology alignment design patterns
are to be implemented for matching in different tracks of the OAEI campaign. We also
intend to include instance-based matching and to incorporate change management
techniques in the system.</p>
      <sec id="sec-7-1">
        <title>Conclusion</title>
        <p>The SPHeRe system is a new initiative that relies on the parallel execution of matcher
bridge algorithms to achieve better performance and accuracy. Work is ongoing to improve
accuracy by incorporating more matcher bridge algorithms, which should increase the
recall of the system. The performance of the proposed system compares favorably with
that of other systems, because it matches large biomedical ontologies on a single system
in reasonable time.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>WordNet: A lexical database for English</article-title>
          . http://wordnet.princeton.edu/, last visited in October 2013
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amin</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batool</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>W.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huh</surname>
            ,
            <given-names>E.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Sphere: A performance initiative towards ontology matching by implementing parallelism over cloud platform</article-title>
          .
          <source>In: Journal of Supercomputing</source>
          . Springer, in Press
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Agent-based ontology mapping and integration towards interoperability</article-title>
          .
          <source>Expert Systems</source>
          <volume>25</volume>
          (
          <issue>3</issue>
          ),
          <fpage>197</fpage>
          -
          <lpage>220</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A guided tour to approximate string matching</article-title>
          .
          <source>ACM Computing Surveys</source>
          <volume>33</volume>
          (
          <issue>1</issue>
          )
          ,
          <fpage>31</fpage>
          -
          <lpage>88</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Ontology matching: state of the art and future challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>25</volume>
          (
          <issue>1</issue>
          )
          ,
          <fpage>158</fpage>
          -
          <lpage>176</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>