=Paper=
{{Paper
|id=Vol-1963/paper606
|storemode=property
|title=On Partitioning for Ontology Alignment
|pdfUrl=https://ceur-ws.org/Vol-1963/paper606.pdf
|volume=Vol-1963
|authors=Sunny Pereira,Valerie Cross,Ernesto Jiménez-Ruiz
|dblpUrl=https://dblp.org/rec/conf/semweb/PereiraCJ17
}}
==On Partitioning for Ontology Alignment==
Sunny Pereira¹, Valerie Cross¹, Ernesto Jiménez-Ruiz²

¹ Miami University, Oxford, OH 45056, United States; ² University of Oslo, Norway

''This work was partially funded by the BIGMED project (IKT 259055) and the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project no. 237889).''

===1 Introduction===

Ontology Alignment (OA) is the process of determining the mappings between two ontologies. A number of systems currently exist, and many of them participate in the annual Ontology Alignment Evaluation Initiative (OAEI, http://oaei.ontologymatching.org/). Aligning two very large ontologies is time consuming and memory intensive. For example, the largebio track of the OAEI campaign still poses serious challenges to participants, and only 4 out of 11 systems managed to complete the largest largebio task.

A general approach to address these challenges is to partition each ontology into cohesive blocks. The matching task is then divided into smaller tasks involving only relevant pairs of blocks (i.e., partitions). Ontology partitioning brings new challenges: how best to partition each ontology into blocks, and whether the partitioning process on each ontology should be independent of the other. Three main strategies exist: (i) totally independent partitioning of both ontologies using various clustering algorithms; (ii) independent partitioning of the better structured ontology, whose partitioning is then used to direct the partitioning of the other; and (iii) dependent partitioning of the two ontologies, in which a quick and efficient initial mapping between them directs their partitioning.

A preliminary study of these three partitioning strategies and their effects on ontology alignment is presented. The objective of this preliminary work is to determine the suitability of these strategies for improving the performance of OA systems when dealing with large ontologies, especially those systems unable to cope with the largest tasks.

===2 Partitioning Algorithms===

The partitioning strategies in [3], [4], and [5] all follow a similar method but differ in whether ontology partitioning is done dependently or independently of the alignment task, and in when the dependence is incorporated. The simplest approach, Partition Block Matching (PBM) [4], first partitions the source and target ontologies separately into blocks. Then I-SUB, an edit-distance-based string comparison method, is applied to the concepts' labels to determine similarities between source and target concepts. If the string similarity of the concept labels meets a predefined, user-settable threshold in [0,1], the two concepts become an anchor pair (a_T, a_S).

Once the anchor pairs are found, a block similarity between each pair of blocks, one from the source and one from the target, is determined using Dice's coefficient, calculated as the ratio of the anchor pairs shared between the two blocks b_S and b_T to the sum of the total number of anchors in b_S and the total number of anchors in b_T. A user-settable similarity threshold η in [0,1] must be met between two blocks before marking them as a matched block pair; a block may be paired with more than one block. After block matching, the alignment between concepts in the matched blocks can begin. Alignment only occurs between the concepts in each matched block pair, not between the whole source and target ontologies.
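The block-matching step of PBM can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: all names are hypothetical, blocks are assumed to be sets of concept identifiers, anchors are (source, target) pairs, and exact label matching stands in for I-SUB (as in the experiments reported below).

<syntaxhighlight lang="python">
from typing import Dict, List, Set, Tuple

Anchor = Tuple[str, str]  # (source concept, target concept)

def find_anchors(source_labels: Dict[str, str],
                 target_labels: Dict[str, str]) -> Set[Anchor]:
    """Pair concepts whose labels match exactly (a stand-in for I-SUB)."""
    by_label: Dict[str, List[str]] = {}
    for concept, label in source_labels.items():
        by_label.setdefault(label.lower(), []).append(concept)
    return {(s, t)
            for t, label in target_labels.items()
            for s in by_label.get(label.lower(), [])}

def block_similarity(block_s: Set[str], block_t: Set[str],
                     anchors: Set[Anchor]) -> float:
    """Block similarity as described above: anchor pairs shared by the two
    blocks over the total anchors touching either block. (The conventional
    Dice coefficient would additionally multiply the numerator by 2.)"""
    anchors_s = {a for a in anchors if a[0] in block_s}
    anchors_t = {a for a in anchors if a[1] in block_t}
    total = len(anchors_s) + len(anchors_t)
    return len(anchors_s & anchors_t) / total if total else 0.0

def match_blocks(blocks_s: List[Set[str]], blocks_t: List[Set[str]],
                 anchors: Set[Anchor], eta: float) -> List[Tuple[int, int]]:
    """Return the (source block, target block) index pairs whose similarity
    reaches the threshold eta; a block may appear in several pairs."""
    return [(i, j)
            for i, bs in enumerate(blocks_s)
            for j, bt in enumerate(blocks_t)
            if block_similarity(bs, bt, anchors) >= eta]
</syntaxhighlight>

Only the concept pairs falling inside the matched block pairs are then handed to the alignment step, which keeps each matching task small.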
For dependent partitioning with PAP (partition, anchor, partition) and APP (anchor, partition, partition) [3], anchor pairs are used to direct the partitioning of one (PAP) or both (APP) ontologies. If one ontology is more structured than the other, it is first independently partitioned; the anchor pairs are then determined and used to partition the other ontology (PAP). Otherwise, anchor pairs are first found and used to dependently partition the two ontologies (APP).

For PAP, the first two steps are identical to those of PBM: (i) independently partition the more structured ontology O_T, and (ii) find anchor pairs between O_S and O_T. The less structured ontology O_S is then partitioned using the blocks b_Ti built for O_T in step (i) and the anchor pairs (a_T, a_S) identified in step (ii). Centers CB_Si for a prospective block b_Si in O_S are determined from the anchor pairs existing for b_Ti: for each a_T, its corresponding a_S becomes a center CB_Si for a prospective matching block b_Si. A future block b_Si may have multiple centers, since multiple anchor pairs may be associated with block b_Ti. The centers CB_Si are used to initialize the PBM partitioning algorithm instead of simply using each concept in O_S as an individual block; these centers are given the highest cohesiveness value, and the blocks are grown from them. A final block b_Si built from a center is matched with the corresponding block b_Ti. Blocks in O_T and in O_S that contain no anchors are not handled by PAP: they are simply ignored and not considered in the matching.

The APP method first finds anchors between O_S and O_T. It uses them to partition O_T by favoring the fusion of blocks sharing anchors with O_S. It then partitions O_S by favoring the fusion of blocks sharing anchors with the blocks in the partitioned O_T. The blocks of O_T are generated using PBM but with a modified measure that incorporates not only the strength of the link between blocks b_Ti and b_Tj within O_T but also the strength of the link of b_Tj to O_S, measured by the number of anchors in b_Tj relative to the total number of anchors between O_T and O_S. The blocks of O_S are generated by PBM with another modified measure that uses both the strength of the link between blocks b_Si and b_Sj within O_S and the strength of the link of b_Sj to b_Tk, the block in O_T having the highest number of anchors with block b_Si. Blocks of O_S and O_T sharing the highest number of anchors become a matched block pair; one block of O_S can be matched with only one block of O_T. Then alignment between the concepts in each matched block pair is performed.
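To make the PAP seeding step concrete, the sketch below (hypothetical names, reusing the (source, target) anchor representation from the previous snippet) collects the centers CB_Si for the prospective source blocks from the anchors attached to each already-built target block b_Ti; anchor-free target blocks are skipped, mirroring the fact that PAP ignores them.

<syntaxhighlight lang="python">
from typing import Dict, List, Set, Tuple

Anchor = Tuple[str, str]  # (source concept, target concept)

def pap_centers(target_blocks: Dict[str, Set[str]],
                anchors: Set[Anchor]) -> Dict[str, List[str]]:
    """For each target block b_Ti (keyed by an identifier), collect the
    source-side concepts of its anchors; these become the centers CB_Si
    from which the matching source block b_Si is later grown."""
    centers: Dict[str, List[str]] = {}
    for block_id, block in target_blocks.items():
        block_centers = [s for (s, t) in anchors if t in block]
        if block_centers:  # blocks without anchors are ignored by PAP
            centers[block_id] = block_centers
    return centers
</syntaxhighlight>

In this sketch, the PBM partitioning of O_S would then be initialized from these centers (given the highest cohesiveness value) rather than from singleton blocks, and the block grown from the centers of a given target block identifier would be matched with that target block.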
===3 Experimental Methods===

The PBM, PAP, and APP partitioning methods have been implemented independently of the alignment system. In the preliminary experiments included in this paper we report results for the systems LogMap [6] and FCA-Map [9]. In [3], [4], and [5] a path-based semantic similarity measure [8] is used to determine the link strength between concepts within an ontology when creating blocks. In these experiments, both the path-based Wu-Palmer [8] and the information-content-based Lin [7] semantic similarity measures are considered. The ontology structure is used to determine the information content (IC) of a concept. Link strengths are calculated only between concepts that differ by one in their depth within the ontology. The authors of the PBM method use I-SUB to find the anchors between concepts; in our experiments, anchors are found using an exact label match between two concepts in the two different ontologies. Each identified block pair represents a matching (sub)task. However, since blocks are only characterized by a set of concepts, they are first converted to (locality-based) ontology modules [2] and then given to the ontology alignment system as input.

The initial experiments were performed on task 1 of the OAEI largebio track (http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/), involving small fragments of FMA and NCI, using all three methods. The results using Wu-Palmer are shown in Table 1 and those for Lin in Table 2. The parameters used are an η of 0.05 for PBM and an α of 0.75 for APP. A maximum block size of 500 and a depth difference of one for the semantic similarity calculation are used for all three methods. Blocks with only one concept are considered isolated blocks. Coverage represents how many of the entities occurring in the OAEI reference alignments are present in the identified block pairs. Precision and recall are calculated over the combined alignment results of all the matching tasks (i.e., the pairs of modules extracted from the block pairs). FMA Blocks (resp. NCI Blocks) is the total number of blocks produced by partitioning the FMA ontology (resp. the NCI ontology).

{| class="wikitable"
|+ Table 1. Experiments on largebio task 1 using Wu-Palmer. Matching with LogMap.
! Method !! FMA Blocks !! FMA Isolated !! NCI Blocks !! NCI Isolated !! Matching Tasks !! Coverage !! Precision !! Recall !! Partitioning Time (s) !! Matching Time (s)
|-
| PBM || 55 || 15 || 141 || 60 || 87 || 0.821 || 0.845 || 0.743 || 40.248 || 85.162
|-
| PAP || 60 || 13 || 141 || 60 || 58 || 0.451 || 0.870 || 0.410 || 39.827 || 58.517
|-
| APP || 50 || 15 || 143 || 60 || 48 || 0.518 || 0.870 || 0.472 || 41.644 || 53.157
|}

{| class="wikitable"
|+ Table 2. Experiments on largebio task 1 using Lin. Matching with LogMap.
! Method !! FMA Blocks !! FMA Isolated !! NCI Blocks !! NCI Isolated !! Matching Tasks !! Coverage !! Precision !! Recall !! Partitioning Time (s) !! Matching Time (s)
|-
| PBM || 46 || 6 || 180 || 53 || 83 || 0.801 || 0.833 || 0.728 || 52.454 || 81.689
|-
| PAP || 37 || 5 || 180 || 53 || 37 || 0.348 || 0.861 || 0.321 || 56.508 || 39.423
|-
| APP || 46 || 6 || 180 || 53 || 46 || 0.483 || 0.862 || 0.439 || 56.704 || 49.938
|}

The results from task 1 suggest that the PBM method provides much higher recall values than the other two methods, and that the Wu-Palmer measure performed slightly better than Lin. The next experiments examined how PBM with Wu-Palmer performed on the OAEI largebio tasks that use the whole ontologies, that is, task 2, task 4, and task 6. The maximum block size is 3000. Table 3 presents these results.

{| class="wikitable"
|+ Table 3. Experiments with largebio whole ontologies using PBM with Wu-Palmer.
! Task !! System !! Source Blocks !! Source Isolated !! Target Blocks !! Target Isolated !! Matching Tasks !! Coverage !! Precision !! Recall !! Partitioning Time (s) !! Matching Time (s)
|-
| rowspan="2" | FMA-NCI || LogMap || rowspan="2" | 151 || rowspan="2" | 2 || rowspan="2" | 256 || rowspan="2" | 91 || rowspan="2" | 69 || rowspan="2" | 0.763 || 0.468 || 0.675 || rowspan="2" | 649 || 76.7
|-
| FCA-Map || 0.506 || 0.698 || ≈ 8 hrs
|-
| FMA-SNOMED || LogMap || 388 || 9 || 3352 || 3273 || 154 || 0.594 || 0.571 || 0.423 || 4,807 || 385
|-
| SNOMED-NCI || LogMap || 3357 || 3160 || 693 || 427 || 443 || 0.666 || 0.725 || 0.491 || 6,623 || 937
|}
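For reference, the sketch below illustrates how figures such as coverage and the combined precision and recall reported in Tables 1-3 can be computed. The function and variable names are hypothetical, and mappings and reference alignments are assumed to be sets of (source entity, target entity) pairs.

<syntaxhighlight lang="python">
from typing import List, Set, Tuple

Mapping = Tuple[str, str]  # (source entity, target entity)

def coverage(block_pairs: List[Tuple[Set[str], Set[str]]],
             reference: Set[Mapping]) -> float:
    """Fraction of reference-alignment entities that survive partitioning,
    i.e. that appear in at least one identified (source, target) block pair."""
    ref_entities = {e for m in reference for e in m}
    covered: Set[str] = set()
    for block_s, block_t in block_pairs:
        covered |= (block_s | block_t) & ref_entities
    return len(covered) / len(ref_entities) if ref_entities else 0.0

def precision_recall(task_alignments: List[Set[Mapping]],
                     reference: Set[Mapping]) -> Tuple[float, float]:
    """Precision and recall over the union of the alignments produced for
    all matching (sub)tasks."""
    combined: Set[Mapping] = set().union(*task_alignments) if task_alignments else set()
    correct = combined & reference
    precision = len(correct) / len(combined) if combined else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall
</syntaxhighlight>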
===4 Discussion and future work===

In this paper we have presented a preliminary evaluation of state-of-the-art partitioning algorithms for ontology alignment. The obtained results are not as good as expected since, after the partitioning and the identification of the (sub)matching tasks, the coverage of the entities in the reference alignments is rather low. For example, in the FMA-SNOMED case only 59% of the entities appearing in the reference alignment are covered by the modules in the identified matching tasks; the remaining 41% of the entities were lost in either isolated blocks or blocks for which a suitable pair could not be found.

As expected, given the coverage of entities in the reference alignment, the results obtained by LogMap are very low compared to the results reported for LogMap in the last OAEI campaign [1]. In addition, the partitioning step represents a considerable overhead with respect to LogMap's computation times. Nevertheless, FCA-Map was successfully run on task 2 of the largebio track using partitioning (it was not tested on tasks 4 and 6 due to limited experimental time), while the system could not cope with the task when given the whole FMA and NCI ontologies [1].

In the near future we aim to investigate new algorithms that provide a partitioning suitable for ontology alignment, in which the loss of coverage in the identified (sub)matching tasks, in terms of entities of the reference alignments, is minimized. We also intend to perform an extensive evaluation of the novel partitioning algorithms with all OAEI participating systems, especially those failing to cope with the largest tasks.

===References===

1. Achichi, M., et al.: Results of the Ontology Alignment Evaluation Initiative 2016. In: Proceedings of the 11th International Workshop on Ontology Matching, pp. 73–129 (2016)

2. Grau, B.C., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory and practice. J. Artif. Intell. Res. 31, 273–318 (2008)

3. Hamdi, F., et al.: Alignment-based partitioning of large-scale ontologies. In: Advances in Knowledge Discovery and Management (2010)

4. Hu, W., Qu, Y.: Block matching for ontologies. In: Int'l Sem. Web Conf. (2006)

5. Hu, W., et al.: Matching large ontologies: A divide-and-conquer approach. Data Knowl. Eng. (2008)

6. Jiménez-Ruiz, E., Grau, B.C.: LogMap: Logic-based and scalable ontology matching. In: Int'l Sem. Web Conf. (2011)

7. Lin, D.: An information-theoretic definition of similarity. In: ICML (1998)

8. Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: 32nd Annual Meeting of the Association for Computational Linguistics (1994)

9. Zhao, M., Zhang, S.: FCA-Map results for OAEI 2016. In: Proceedings of the 11th International Workshop on Ontology Matching (2016)