CEUR-WS Vol-2721, paper 500: Towards Multiple Ontology Merging with CoMerger. Samira Babalou, Birgitta König-Ries. https://ceur-ws.org/Vol-2721/paper500.pdf (DBLP: https://dblp.org/rec/conf/semweb/BabalouK20)
    Towards Multiple Ontology Merging with CoMerger

                        Samira Babalou, Birgitta König-Ries

                    Heinz-Nixdorf Chair for Distributed Information Systems
          Institute for Computer Science, Friedrich Schiller University Jena, Germany
                 {samira.babalou,birgitta.koenig-ries}@uni-jena.de



        Abstract. To obtain a knowledge graph representing a domain of interest, it is
        often necessary to combine several, independently developed ontologies. Existing
        approaches are mostly limited to binary merge and lack scalability. This paper
        presents CoMerger, an efficient method for merging multiple ontologies.
        For efficient processing, rather than directly merging a large number
        of ontologies, we group related concepts across ontologies into partitions and
        merge first within and then across those partitions. Experiments on real-life
        datasets confirm the feasibility of our approach and demonstrate its superiority
        over binary strategies. Our implementation is available through a live web portal.

        Keywords: Ontology merging · Partitioning · N-ary merge


1    Introduction

Ontologies represent the semantic model of data on the web. For many use cases,
an individual ontology covers only part of, or one specific perspective on, the
domain of interest. By merging several ontologies into one knowledge graph, their
complementarity can be leveraged. The merge process plays an important role in
several aspects of the Semantic Web (cf. [1]). Most existing approaches [2, 3] are
limited to merging two ontologies at a time due to their use of a binary merge.
Merging n > 2 ontologies in a single step using a so-called n-ary strategy [4] has
not been studied extensively so far, but seems promising for better scalability.
We aim to investigate to which extent this is the case.
    Let us consider five source ontologies with their correspondences depicted
by dashed lines (Fig. 1). To estimate the merge effort, we count three operations:
combining corresponding entities into an integrated entity (|combine|), reconstructing
the relationships (|reconst|), and generating the output (|output|). In the binary-ladder
strategy [4], a popular type of binary merge, first O1 and O2 are combined into O12;
then O12 is merged with O3, and so on (see Fig. 1 and Table 1). For our example, the
merged ontology obtained by an n-ary approach has the same structure as the final
merged ontology of the binary-ladder merge. The binary merge needs 10 combinations,
32 reconstructions, and 4 output generations, while the n-ary method needs 6, 28, and
1, respectively. This means that for our example, the n-ary method reduces the effort to
    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
    License Attribution 4.0 International (CC BY 4.0).
    ๐’ช๐’ช1 ๐ถ๐ถ1                     ๐’ช๐’ช2          ๐ถ๐ถ7                 ๐’ช๐’ช5                  ๐’ช๐’ช12                                                          ๐’ช๐’ช1234                                             ๐’ช๐’ช12345
                         ๐ถ๐ถ5                                                                      ๐ถ๐ถ1             ๐ถ๐ถ5               ๐ถ๐ถ7      ๐ถ๐ถ12              ๐ถ๐ถ1        ๐ถ๐ถ5                                           ๐ถ๐ถ5                     ๐ถ๐ถ7
                                                                          ๐ถ๐ถ25                                                                                                       ๐ถ๐ถ7                         ๐ถ๐ถ1
                                                                                                                                                                                                                                ๐ถ๐ถ 4.
                                                                                                        ๐ถ๐ถ4.                                                     ๐ถ๐ถ4.                                                                                  ๐ถ๐ถ26
     ๐ถ๐ถ2      ๐ถ๐ถ4                     ๐ถ๐ถ8          ๐ถ๐ถ10                                   ๐ถ๐ถ2                                  ๐ถ๐ถ10                                                                               ๐ถ๐ถ6
                                                                                                                                                                                                                                   8.
                                                                                                                                                                                                                                   25
                                                                 ๐ถ๐ถ26
                                                                                                            8                                         ๐ถ๐ถ2             8                                  ๐ถ๐ถ2                                   ๐ถ๐ถ10
                         ๐ถ๐ถ6                                                                                           ๐ถ๐ถ9                 ๐ถ๐ถ11                                     ๐ถ๐ถ10        ๐ถ๐ถ12                                                    ๐ถ๐ถ12
                                            ๐ถ๐ถ9           ๐ถ๐ถ12                                                                                                                                                            ๐ถ๐ถ 9.
     ๐ถ๐ถ3                                                                  ๐ถ๐ถ27          ๐ถ๐ถ3           ๐ถ๐ถ6                                                       ๐ถ๐ถ6       ๐ถ๐ถ 9.                                               13
                                                   ๐ถ๐ถ11                                                                                             ๐ถ๐ถ 3.                   13                         ๐ถ๐ถ 3.                       ๐ถ๐ถ11.
                                                                                                                                                                                   ๐ถ๐ถ11.                                                         ๐ถ๐ถ27
                                                                                      ๐’ช๐’ช123                                                           14                                                 14             ๐ถ๐ถ18            19.
 ๐’ช๐’ช3                                                                                                            ๐ถ๐ถ5           ๐ถ๐ถ7          ๐ถ๐ถ12                           ๐ถ๐ถ18
                                                                                                                                                                                     19
                                                                                                                                                                                                ๐ถ๐ถ24                                    28
                  ๐ถ๐ถ13          ๐’ช๐’ช4                              ๐ถ๐ถ28          ๐ถ๐ถ29             ๐ถ๐ถ1                                                                                                        ๐ถ๐ถ15 ๐ถ๐ถ16.                                   ๐ถ๐ถ29
                                            ๐ถ๐ถ19                                                        ๐ถ๐ถ4.                                            ๐ถ๐ถ15                                                      31
                                                                                                                                                                                                                                   ๐ถ๐ถ30
           ๐ถ๐ถ14                                                                                                               ๐ถ๐ถ10                                                       ๐ถ๐ถ22                           ๐ถ๐ถ 17                  ๐ถ๐ถ32
                         ๐ถ๐ถ18                             ๐ถ๐ถ24                          ๐ถ๐ถ2                 8                              ๐ถ๐ถ11                 ๐ถ๐ถ16                                                     .20                            ๐ถ๐ถ24
                                 ๐ถ๐ถ20                                   ๐ถ๐ถ30                                                                                              ๐ถ๐ถ 17
    ๐ถ๐ถ15          ๐ถ๐ถ17                       ๐ถ๐ถ22                                                     ๐ถ๐ถ6             ๐ถ๐ถ 9.         ๐ถ๐ถ18
                                                                                        ๐ถ๐ถ 3.                           13                                                  .20                 ๐ถ๐ถ23                      ๐ถ๐ถ21
                                                                                                                                                                                                                                        ๐ถ๐ถ22
                                  ๐ถ๐ถ21               ๐ถ๐ถ23         ๐ถ๐ถ31         ๐ถ๐ถ32       14
                                                                                                                              ๐ถ๐ถ17                                                                                                                    ๐ถ๐ถ23
           ๐ถ๐ถ16                                                                                       ๐ถ๐ถ15        ๐ถ๐ถ16                                                            ๐ถ๐ถ21




           Fig. 1: Five source ontologies and the merged ontology produced at each step of the binary merge.

                 Table 1: Number of operations for merging the five sample ontologies.

                                  N-ary                            Binary
                      O1 & O2 & O3 & O4 & O5   O1 & O2  O12 & O3  O123 & O4  O1234 & O5  Total
          |combine|              6                1         2          4          3        10
          |reconst|             28                5         8          6         13        32
           |output|              1                1         1          1          1         4



60%, 87.5%, and 25% of the effort needed with the binary method. The general pattern
is the same for other examples. The improvements over binary approaches are significant,
especially when dealing with a large number of ontologies.
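These ratios follow directly from the operation counts in Table 1; a small illustrative script (not part of CoMerger) verifies the arithmetic:

```python
# Effort of the n-ary vs. the binary-ladder merge for the five-ontology
# example; the counts are taken verbatim from Table 1.
binary = {"combine": 10, "reconst": 32, "output": 4}
nary   = {"combine": 6,  "reconst": 28, "output": 1}

for op in binary:
    ratio = 100 * nary[op] / binary[op]
    print(f"{op}: n-ary needs {ratio:.1f}% of the binary effort")
# combine -> 60.0%, reconst -> 87.5%, output -> 25.0%
```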
     To handle a large number of source ontologies, we aim to reduce the time
and operational complexity while achieving at least the same quality of the final
result as the binary merge, or even improving upon it. In our n-ary method,
CoMerger1 , we develop an efficient merging technique that scales to many ontologies.
It takes as input a set of source ontologies OS = {O1 , ..., On } with a set of corresponding
sets CS extracted from their mappings and generates a merged ontology OM . At first,
the n source ontologies are partitioned into k blocks (k << n). After that, the blocks are
individually refined and finally combined to produce the merged ontology, followed by
a global refinement.

2    Proposed Method
Our method implemented in CoMerger consists of three main phases: initialization,
partitioning, and combining. Mappings between source ontologies can be given or can
be created with a matching tool [5] embedded in CoMerger in a preprocessing step.
These mappings consist of sets of corresponding entities. The relationship determining
a correspondence can be of equality, similarity, or subset (is-a) type. In CoMerger, we
consider the similarity type with at least a given similarity value (1:1 mappings). We
maintain a model M which keeps the information across a group of correspondences
over multiple ontologies. We denote the cardinality of each corresponding set by
Card(cs j ), where cs j โˆˆ CS.
    (1) Initialization Phase. We build an initial merge model IM and parse the source
ontologies, extracting all entities into IM . After that, the list of corresponding
 1
     The tool as well as all data used for evaluation are available online at
     http://comerger.uni-jena.de and https://github.com/fusion-jena/CoMerger
sets CS from the given mappings is processed to build the model of mappings M
over multiple ontologies. For each entry of M, a new integrated entity is created
in IM . To construct the initial relations between the entities in IM , all axioms from
the source ontologies are processed. Thus, we translate the source ontologies'
axioms with respect to the corresponding sets that exist in M. If an axiom's entities have
a corresponding entity in M, the axiom is translated using the generated integrated entity;
if not, the axiom is imported into IM without any changes.
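The translation step above can be sketched as follows. This is a hypothetical illustration, not CoMerger's actual API: the names `build_mapping_model`, `translate_axiom`, and the `IE_i` identifiers for integrated entities are ours, and axioms are simplified to tuples of entity names.

```python
def build_mapping_model(corresponding_sets):
    """Map every source entity to its integrated entity (one per set)."""
    m = {}
    for i, cs in enumerate(corresponding_sets):
        integrated = f"IE_{i}"      # new integrated entity for this set
        for entity in cs:
            m[entity] = integrated
    return m

def translate_axiom(axiom, m):
    """Rewrite an axiom over integrated entities; entities without a
    correspondence are imported unchanged."""
    return tuple(m.get(e, e) for e in axiom)

# Toy corresponding sets CS across three source ontologies
CS = [{"O1:Person", "O2:Human"}, {"O1:City", "O3:Town"}]
M = build_mapping_model(CS)
print(translate_axiom(("O1:Person", "subClassOf", "O1:Agent"), M))
# -> ('IE_0', 'subClassOf', 'O1:Agent'): the matched class is replaced,
#    the unmatched class O1:Agent is kept as-is
```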
    (2) Partitioning Phase. An ontology block L is a subset of (or a whole) one or more
source ontologies with no overlap with other blocks. The number of blocks is denoted
by k, and CL = {L1 , ..., Lk } is the set of all blocks. We decompose the ontology
merging task Merge(O1 , ..., On ) into the block merging task Merge(L1 , ..., Lk ), k << n.
To achieve this, we use a set of pivot classes P:
    (i) Finding pivot classes P. To find the pivot set of classes P for the divide process,
we measure a reputation degree of each class belonging to CS as reputation(ct ) =
Card(ct ) × Conn(ct ). It is based on the number of corresponding classes Card(ct ) along
with their connectivity degree Conn(ct ) in the respective source ontologies. The
classes with high Card(ct ) values in CS have the best overlap within OS . However,
considering only this metric tends to choose isolated classes as pivots. Thus, class
connectivity is taken into account as well. The connectivity degree Conn(ct ) of a class
is given by the number of taxonomic and non-taxonomic relations associated with
that class. P is then realized as a list of CS's elements sorted by their
reputation degrees.
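A minimal sketch of the pivot selection, with made-up Card and Conn values (the class names and numbers are purely illustrative):

```python
def reputation(card, conn):
    """reputation(c) = Card(c) x Conn(c), as defined in step (i)."""
    return card * conn

# class -> (Card, Conn); an isolated class has Conn = 0 and thus
# reputation 0, so it sinks to the end of the pivot list
classes = {"Person": (4, 7), "City": (3, 2), "Isolated": (5, 0)}
pivots = sorted(classes, key=lambda c: reputation(*classes[c]), reverse=True)
print(pivots)  # ['Person', 'City', 'Isolated']
```

Note how "Isolated" has the highest Card but ends up last: the Conn factor prevents isolated classes from being chosen as pivots.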
    (ii) Partitioner. This step divides all classes from IM into a set of blocks
CL = {L1 , ..., Lk } based on structural similarity, in which classes close in the
hierarchy are considered similar and should be placed in the same block. Thus, once
a class is assigned to a block, all its adjacent classes will consequently also be added.
The first block L1 is created by picking the first element of P, which has the highest
reputation degree. For each corresponding class of this element, the class and its
adjacent classes in the respective ontology are added to the block. Then, the next
element of P is selected to create a new block if at least one of its classes has not
yet been assigned to a previous block. This process continues until all elements of P
are processed. The overall number of blocks is determined by the number of P's
elements and the number of classes shared between them.
    (3) Combining Phase. This phase is split into two steps. In the intra-combination
step, the entities inside the blocks are combined to create local sub-ontologies. We
retrieve all existing axioms from IM . Each class is augmented with its original or
translated property axioms, including taxonomic and non-taxonomic relations. We
assign each axiom to the block that contains all of its entities. If an axiom's classes are
distributed across multiple blocks, it is not added to any block and is marked as a
distributed axiom distaxiom . Finally, a local refinement process takes place for each
sub-ontology. To further improve performance, this step can utilize parallel processing.
In the inter-combination step, the final merged ontology OM is constructed. We follow
a sequential merging based on the inter-block relatedness |distaxiom (Li ) ∩ distaxiom (Lj )|,
i.e., the number of distributed axioms shared between two blocks Li and Lj . At first,
the two blocks with the highest inter-block
        Table 2: Comparing n-ary (N), balanced (B), and ladder (L) merge strategies.

              dataset4          dataset6            dataset10             dataset11              dataset12
            N    B    L      N     B     L      N      B      L       N      B       L       N       B        L
    time   1.3  3.5  4.5   1.8     7   8.3    61.8  252.5    507    62.3  230.3   592.2   150.7   806.3   4738.8
   |Cor|    47   65   64    82   127   128    266    368     351    267    387     387    1139    2605     2154
    |tr|   790 1270 1339  1310  2784  3462   6949  15773   31500   6960  16019   35458   30035   71884   153879
   |RG|     21   27   23    41    48    51    143    167     191    143    179     200     977    1429     1553
  |Mer.|     1    3    3     1     6     6      1     16      16      1     18      18       1      54       54




similarity value are merged. This includes adding all their distributed axioms to them.
Then, the block with the next-highest inter-block relatedness is selected and merged.
This process continues until all blocks are merged. A set of global refinements is
applied to the last combined block. With that, the merged ontology OM is built. To
apply local and global refinement within CoMerger, we include a list of General Merge
Requirements (GMRs) [6], containing, e.g., entity preservation, one-type restriction,
acyclicity, and connectivity.
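The inter-combination order can be sketched as a greedy selection on shared distributed axioms. This is a hypothetical illustration; the block names and axiom identifiers are made up, and the real system merges full sub-ontologies rather than label sets.

```python
def merge_order(dist):
    """dist: block name -> set of distributed-axiom ids touching it.
    Returns the order in which blocks are merged."""
    names = list(dist)
    # first pick: the pair of blocks sharing the most distributed axioms
    pairs = [(len(dist[a] & dist[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    _, a, b = max(pairs)
    order, merged = [a, b], dist[a] | dist[b]
    remaining = [n for n in names if n not in (a, b)]
    # then: repeatedly add the block most related to what is merged so far
    while remaining:
        nxt = max(remaining, key=lambda n: len(dist[n] & merged))
        order.append(nxt)
        merged |= dist[nxt]
        remaining.remove(nxt)
    return order

dist = {"L1": {1, 2, 3}, "L2": {2, 3}, "L3": {3, 9}}
print(merge_order(dist))  # ['L1', 'L2', 'L3']
```

L1 and L2 share two distributed axioms and are merged first; L3, sharing only one, follows.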


3   Experimental Evaluation

We have selected sets of ontologies frequently used for evaluation tasks (see our
repository), with 134 ≤ |axioms| ≤ 30364 and 2 ≤ n ≤ 55, and have created mappings
with SeeCOnt [5]. To demonstrate the feasibility of our n-ary strategy, we compared it
with the binary ladder and balanced strategies [4]. The runtime (in seconds) of each
merge strategy is shown in the first data row (time) of Table 2. The n-ary merge
is on average 4 (9) times faster than the balanced (ladder) binary merge, respectively.
Moreover, we present the operation complexity of these three strategies in Table 2.
In each step of a binary merge, only a correspondence between two entities can be
integrated into a new entity at a time. In the n-ary merge, however, corresponding
entities from multiple source ontologies can be integrated into a new entity at once.
Thus, the total number of corresponding entities |Cor| in the binary merge is much
higher than in the n-ary approach. Consequently, the effort for combining them into
new entities and translating their axioms (|tr|) is also higher. For global refinement
(|RG|), the n-ary approach requires fewer actions than the binary merges in most cases.
We also compare the number of merge processes |Mer.|: while the n-ary approach needs
only one, the ladder and balanced methods require n − 1 merge processes. Overall, the
results demonstrate that the n-ary approach outperforms binary approaches in both
runtime and operation complexity. In terms of quality, the n-ary approach achieves
results comparable to its binary counterparts (see the full results in our repository).


4   State of the Art

Merging strategies are divided into two main categories [4]: "binary" and
"n-ary". The binary approach merges two ontologies at a time, while the n-ary
strategy merges n ontologies (n > 2) in a single step. Most methodologies in the
literature, such as [2, 3], adopt a binary strategy. The few existing n-ary
approaches [7, 8] suffer from certain drawbacks: in [8], the final merge result depends
on the order in which the source tree-structured XML schemas are matched and merged;
in [7], the experimental tests were carried out on only a few small source ontologies.
Despite these research efforts, an efficient, scalable n-ary method has not yet been
realized in practice.


5   Conclusion
Existing ontology merge approaches scale rather poorly to the merging of multiple
ontologies. To overcome this issue, we proposed CoMerger, an n-ary approach for
merging multiple ontologies based on partitioning. Taking advantage of parallelization
is on our future research agenda.



Acknowledgments
S. Babalou is supported by a scholarship from the German Academic Exchange Service
(DAAD).


References
1. M. T. Finke, R. W. Filice, and C. E. Kahn Jr, "Integrating ontologies of human diseases,
   phenotypes, and radiological diagnosis," J Am Med Inform Assn, vol. 26, no. 2,
   pp. 149–154, 2019.
2. M. Priya and C. A. Kumar, "An approach to merge domain ontologies using granular
   computing," Granular Computing, pp. 1–26, 2019.
3. S. Raunich and E. Rahm, "Target-driven merging of taxonomies with ATOM," Inf. Syst.,
   vol. 42, pp. 1–14, 2014.
4. C. Batini, M. Lenzerini, and S. B. Navathe, "A comparative analysis of methodologies for
   database schema integration," ACM Comput. Surv., 1986.
5. A. Algergawy, S. Babalou, M. J. Kargar, and S. H. Davarpanah, "SeeCOnt: A new
   seeding-based clustering approach for ontology matching," in ADBIS, 2015.
6. S. Babalou and B. König-Ries, "GMRs: Reconciliation of generic merge requirements in
   ontology integration," in SEMANTICS Posters and Demos, 2019.
7. N. Maiz, M. Fahad, O. Boussaid, and F. Bentayeb, "Automatic ontology merging by
   hierarchical clustering and inference mechanisms," in I-KNOW, pp. 1–3, 2010.
8. K. Saleem, Z. Bellahsene, and E. Hunt, "Porsche: Performance oriented schema mediation,"
   Inf. Syst., vol. 33, no. 7, pp. 637–657, 2008.