=Paper=
{{Paper
|id=Vol-2721/paper500
|storemode=property
|title=Towards Multiple Ontology Merging with CoMerger
|pdfUrl=https://ceur-ws.org/Vol-2721/paper500.pdf
|volume=Vol-2721
|authors=Samira Babalou,Birgitta König-Ries
|dblpUrl=https://dblp.org/rec/conf/semweb/BabalouK20
}}
==Towards Multiple Ontology Merging with CoMerger==
Samira Babalou, Birgitta König-Ries
Heinz-Nixdorf Chair for Distributed Information Systems
Institute for Computer Science, Friedrich Schiller University Jena, Germany
{samira.babalou,birgitta.koenig-ries}@uni-jena.de
Abstract. To obtain a knowledge graph representing a domain of interest, it is
often necessary to combine several, independently developed ontologies. Existing
approaches are mostly limited to binary merge and lack scalability. This paper
presents the development of an efficient multiple ontologies merging method,
CoMerger. For efficient processing, rather than directly merging a large number
of ontologies, we group related concepts across ontologies into partitions and
merge first within and then across those partitions. Experiments on real-life
datasets confirm the feasibility of our approach and demonstrate its superiority
over binary strategies. Our implementation is available through a live web portal.
Keywords: Ontology merging · Partitioning · N-ary merge
1 Introduction
Ontologies represent the semantic model of data on the web. For many use cases,
individual ontologies cover just part of, or one specific perspective on, the domain of
interest. By merging them into one knowledge graph, their complementarity can be
leveraged. The merge process plays an important role in many different aspects of
the Semantic Web (cf. [1]). Most existing approaches [2, 3] are limited to merging two
ontologies at a time, i.e., they follow a binary merge strategy. Merging n > 2 ontologies
in a single step using a so-called n-ary strategy [4] has not been extensively studied so
far, but seems promising for better scalability. We aim to investigate to which extent
this is the case.
Let us consider five source ontologies with their correspondences depicted
by dashed lines (Fig. 1). To estimate the merge effort, we count the operations for
combining corresponding entities into an integrated entity (|combine|), reconstructing
relationships (|reconst|), and generating the output (|output|). In the binary-ladder
strategy [4], a popular type of binary merge, O1 and O2 are first combined into O12.
Then, O12 is merged with O3, and so on (see Fig. 1 and Table 1). For our example, the
merged ontology obtained by an n-ary approach has the same structure as the final
merged ontology of the binary-ladder merge. The binary merge approach needs 10
combinations, 32 reconstructions, and 4 output generations, while the n-ary method
needs 6, 28, and 1, respectively. This means that for our example, the n-ary method
reduces the effort to
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
Fig. 1: Five ontologies with the merged ontologies in each step of binary merge.
Table 1: Number of operations for merging the five sample ontologies.
             N-ary              Binary
             O1&O2&O3&O4&O5     O1&O2   O12&O3   O123&O4   O1234&O5   Total
|combine|          6               1        2        4         3        10
|reconst|         28               5        8        6        13        32
|output|           1               1        1        1         1         4
60%, 87.5%, and 25% of the effort needed with the binary method. The general pattern
holds for other examples as well. The achieved improvements over binary approaches
are significant, especially when dealing with a large number of ontologies.
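The effort comparison above is simple arithmetic; the short sketch below (per-step counts copied from the running example, variable names our own) reproduces the totals and the reduction percentages:

```python
# Per-step effort counts (|combine|, |reconst|, |output|) for the
# binary-ladder merge of the five-ontology running example.
binary_steps = {
    "O1&O2":    (1, 5, 1),
    "O12&O3":   (2, 8, 1),
    "O123&O4":  (4, 6, 1),
    "O1234&O5": (3, 13, 1),
}
nary = (6, 28, 1)  # single n-ary merge of all five ontologies

# Sum each measure over the four binary steps.
binary_total = tuple(sum(step[i] for step in binary_steps.values())
                     for i in range(3))
# n-ary effort as a percentage of the binary effort.
ratios = tuple(round(100 * nary[i] / binary_total[i], 1) for i in range(3))

print(binary_total)  # (10, 32, 4)
print(ratios)        # (60.0, 87.5, 25.0)
```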
To handle a large number of source ontologies, we aim to reduce the time
and operational complexity while achieving at least the same quality of the final
result compared to the binary merge or even improve upon it. In our n-ary method,
CoMerger¹, we develop an efficient merging technique that scales to many ontologies.
It takes as input a set of source ontologies OS = {O1, ..., On} with a set of corresponding
sets CS extracted from their mappings and generates a merged ontology OM. At first,
the n source ontologies are partitioned into k blocks (k ≪ n). After that, the blocks are
individually refined and finally combined, followed by a global refinement, to produce
the merged ontology.
2 Proposed Method
Our method implemented in CoMerger consists of three main phases: initialization,
partitioning, and combining. Mappings between source ontologies can be given or can
be created with a matching tool [5] embedded in CoMerger in a preprocessing step.
These mappings consist of sets of corresponding entities. The relationship determining
a correspondence can be of equality, similarity, or subset (is-a) type. In CoMerger, we
consider the similarity type with at least a given similarity value (1:1 mapping). We
maintain a model M which keeps the information across a group of correspondences
over multiple ontologies. We denote the cardinality of each corresponding set
cs_j ∈ CS by Card(cs_j).
(1) Initialization Phase. We build an initial merge model IM and parse the source
ontologies by extracting all entities into IM. After that, the list of corresponding
sets CS from the given mappings is processed to build the model of mappings M
over multiple ontologies. For each entry of M, a new integrated entity is created
in IM. To construct the initial relations between the entities in IM, all axioms from
the source ontologies must be processed. Thus, we translate the source ontologies'
axioms with respect to the corresponding sets in M. If an axiom's entities have a
corresponding entity in M, the axiom is translated with the generated integrated entity;
if not, the axiom is imported into IM without any changes.

¹ The tool as well as all data used for evaluation are available online at
http://comerger.uni-jena.de and https://github.com/fusion-jena/CoMerger
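To illustrate the axiom-translation step (the data structures and names below are our own simplification, not CoMerger's actual API), axioms can be modelled as tuples of entity IDs and M as a dictionary from source entities to integrated entities:

```python
# Illustrative sketch of axiom translation into the initial merge model.
# An axiom is a tuple of entity IDs; M maps a source entity to the
# integrated entity created for its corresponding set.

def translate_axioms(axioms, M):
    """Substitute the integrated entity where a mapping exists;
    import the axiom unchanged otherwise."""
    return [tuple(M.get(e, e) for e in axiom) for axiom in axioms]

# Toy example: O2:b1 corresponds to O1:a1 -> integrated entity "i1".
M = {"O1:a1": "i1", "O2:b1": "i1"}
axioms = [("O1:a1", "subClassOf", "O1:a2"),   # translated
          ("O2:b3", "subClassOf", "O2:b4")]   # imported unchanged
print(translate_axioms(axioms, M))
# [('i1', 'subClassOf', 'O1:a2'), ('O2:b3', 'subClassOf', 'O2:b4')]
```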
(2) Partitioning Phase. An ontology block L is a subset (or the whole) of one (or more)
source ontologies with no overlap with other blocks. The number of blocks is denoted
by k, and CL = {L1, ..., Lk} is the set of all blocks. We decompose the ontology
merging task Merge(O1, ..., On) into the block merging task Merge(L1, ..., Lk), k ≪ n.
To achieve this, we use a set of pivot classes P:
(i) Finding pivot classes P. To find the pivot set of classes P for the divide process,
we measure a reputation degree of each class belonging to CS, as reputation(ct) =
Card(ct) × Conn(ct). It is based on the number of corresponding classes Card(ct) along
with their connectivity degree Conn(ct) in the respective source ontologies. The
classes with high Card(ct) values in CS have the best overlap within OS. However,
considering only this metric tends to choose isolated classes as pivot classes.
Thus, class connectivity is taken into account. The connectivity degree Conn(ct)
of a class is given by the number of taxonomic and non-taxonomic relations
associated with that class. Thus, P is realized as a list of CS's elements sorted by
their reputation degrees.
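A minimal sketch of this pivot selection, under our simplified reading (the helper names and toy data are ours): reputation is the product of corresponding-set size and relation count, and P is the descending sort by that score.

```python
# Sketch of pivot selection: reputation(c) = Card(c) x Conn(c).

def pivot_list(corr_sets, conn):
    """corr_sets: {representative class: set of corresponding classes};
    conn: {representative class: number of taxonomic and
    non-taxonomic relations in the source ontologies}."""
    reputation = {c: len(cs) * conn.get(c, 0) for c, cs in corr_sets.items()}
    # P: representatives sorted by descending reputation degree.
    return sorted(reputation, key=reputation.get, reverse=True)

corr_sets = {"Person": {"O1:Person", "O2:Human", "O3:Agent"},
             "City":   {"O1:City", "O2:City"}}
conn = {"Person": 4, "City": 1}
print(pivot_list(corr_sets, conn))  # ['Person', 'City'] (scores 12 vs 2)
```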
(ii) Partitioner. This step divides all classes from IM into a set of blocks
CL = {L1, ..., Lk} based on a structural similarity, in which classes close in the
hierarchy are considered similar and should be placed in the same block. Thus, once
a class is assigned to a block, all its adjacent classes will also be added.
The first block L1 is created by picking the first element of P, which has the highest
reputation degree. For each corresponding class of this element, all classes together
with their adjacent classes in their respective ontologies are added to the block. Then,
the next element of P is selected to create a new block if at least one of its classes has
not been assigned to a previous block. This process continues until all elements of P
are processed. The overall number of blocks is determined by the number of P's
elements and the amount of classes shared between them.
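The partitioner loop can be sketched as follows; this is our simplified reading of the step, with hypothetical helper signatures (`corr`, `adjacent`), not CoMerger's implementation:

```python
# Sketch of the partitioner: blocks are seeded by pivot classes, and each
# corresponding class pulls in its hierarchy neighbours.

def partition(P, corr, adjacent):
    """P: pivot classes sorted by descending reputation;
    corr(c): classes corresponding to c across the source ontologies;
    adjacent(c): classes adjacent to c in its source hierarchy."""
    blocks, assigned = [], set()
    for pivot in P:
        seeds = corr(pivot)
        if all(c in assigned for c in seeds):
            continue  # every corresponding class is already placed
        block = set()
        for c in seeds:
            for member in {c} | adjacent(c):
                if member not in assigned:
                    block.add(member)
                    assigned.add(member)
        blocks.append(block)
    return blocks

# Toy run: two pivots with trivial one-element corresponding sets and
# one adjacent class each.
corr = lambda c: {c}
adj = {"A": {"A1"}, "B": {"B1"}}
blocks = partition(["A", "B"], corr, lambda c: adj.get(c, set()))
print(blocks)  # two blocks: A with A1, and B with B1
```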
(3) Combining Phase. This phase is split into two steps. In the intra-combination
step, the entities inside the blocks are combined to create local sub-ontologies. We
retrieve all existing axioms from IM. Each class is augmented by its original or
translated property axioms, including taxonomic and non-taxonomic relations. We
assign each axiom to a block that contains all of its entities. If an axiom's entities are
distributed across multiple blocks, the axiom is not added to any block and is marked
as a distributed axiom dist_axiom. Finally, the local refinement process takes place for
each sub-ontology. To further improve performance, this step can utilize parallel
processing. In the inter-combination step, the final merged ontology OM is constructed.
We follow a sequential merging based on the inter-block relatedness
|dist_axiom(Li) ∩ dist_axiom(Lj)|, i.e., the number of distributed axioms shared
between two blocks Li and Lj. At first, the two blocks with the highest inter-block
relatedness are merged; this includes adding all their distributed axioms to them. Then,
the block with the highest inter-block relatedness to the result is selected and merged
next. This process continues until all blocks are merged. A set of global refinements is
applied to the last combined block. Upon that, the merged ontology OM is built. To
apply local and global refinement within CoMerger, we include a list of General Merge
Requirements (GMRs) [6], containing, e.g., entity preservation, one-type restriction,
acyclicity, and connectivity.
Table 2: Comparing n-ary (N), balanced (B), and ladder (L) merge strategies
(time in seconds).
         dataset4           dataset6            dataset10               dataset11               dataset12
         N     B     L      N     B     L      N      B      L        N      B      L        N       B       L
time     1.3   3.5   4.5    1.8   7.0   8.3    61.8   252.5  507.0    62.3   230.3  592.2    150.7   806.3   4738.8
|Cor|    47    65    64     82    127   128    266    368    351      267    387    387      1139    2605    2154
|tr|     790   1270  1339   1310  2784  3462   6949   15773  31500    6960   16019  35458    30035   71884   153879
|R_G|    21    27    23     41    48    51     143    167    191      143    179    200      977     1429    1553
|Mer.|   1     3     3      1     6     6      1      16     16       1      18     18       1       54      54
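The inter-combination ordering can be sketched as below; this is our simplified reading of the step (function and variable names are ours), merging the most related pair first and then greedily attaching the block most related to the result so far:

```python
# Sketch of the inter-combination order: relatedness of two blocks is the
# number of distributed axioms they share.

def merge_order(dist):
    """dist: {block id: set of distributed-axiom ids}.
    Returns the order in which blocks are merged."""
    remaining = set(dist)
    # Start with the pair sharing the most distributed axioms.
    first = max(((a, b) for a in remaining for b in remaining if a < b),
                key=lambda p: len(dist[p[0]] & dist[p[1]]))
    order = list(first)
    remaining -= set(first)
    shared = dist[first[0]] | dist[first[1]]
    while remaining:
        # Greedily attach the block most related to the merged result.
        nxt = max(remaining, key=lambda b: len(dist[b] & shared))
        order.append(nxt)
        shared |= dist[nxt]
        remaining.remove(nxt)
    return order

dist = {"L1": {1, 2, 3}, "L2": {2, 3}, "L3": {9}}
print(merge_order(dist))  # ['L1', 'L2', 'L3']
```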
3 Experimental Evaluation
We have selected sets of ontologies frequently used for evaluation tasks (see our
repository), with 134 ≤ |axioms| ≤ 30364 and 2 ≤ n ≤ 55, and have created mappings
with SeeCOnt [5]. To demonstrate the feasibility of our n-ary strategy, we compared it
with the binary ladder and balanced strategies [4]. The runtime performance (in
seconds) for each merge strategy is shown in the time row of Table 2. The n-ary merge
is on average 4 (9) times faster than the balanced (ladder) binary merge, respectively.
Moreover, we present the operational complexity of these three strategies in Table 2.
In each step of a binary merge, only a correspondence between two entities can be
integrated into a new entity. In the n-ary merge, in contrast, corresponding entities
from multiple source ontologies can be integrated into a new entity at once. Thus,
the number of total corresponding entities |Cor| in the binary merge is much higher
than in the n-ary approach. Consequently, the effort of combining them into new
entities and translating their axioms |tr| is high. For global refinement |R_G|, in most
cases the n-ary approach requires fewer actions than the binary merges. We also
compare the number of merge processes |Mer.|. While the n-ary approach uses only
one iteration, the ladder and balanced methods require n − 1 merge processes. Overall,
the results demonstrate that the n-ary approach has better runtime and complexity than
binary approaches. In terms of quality, it achieves results comparable to its binary
counterparts (see the full results in our repository).
4 State of the Art
Merging strategies have been divided into two main categories [4]: "binary" and
"n-ary". The binary approach merges two ontologies at a time, while the n-ary
strategy merges n ontologies (n > 2) in a single step. Most methodologies in the
literature, such as [2, 3], adopt a binary strategy. The few existing n-ary
approaches [7, 8] suffer from certain drawbacks: In [8], the final merge result depends
on the order in which the source tree-structured XML schemas are matched and
merged. In [7], the experimental tests were carried out on only a few small source
ontologies. Despite the efforts of these research studies, an efficient, scalable n-ary
method has not yet been realized in practice.
5 Conclusion
Existing ontology merge approaches scale rather poorly to the merging of multiple
ontologies. To overcome this issue, we proposed CoMerger, an n-ary approach to
merging multiple ontologies based on partitioning. Taking further advantage of
parallelization is on our future research agenda.
Acknowledgments
S. Babalou is supported by a scholarship from the German Academic Exchange
Service (DAAD).
References
1. M. T. Finke, R. W. Filice, and C. E. Kahn Jr, "Integrating ontologies of human diseases,
phenotypes, and radiological diagnosis," J. Am. Med. Inform. Assoc., vol. 26, no. 2,
pp. 149–154, 2019.
2. M. Priya and C. A. Kumar, "An approach to merge domain ontologies using granular
computing," Granular Computing, pp. 1–26, 2019.
3. S. Raunich and E. Rahm, "Target-driven merging of taxonomies with ATOM," Inf. Syst.,
vol. 42, pp. 1–14, 2014.
4. C. Batini, M. Lenzerini, and S. B. Navathe, "A comparative analysis of methodologies for
database schema integration," ACM Comput. Surv. (CSUR), 1986.
5. A. Algergawy, S. Babalou, M. J. Kargar, and S. H. Davarpanah, "SeeCOnt: A new
seeding-based clustering approach for ontology matching," in ADBIS, 2015.
6. S. Babalou and B. König-Ries, "GMRs: Reconciliation of generic merge requirements in
ontology integration," in SEMANTICS Posters and Demos, 2019.
7. N. Maiz, M. Fahad, O. Boussaid, and F. Bentayeb, "Automatic ontology merging by
hierarchical clustering and inference mechanisms," in I-KNOW, pp. 1–3, 2010.
8. K. Saleem, Z. Bellahsene, and E. Hunt, "PORSCHE: Performance oriented schema mediation,"
Inf. Syst., vol. 33, no. 7, pp. 637–657, 2008.