Multi-view Embedding for Biomedical Ontology Matching*

Weizhuo Li1,2, Xuxiang Duan3, Meng Wang1,2, XiaoPing Zhang4(B), and Guilin Qi1,2

1 School of Computer Science and Engineering, Southeast University, Nanjing, China. liweizhuo@amss.ac.cn, {meng.wang,gqi}@seu.edu.cn
2 Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China.
3 School of Mathematical Sciences, Chongqing Normal University, Chongqing, China. duanxx9156@163.com
4 China Academy of Chinese Medical Sciences, Beijing, China. xiao_ping_zhang@139.com

Abstract. The goal of ontology matching (OM) is to identify mappings between entities from different yet overlapping ontologies so as to facilitate semantic integration, reuse and interoperability. With the development of deep learning, representation learning methods have been applied to OM tasks. However, two limitations remain. Firstly, these methods encode sparse entities in ontologies poorly. Secondly, most methods focus on terminological features to learn word vectors for discovering mappings, but they do not make full use of the structural relations in ontologies. As a consequence, these methods rely heavily on the quality of pre-training and are limited when no dictionaries or sufficient textual corpora are available. To address these issues, we propose an alternative ontology matching framework called MultiOM, which models the matching process with embedding techniques from multiple views. We design different loss functions based on cross-entropy to learn the vector representations of concepts, and further propose a novel negative sampling skill tailored for the structural relations asserted in ontologies. Preliminary results on real-world biomedical ontologies indicate that MultiOM is competitive with several OAEI top-ranked systems in terms of F1-measure.

Key words: Ontology Matching, Embedding, Cross-Entropy, Negative Sampling

1 Introduction

In the Semantic Web, ontologies aim to model domain conceptualizations so that applications built upon them can be compatible with each other by sharing the same meanings. Life science is one of the most prominent application areas of ontology technology. Many biomedical ontologies have been developed and utilized in real-world systems, including the Foundational Model of Anatomy (FMA)5, Adult Mouse Anatomy (MA)6 for anatomy, the National Cancer Institute Thesaurus (NCI)7 for disease, and the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT)8 for clinical medicine. To integrate and migrate data among applications, it is crucial to first establish mappings between the entities of their respective ontologies. As ontologies in the same domain are often developed for various purposes, they differ in coverage, granularity, naming, structure and many other aspects, which severely impedes the sharing and reuse of ontologies.

* Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This work was partially supported by the National Key Research and Development Program of China under grant (2018YFC0830200), the Natural Science Foundation of China grants (U1736204), the Fundamental Research Funds for the Central public welfare research institutes (ZZ11-064), and the Fundamental Research Funds for the Central Universities (3209009601).
5 http://si.washington.edu/projects/fma
Therefore, ontology matching (OM) techniques are devoted to identifying mappings across ontologies in order to alleviate the above heterogeneities [1].

Over the last ten years, many automatic systems have been developed to discover mappings between independently developed ontologies, with encouraging results (see [2, 3] for comprehensive and up-to-date surveys). Up to now, the mainstream methods (e.g., LogMap [4], AML [5], FCA-Map [6], XMap [7]) still focus on engineering features from terminological, structural and extensional (individuals of concepts) information as well as external resources [1]. These features are utilized to compute the similarities of ontological entities (i.e., concepts, properties, individuals) for guiding the matching. With the development of deep learning [8], several works (e.g., ERSOM [9], DeepAlignment [10], SCBOW + DAE(O) [11], OntoEmma [12]) try to shift from feature engineering to representation learning. Their underlying assumption is that semantically similar or related words appear in similar contexts. Word vectors therefore have the potential to bring significant value to OM, given that a great deal of ontological information comes in textual form [10]. Nevertheless, two challenges remain to be solved:

– Sparsity problem for embedding learning: One of the main difficulties for embedding learning is its poor capability of encoding sparse entities. Even in large-scale medical ontologies with many relations, most knowledge graph embedding techniques (e.g., TransE [13]) are still not applicable. Zhang et al. [14] observed that the prediction results of entities are highly related to their frequency, and that the results for sparse entities are much worse than those for frequent ones.

– Limitation problem for external resources: Thesauruses such as WordNet [15] and the UMLS Metathesaurus9 are one kind of external resource commonly employed in matching systems. Textual descriptions can also be employed for ontology matching [11, 12]. Nevertheless, methods based on representation learning rely heavily on the quality of pre-training, which may limit their scalability when no dictionaries or sufficient textual corpora are available.

To address the above problems, we propose MultiOM, an alternative ontology matching framework based on embedding techniques from multiple views. The underlying idea is to divide the process of OM into different modules (i.e., a lexical-based module, a structural-based module and a resource-based module) and to employ embedding techniques to soften these modules. Existing works [16, 17] show that identifying multiple views can sufficiently represent the data and improve the accuracy and robustness of the corresponding tasks. We therefore employ this idea to characterize the process of OM and to alleviate the sparsity problem for embedding learning indirectly. More precisely, different loss functions are designed based on cross-entropy to model different views among ontologies and to learn the vector representations of ontological entities. With continuous vector representations, we can capture more similar concepts and discover more potential mappings among ontologies.

6 http://informatics.jax.org/vocab/gxd/ma_ontology
7 https://ncit.nci.nih.gov/
8 http://www.snomed.org/snomed-ct/
9 https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html
Furthermore, we design a novel negative sampling skill tailored for the structural relations (e.g., subclassOf relations, disjointWith relations) asserted in ontologies, which yields better vector representations of entities for OM. The contributions of our study are summarized as follows.

– We propose an alternative ontology matching framework with embedding techniques from multiple views, and design loss functions based on cross-entropy to model different views for learning vector representations of ontological entities.
– We design a novel negative sampling skill tailored for the structural relations asserted in ontologies, which yields better vector representations of concepts.
– We implement our method and evaluate it on real-world biomedical ontologies. The preliminary results indicate that MultiOM is competitive with several OAEI top-ranked systems in terms of F1-measure.

2 Related work

2.1 Feature-based methods for biomedical ontology matching

Various feature-based strategies have been applied in the scenario of biomedical ontology matching, including terminological features, structural features and the use of external semantic thesauruses for discovering semantically similar entities.

LogMap [4] relies on lexical and structural indexes to enhance its scalability. To scale to large ontologies and minimize the number of logical errors in the aligned ontologies, LogMap uses a Horn propositional logic representation of the extended hierarchy of each ontology together with all existing mappings, and employs the Dowling-Gallier algorithm to model propositional Horn satisfiability.

AML [5] was originally developed to tackle the challenges of matching biomedical ontologies. It employs various sophisticated features and domain-specific thesauruses to perform OM. Besides, AML introduces a modularization-based technique to extract the core fragments of the ontologies that contain solely the necessary classes and relations involved in disjointness restrictions, which can repair incoherent alignments effectively.

FCA-Map [6] is an ontology matching system based on formal concept analysis (FCA), in which five types of formal contexts are constructed incrementally, and their derived concept lattices are used to cluster the commonalities among classes and properties at various lexical and structural levels, respectively.

XMap [7] is a scalable matching system that implements parallel processing techniques to enable the composition of basic sophisticated features. It also relies on external resources such as the UMLS Metathesaurus to improve the performance of ontology matching.

PhenomeNet [18] exploits an axiom-based approach for aligning phenotype ontologies, which makes use of the PATO ontology and Entity-Quality definition patterns to complement several shortcomings of feature-based methods.

Feature-based methods mainly employ hand-crafted features of the data to achieve specific tasks. Unfortunately, such features are tied to a given task and face a bottleneck of improvement. Cheatham and Hitzler showed that the performance of ontology matching based on such engineered features varies greatly with the domain described by the ontologies [19]. As a complement to feature engineering, continuous vectors representing ontological entities can capture the potential associations among features, which is helpful for discovering more mappings among ontologies.
2.2 Representation learning methods for biomedical ontology matching

Representation learning has so far had a limited impact on OM, especially for biomedical ontologies. To the best of our knowledge, only five approaches have explored the use of unsupervised representation learning techniques for ontology matching.

Zhang et al. [20] were among the first to investigate the use of word vectors for ontology matching. They align ontologies based on word2vec vectors [21] trained on Wikipedia. In addition, they use semantic transformations to complement the lexical information describing entities, such as names, labels and comments. Their entity matching strategy is based on maximum similarity.

Xiang et al. [9] propose an entity representation learning algorithm based on stacked auto-encoders, called ERSOM. They describe an ontological entity (i.e., concept, property) by a combination of its ID, labels, comments, structural relations and related individuals. The similarity of entities is computed with a fixed-point algorithm. Finally, ERSOM generates an alignment based on the stable marriage strategy.

DeepAlignment [10] is an unsupervised matching system that refines pre-trained word vectors to derive descriptions of entities for OM. To represent ontological entities better, the authors learn word representations using synonymy and antonymy constraints extracted from general lexical resources, together with information captured implicitly in ontologies.

SCBOW + DAE(O) [11] is a representation learning framework based on terminological embeddings, in which retrofitted word vectors are learned from the domain knowledge encoded in ontologies and semantic lexicons. In addition, SCBOW + DAE(O) incorporates an outlier detection mechanism based on a denoising autoencoder that is shown to improve the performance of alignments.

Wang et al. [12] propose a neural architecture tailored for biomedical ontology matching called OntoEmma. It can encode a variety of information and derive large amounts of labeled data for training the model. Moreover, they utilize natural language texts associated with entities to further improve the quality of alignments.

However, the above methods have two limitations. One is the sparsity problem of structural relations: to avoid the poor capability of encoding sparse relations, these methods prefer terminological features to learn word vectors for discovering mappings, but they do not make full use of the structural relations in ontologies. The other is that they rely heavily on the quality of pre-training, which may limit their scalability when no dictionaries or sufficient textual corpora are available.

3 Multi-view Embedding for Biomedical Ontology Matching

In the scenario of biomedical ontology matching, matching systems mainly focus on mappings of concepts with equivalence relations (C_i, C_j, \equiv, n). Thus, in the remainder of the paper, we only consider this type of mapping.

3.1 MultiOM

Existing works [16, 17] show that identifying multiple views can sufficiently represent the data and improve the accuracy and robustness of the corresponding tasks. Inspired by their works, we characterize the process of OM from multiple views and try to alleviate the sparsity problem for embedding learning indirectly.

Fig. 1: The framework of MultiOM
The framework of MultiOM is shown in Fig. 1. Given two biomedical ontologies O1 and O2, we first extract their concepts and the associated information. Then, we divide the process of OM into three embedding modules from different views: lexical-view embedding, structural-view embedding and resource-view embedding. In the resource-based module, domain ontologies are treated as bridges connecting the source ontology and the target one in order to discover more potential mappings. With a designed combination strategy based on mutual assessment, we obtain the final alignment between the given ontologies.

Different from feature-based methods, we utilize ontological information to learn continuous vector representations of concepts by embedding techniques, which can discover more potential mappings among ontologies. The granularity of the vector representations differs across the modules of MultiOM. In the lexical-based module, each concept is divided into several tokens {t_1, t_2, ..., t_n} that are represented as k-dimensional continuous vectors \mathbf{t}_i \in R^k. The similarity of concepts is measured over these word vectors by the algorithm designed below. In contrast, in the structural-based and resource-based modules, each concept C is represented as a d-dimensional continuous vector \mathbf{C} \in R^d, and similarities are calculated with the cosine measure.

Lexical-view Embedding The lexical-view embedding module is mainly based on the TF-IDF algorithm, one of the most effective string similarity metrics for ontology matching [19]. Under the TF-IDF assumption, concepts in one ontology can be represented as bags of tokens: every concept C_i is regarded as a document and its tokens {t_1, t_2, ..., t_l} are treated as terms. Inspired by the idea of soft TF-IDF [19], we propose an embedding-based TF-IDF strategy to calculate the similarities of concepts. More precisely, the similarity of each concept pair is calculated from the similarities of their tokens, which are obtained with the cosine measure of the tokens' vector representations rather than their string equality. The corresponding formula is defined as follows:

    \mathrm{Sim}(C_1, C_2) = \sum_{i=1}^{n} w_i \cdot \max_j \cos(\mathbf{t}_{1i}, \mathbf{t}_{2j}),    (1)

where C_1 and C_2 are concepts from O1 and O2, and \mathbf{t}_{1i} and \mathbf{t}_{2j} are the vector representations of tokens t_{1i} and t_{2j} belonging to C_1 and C_2, respectively. w_i is the weight of token t_{1i} in C_1, calculated as follows:

    w_i = \frac{\mathrm{TFIDF}(t_{1i})}{\sum_{l=1}^{n} \mathrm{TFIDF}(t_{1l})},    (2)

where n is the number of tokens and TFIDF(·) returns the TF-IDF value of a token. As the cosine measure of \mathbf{t}_{1i} and \mathbf{t}_{2j} is a continuous value, this embedding-based TF-IDF strategy is able to capture more similar concepts and discover more potential mappings. Nevertheless, our softened strategy depends on the quality of the token embeddings and may generate more wrong mappings. Therefore, we utilize pre-trained vectors to cover the tokens of the ontologies as much as possible (see Section 4.2). In addition, we employ the mappings generated by the other embedding modules to assess the quality of the mappings from the lexical-view module (see Section 3.2).
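To make Eqs. (1) and (2) concrete, the following is a minimal Python sketch of the embedding-based TF-IDF similarity, assuming token vectors and TF-IDF values are supplied as plain dictionaries; the names (`concept_similarity`, `vec`, `tfidf`) are illustrative and not taken from the released implementation.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two token vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def concept_similarity(tokens1, tokens2, vec, tfidf):
    """Embedding-based TF-IDF similarity of two concepts, Eqs. (1)-(2).

    tokens1, tokens2 -- token lists of concepts C1 and C2
    vec              -- dict: token -> k-dimensional numpy vector
    tfidf            -- dict: token -> TF-IDF value within its ontology
    """
    # Eq. (2): normalize the TF-IDF values of C1's tokens into weights w_i.
    total = sum(tfidf[t] for t in tokens1)
    # Eq. (1): weight each token of C1 by its best cosine match among C2's tokens.
    return sum(
        (tfidf[t1] / total) * max(cosine(vec[t1], vec[t2]) for t2 in tokens2)
        for t1 in tokens1
    )

# Toy usage: "heart valve" vs. "cardiac valve" match well once the token
# vectors (hypothetical values below) place "heart" near "cardiac".
vec = {"heart": np.array([1.0, 0.0]), "cardiac": np.array([0.9, 0.1]),
       "valve": np.array([0.0, 1.0])}
tfidf = {"heart": 2.0, "cardiac": 2.0, "valve": 1.0}
print(concept_similarity(["heart", "valve"], ["cardiac", "valve"], vec, tfidf))
```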
Structural-view Embedding As mentioned before, most existing methods focus on terminological features to learn word vectors for ontology matching, but they do not make full use of the structural relations in ontologies. We therefore also generate mappings from the structural view. To obtain candidate mappings for training the concept embeddings, we assume that the mappings generated by equivalent strings or their synonym labels are correct, and define a loss function based on cross-entropy to optimize the vector representations of concepts:

    l_{SE} = -\sum_{(C_1, C_2, \equiv, 1.0) \in \mathcal{M}} \log f_{SE}(\mathbf{C}_1, \mathbf{C}_2) - \sum_{(C_1', C_2', \equiv, 1.0) \in \mathcal{M}'} \log\big(1 - f_{SE}(\mathbf{C}_1', \mathbf{C}_2')\big),    (3)

where \mathcal{M} is the set of candidate mappings {(C_1, C_2, \equiv, 1.0)} generated by our assumption, and \mathcal{M}' is a set of negative mappings. We employ the negative sampling skill of [13] to generate \mathcal{M}' for training: for each mapping (C_i, C_j, \equiv, 1.0) \in \mathcal{M}, we corrupt it by randomly replacing C_i or C_j, producing a negative tuple (C_i', C_j, \equiv, 1.0) or (C_i, C_j', \equiv, 1.0). f_{SE}(\mathbf{C}_1, \mathbf{C}_2) is the score function defined in Eq. (4), where \mathbf{C}_1, \mathbf{C}_2 \in R^d are the d-dimensional vectors of concepts C_1 and C_2 from different ontologies, and ||·||_2 is the L2-norm. We want f_{SE}(\mathbf{C}_1, \mathbf{C}_2) to be large if concepts C_1 and C_2 are similar:

    f_{SE}(\mathbf{C}_1, \mathbf{C}_2) = 2 \cdot \frac{1}{1 + e^{\|\mathbf{C}_1 - \mathbf{C}_2\|_2}}.    (4)

Furthermore, we design a negative sampling skill tailored for the structural relations (e.g., subclassOf relations, disjointWith relations) asserted in ontologies. Unlike uniform negative sampling, which draws a replacer from all concepts, we limit the sampling scope to a group of candidates. More precisely, for each mapping (C_i, C_j, \equiv, 1.0) \in \mathcal{M}, if subclassOf relations (e.g., (C_i', subclassOf, C_i) or (C_j', subclassOf, C_j)) are asserted in the ontologies, we exclude such replacements. Conversely, for disjointWith relations (e.g., (C_i', disjointWith, C_i) or (C_j, disjointWith, C_j')), we give these concepts the highest priority as replacers (see Section 4.2). With these constraints on negative sampling, we obtain better vector representations of concepts for ontology matching.
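The following is a minimal sketch of this tailored negative sampling, under the assumption that the subclassOf and disjointWith assertions are available as sets of concept pairs; the helper names (`corrupt_mapping`, `subclass_of`, `disjoint_with`) are illustrative. Only the replacement of C_i is shown; corrupting C_j is symmetric.

```python
import random

def corrupt_mapping(ci, cj, concepts, subclass_of, disjoint_with):
    """Turn one positive mapping (ci, cj, =, 1.0) into a negative one.

    concepts      -- all concepts of ci's ontology (candidate replacers)
    subclass_of   -- set of asserted (sub, sup) pairs
    disjoint_with -- set of asserted (a, b) pairs
    """
    # disjointWith gets the highest priority: a concept asserted disjoint
    # with ci is guaranteed to be a genuinely negative replacer.
    disjoint = [c for c in concepts
                if (c, ci) in disjoint_with or (ci, c) in disjoint_with]
    if disjoint:
        return random.choice(disjoint), cj

    # Otherwise sample uniformly, excluding ci itself and its subclasses:
    # a subclass of ci is semantically too close to serve as a negative.
    candidates = [c for c in concepts
                  if c != ci and (c, ci) not in subclass_of]
    if not candidates:  # degenerate fallback for tiny ontologies
        candidates = [c for c in concepts if c != ci]
    return random.choice(candidates), cj
```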
Resource-view Embedding Inspired by the work in [22], we consider an external ontology as a bridge connecting two concepts from the source and target ontologies. We observe that there exist many different yet overlapping biomedical ontologies, such as MA—NCI—FMA and FMA—NCI—SNOMED-CT. Compared with textual descriptions or thesauruses, ontologies as external resources can provide structural assertions, which help refine the quality of the discovered mappings [22]. Nevertheless, the original idea is mainly based on string equality, which may not discover enough similar concepts. Therefore, we employ embedding techniques to soften this framework and discover more potential mappings from this view.

Fig. 2: Left: The original framework for employing an external ontology to connect concepts. Right: The embedding framework for employing an external ontology to connect concepts

Fig. 2 illustrates the change of the framework from string equality to the softened idea, where every concept C is represented as a d-dimensional continuous vector \mathbf{C} \in R^d. We assume that some concept pairs (C_1, C_2) from ontologies O1 and O2, possibly involving their synonyms, share the same concept C_3 (or one of its synonyms) in the external ontology O3. Such a tuple is labeled (C_1, C_2, C_3). We then introduce two matrices and train them on these tuples in order to obtain more potential mappings. The loss function is defined as follows:

    l_{RE} = -\sum_{(C_1, C_2, C_3) \in \mathcal{T}} \log f_{RE}(\mathbf{C}_1, \mathbf{C}_2, \mathbf{C}_3) - \sum_{(C_1', C_2', C_3) \in \mathcal{T}'} \log\big(1 - f_{RE}(\mathbf{C}_1', \mathbf{C}_2', \mathbf{C}_3)\big),    (5)

where \mathcal{T} is the set of tuples {(C_1, C_2, C_3)} generated by the shared-concept assumption, and \mathcal{T}' is a set of negative tuples obtained by randomly replacing C_1 or C_2. f_{RE}(\mathbf{C}_1, \mathbf{C}_2, \mathbf{C}_3) is the score function defined in Eq. (6), where \mathbf{C}_1, \mathbf{C}_2, \mathbf{C}_3 \in R^d are the d-dimensional vectors of concepts C_1, C_2, C_3 from the different ontologies, and M_{13} and M_{23} are two matrices that project \mathbf{C}_1 and \mathbf{C}_2 into the embedding space of O3, respectively. We want similar concepts to be projected near their shared concept; conversely, dissimilar ones should keep a semantic distance from it.

    f_{RE}(\mathbf{C}_1, \mathbf{C}_2, \mathbf{C}_3) = 2 \cdot \frac{1}{1 + e^{\|\mathbf{C}_1 M_{13} - \mathbf{C}_3\|_2 + \|\mathbf{C}_2 M_{23} - \mathbf{C}_3\|_2}}.    (6)

To train the two matrices better, we keep all concept vectors of O3 unchanged and only update the parameters of the matrices and of the concepts from O1 and O2. Furthermore, we take advantage of the structural relations in O3 to pre-train the vector representations of its concepts, which adjusts the semantic distances between concept vectors. As existing KG embedding models face the sparsity problem, we design a loss function based on cross-entropy to achieve this goal:

    l_{PT} = -\sum_{(C_{31}, r, C_{32}) \in \mathcal{R}} \log f_r(\mathbf{C}_{31}, \mathbf{C}_{32}) - \sum_{(C_{31}', r, C_{32}') \in \mathcal{R}'} \log\big(1 - f_r(\mathbf{C}_{31}', \mathbf{C}_{32}')\big),    (7)

    f_r(\mathbf{C}_{31}, \mathbf{C}_{32}) = 2 \cdot \frac{1}{1 + e^{\|\mathbf{C}_{31} - \mathbf{C}_{32}\|_2 - \alpha}},    (8)

where \mathcal{R} is a set of relation assertions {(C_{31}, subClassOf, C_{32})} ∪ {(C_{31}, PartOf, C_{32})}, and \mathcal{R}' is a set of negative ones obtained by randomly replacing C_{31} or C_{32}. f_r(\mathbf{C}_{31}, \mathbf{C}_{32}) measures the score of (C_{31}, r, C_{32}), and \mathbf{C}_{31} and \mathbf{C}_{32} are the vector representations of concepts C_{31} and C_{32}. Note that subClassOf and PartOf are not equivalence relations, so we utilize a hyper-parameter α to control the semantic distances between concept vectors.

3.2 View-Embedding Combination

After obtaining mappings from the different modules, we need to combine them. A straightforward strategy is to collect all mappings from the modules and filter them with a single threshold or the stable marriage algorithm. Although this strategy can achieve high recall in the final alignment, it may also introduce many wrong mappings and miss n:m mappings. Therefore, we propose a combination strategy based on mutual assessment.

For convenience, we use OM-L, OM-S and OM-R to denote the alignments generated by the lexical-based, structural-based and resource-based modules, respectively. The concrete procedure is as follows.

Step 1 Merge the mappings from OM-S and OM-R. The merged result is labeled OM-SR, in which the similarity of each mapping is the larger of its similarities in OM-S and OM-R.
Step 2 Select the "reliable" mappings of OM-L and OM-SR using the corresponding thresholds δ1 and δ2.
Step 3 Assess these "reliable" mappings from OM-L and OM-SR mutually. If a "reliable" mapping belongs to OM-L and its similarity in OM-SR is lower than the threshold δ3, we remove it. Likewise, if a "reliable" mapping belongs to OM-SR and its similarity in OM-L is lower than the threshold δ4, it is removed.
Step 4 Merge the assessed mappings from OM-L and OM-SR to generate the final alignment. For each mapping appearing in both OM-L and OM-SR, the larger similarity is kept.
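A compact sketch of Steps 1-4 is given below, under the assumption that each module's alignment is a dictionary from a concept pair to its similarity; treating a pair that is absent from a module as having similarity 0.0 is our reading of Step 3, not something stated explicitly above.

```python
def combine_alignments(om_l, om_s, om_r, d1=0.8, d2=0.95, d3=0.65, d4=0.3):
    """Mutual-assessment combination of the module alignments (Steps 1-4).

    om_l, om_s, om_r -- dicts {(c1, c2): similarity} produced by the
                        lexical-, structural- and resource-view modules
    d1..d4           -- the thresholds delta_1..delta_4 (defaults mirror
                        the values reported in Section 4.2)
    """
    # Step 1: merge OM-S and OM-R into OM-SR, keeping the larger similarity.
    om_sr = dict(om_s)
    for pair, sim in om_r.items():
        om_sr[pair] = max(sim, om_sr.get(pair, 0.0))

    # Step 2: select the "reliable" mappings of each side.
    rel_l = {p: s for p, s in om_l.items() if s >= d1}
    rel_sr = {p: s for p, s in om_sr.items() if s >= d2}

    # Step 3: mutual assessment -- a reliable mapping survives only if the
    # other side also rates it above delta_3 (resp. delta_4).
    rel_l = {p: s for p, s in rel_l.items() if om_sr.get(p, 0.0) >= d3}
    rel_sr = {p: s for p, s in rel_sr.items() if om_l.get(p, 0.0) >= d4}

    # Step 4: merge the survivors, keeping the larger similarity per pair.
    final = dict(rel_l)
    for pair, sim in rel_sr.items():
        final[pair] = max(sim, final.get(pair, 0.0))
    return final
```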
4 Experiments

To verify the effectiveness of MultiOM, we implemented our approach in Python with the aid of TensorFlow10 and parsed ontologies with the OWL API11. The experiments were conducted on a personal workstation with an Intel Xeon E5-2630 V4 CPU, 64 GB of memory and a TITAN XP GPU. Our approach12 can be downloaded together with the datasets and results.

10 https://www.tensorflow.org/
11 http://owlapi.sourceforge.net/
12 https://github.com/chunyedxx/MultiOM

4.1 Datasets

We collect the biomedical ontologies from the Anatomy Track and the Large BioMed Track of OAEI13 (Ontology Alignment Evaluation Initiative), an annual campaign for evaluating ontology matching systems that attracts many participants all over the world. The campaign provides uniform test cases and reference alignments for measuring the precision, recall and F1-measure of all participating systems.

4.2 Experiment Settings

We select several strategies to construct the baseline systems used to verify the effectiveness of our model. The strategies are constructed as follows.

– StringEquiv: a string matcher based on string equality, applied to the local names of entities.
– StringEquiv + Normalization (StringEquiv-N): employs normalization techniques before executing the StringEquiv matcher.
– StringEquiv + Synonym (StringEquiv-S): extends the synonyms of concepts when executing the StringEquiv matcher.
– StringEquiv + Synonym + Reference Ontology (StringEquiv-SR): introduces external ontologies as bridges to connect concepts, on top of StringEquiv-S.
– StringEquiv + Synonym + Normalization (StringEquiv-NS): extends the synonyms of concepts when executing StringEquiv-N.
– StringEquiv + Normalization + Synonym + Reference Ontology (StringEquiv-NSR): employs normalization techniques before executing StringEquiv-SR.

For MultiOM, we use stochastic gradient descent (SGD) as the optimizer with the following hyper-parameter ranges: the dimensions of concepts and matrices are set to d = {50, 100} and dM = {50, 100}; the mini-batch size of SGD is set to Nbatch = {5, 10, 20, 50}; the learning rate λ is selected among {0.01, 0.02, 0.001}; and {1, 3, 5, 10} negative triples are sampled for each positive triple. Training ran for 1000 epochs. In the lexical-based module, the vector representations of tokens mainly come from the pre-trained vectors released with [11]14, whose dimension is 200. Tokens without pre-trained vectors are initialized randomly under the constraints ||t1i||2 ≤ 1 and ||t2j||2 ≤ 1. In the resource-view embedding module, we employ TransE [13], ConvE [23] and the pre-training loss in Eq. (7) to initialize the vector representations of concepts in external ontologies; α is set to {0.01, 0.05, 0.10} in Eq. (7) to control the semantic distances between concept vectors. For the negative sampling strategy, we collect all structural assertions related to the concepts. When a concept is selected as a replacer, we retrieve its structural assertions and execute the replacement according to its relations with the original concept; during this process, disjointWith relations have the highest priority and subclassOf relations are excluded. Finally, the result of MultiOM is generated by the combination strategy, with the thresholds set to δ1 = 0.8, δ2 = 0.95, δ3 = 0.65, δ4 = 0.3.

13 http://oaei.ontologymatching.org/
14 https://doi.org/10.5281/zenodo.1173936
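For the randomly initialized tokens, the norm constraints ||t||2 ≤ 1 can be enforced by projecting the vector back onto the unit ball after each SGD update, as is commonly done for TransE-style models [13]; the following numpy sketch illustrates this (the uniform initialization range is our assumption, not a detail reported above).

```python
import numpy as np

def project_to_unit_ball(v):
    # Enforce ||v||_2 <= 1 by rescaling; applied after every SGD update.
    norm = np.linalg.norm(v)
    return v / norm if norm > 1.0 else v

# Random initialization of a 200-dimensional out-of-vocabulary token vector,
# matching the dimension of the pre-trained vectors used in the lexical module.
rng = np.random.default_rng(0)
t = project_to_unit_ball(rng.uniform(-0.5, 0.5, size=200))
```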
To show the effect of the proposed negative sampling, a superscript "−" attached to a module symbol (or to a merged one) indicates that the module is not equipped with the negative sampling tailored for structural relations.

4.3 Evaluation Results

Table 1 lists the matching results of MultiOM compared with the baseline systems. We observe that merging more strategies increases the number of discovered mappings. Although this slightly decreases the precision of the alignments, it increases the recall and F1-measure as a whole. MultiOM further improves the recall and F1-measure of the alignments because continuous vector representations of concepts can capture more similar concepts and discover more potential mappings. Moreover, MultiOM outperforms MultiOM− in terms of F1-measure; the main reason is that employing structural relations helps distinguish the vector representations of concepts.

Table 1: The comparison of MultiOM with baseline systems

                                MA-NCI                            FMA-NCI-small
Methods            Number  Correct  P      R      F1      Number  Correct  P      R      F1
StringEquiv           935      932  0.997  0.615  0.761     1501     1389  0.995  0.517  0.681
StringEquiv-N         992      989  0.997  0.625  0.789     1716     1598  0.995  0.595  0.745
StringEquiv-S        1100     1057  0.961  0.697  0.808     2343     2082  0.974  0.775  0.863
StringEquiv-SR       1162     1094  0.941  0.722  0.817     2343     2082  0.974  0.775  0.863
StringEquiv-NS       1153     1109  0.962  0.732  0.831     2464     2200  0.975  0.819  0.890
StringEquiv-NSR      1211     1143  0.943  0.753  0.838     2467     2203  0.975  0.820  0.891
MultiOM−             1484     1296  0.873  0.855  0.864     2500     2173  0.947  0.809  0.872
MultiOM              1445     1287  0.891  0.849  0.869     2538     2195  0.942  0.817  0.875

Table 2: The results of combining different embedding modules in the Anatomy Track

Methods                                           Number  Correct  P      R      F1
TFIDF (threshold = 0.8)                              985      976  0.991  0.644  0.780
OM-L (threshold = 0.8)                              1286     1175  0.914  0.775  0.839
OM-S− (threshold = 0.95)                            1836     1109  0.604  0.732  0.662
OM-S (threshold = 0.95)                             1189     1097  0.923  0.724  0.811
OM-R (random initialization, threshold = 0.95)       709      680  0.959  0.449  0.661
OM-R (TransE, threshold = 0.95)                       22        4  0.182  0.003  0.005
OM-R (ConvE, threshold = 0.95)                       835      790  0.946  0.521  0.672
OM-R (loss function (7), threshold = 0.95)           833      789  0.948  0.520  0.672
OM-SR− (threshold = 0.95)                           1271     1147  0.902  0.757  0.823
OM-SR (threshold = 0.95)                            1237     1138  0.920  0.751  0.827
MultiOM−                                            1484     1296  0.873  0.855  0.864
MultiOM                                             1445     1287  0.891  0.849  0.869

Table 2 shows the results of combining different embedding-view modules. Overall, the more embedding modules are merged, the better the alignments. For the lexical-view module, the softened TF-IDF (denoted OM-L) outperforms the original TF-IDF in terms of F1-measure, because continuous token vectors provide more semantic information than plain strings for calculating the similarity of concepts. For the resource-view embedding module (denoted OM-R), ConvE and our pre-training loss outperform random initialization, because both can utilize structural relations to adapt the vector representations of concepts in the semantic space. Nevertheless, compared with the 20 minutes spent by the loss in Eq. (7), ConvE took nearly 24 hours to obtain the vector representations of concepts. Note that TransE is not suitable for pre-training the vector representations of concepts; we attribute this to the sparse structural relations of the ontologies and to its simplified score function.
Overall, we observe that employing the new negative sampling strategy in the embedding-view modules (i.e., OM-S, OM-SR, MultiOM) further improves the quality of the alignments in terms of precision and F1-measure.

Table 3 lists the comparison of MultiOM with the OAEI 2018 top-ranked systems based on feature engineering and representation learning. The preliminary results show that MultiOM is competitive with several promising matching systems (e.g., FCAMapX, XMap) in terms of F1-measure. Nevertheless, there is still a gap with the best systems (e.g., AML, SCBOW + DAE(O)). We believe that the lexical-based module and the simplified combination strategy are the main bottlenecks of MultiOM. Benefiting from thesauruses (e.g., UMLS) and optimized combination strategies, most top-ranked systems achieve better performance on OM tasks. In addition, most systems (e.g., AML, LogMap) employ alignment debugging techniques, which further improve the quality of alignments; we do not employ such techniques in the current version. We leave these issues to future work.

Table 3: The comparison of MultiOM with OAEI 2018 top-ranked systems

                              MA-NCI                            FMA-NCI-small
Methods          Number  Correct  P      R      F1      Number  Correct  P      R      F1
AML                 1493     1418  0.95   0.936  0.943     2723     2608  0.958  0.910  0.933
SCBOW + DAE(O)      1399     1356  0.969  0.906  0.938     2282     2227  0.976  0.889  0.930
LogMapBio           1550     1376  0.888  0.908  0.898     2776     2632  0.948  0.902  0.921
POMAP++             1446     1329  0.919  0.877  0.897     2414     2363  0.979  0.814  0.889
XMap                1413     1312  0.929  0.865  0.896     2315     2262  0.977  0.783  0.869
LogMap              1387     1273  0.918  0.846  0.880     2747     2593  0.944  0.897  0.920
SANOM               1450     1287  0.888  0.844  0.865        –        –      –      –      –
FCAMapX             1274     1199  0.941  0.791  0.859     2828     2681  0.948  0.911  0.929
MultiOM             1445     1287  0.891  0.849  0.869     2538     2195  0.942  0.817  0.875

5 Conclusion and future work

In this paper, we presented an alternative OM framework called MultiOM, in which different loss functions based on cross-entropy model different views among ontologies and learn the vector representations of concepts. We further proposed a novel negative sampling skill tailored for structural relations, which yields better vector representations of concepts. We implemented our method and evaluated it on real-world biomedical ontologies. The preliminary results indicate that MultiOM is competitive with several OAEI top-ranked systems in terms of F1-measure.

In future work, we will explore the following research directions: (1) As the candidate mappings and tuples are not yet sufficient, we will extend MultiOM into an iterative framework. (2) Zhang et al. [17] recently presented combination strategies for embedding-based entity alignment; incorporating these strategies into MultiOM may further improve the quality of mappings. (3) Advanced symbolic reasoning techniques (e.g., incoherence checking) could serve the training process and alignment generation; we will merge them into MultiOM to improve its performance.

References

1. Jérôme Euzenat and Pavel Shvaiko. Ontology Matching. Springer Science, 2013.
2. Lorena Otero-Cerdeira, Francisco J. Rodríguez-Martínez, and Alma Gómez-Rodríguez. Ontology matching: A literature review. Expert Syst. Appl., 42(2):949–971, 2015.
3. Ian Harrow, Ernesto Jiménez-Ruiz, Andrea Splendiani, Martin Romacker, Peter Woollard, Scott Markel, Yasmin Alam-Faruque, Martin Koch, James Malone, and Arild Waaler. Matching disease and phenotype ontologies in the ontology alignment evaluation initiative.
Journal of Biomedical Semantics, 8(1):1–13, 2017.
4. Ernesto Jiménez-Ruiz and Bernardo Cuenca Grau. LogMap: Logic-Based and Scalable Ontology Matching. In ISWC, pages 273–288, 2011.
5. Daniel Faria, Catia Pesquita, Emanuel Santos, Matteo Palmonari, Isabel F. Cruz, and Francisco M. Couto. The AgreementMakerLight Ontology Matching System. In OTM Conferences, pages 527–541, 2013.
6. Mengyi Zhao, Songmao Zhang, Weizhuo Li, and Guowei Chen. Matching biomedical ontologies based on formal concept analysis. Journal of Biomedical Semantics, 9(1):11, 2018.
7. Warith Eddine Djeddi and Mohamed Tarek Khadir. A Novel Approach Using Context-Based Measure for Matching Large Scale Ontologies. In DaWaK, pages 320–331, 2014.
8. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436, 2015.
9. Chuncheng Xiang, Tingsong Jiang, Baobao Chang, and Zhifang Sui. ERSOM: A Structural Ontology Matching Approach Using Automatically Learned Entity Representation. In EMNLP, pages 2419–2429, 2015.
10. Prodromos Kolyvakis, Alexandros Kalousis, and Dimitris Kiritsis. DeepAlignment: Unsupervised Ontology Matching with Refined Word Vectors. In NAACL, pages 787–798, 2018.
11. Prodromos Kolyvakis, Alexandros Kalousis, Barry Smith, and Dimitris Kiritsis. Biomedical ontology alignment: an approach based on representation learning. Journal of Biomedical Semantics, 9(1):21, 2018.
12. Lucy Wang, Chandra Bhagavatula, Mark Neumann, Kyle Lo, Chris Wilhelm, and Waleed Ammar. Ontology alignment in the biomedical domain using entity definitions and context. In BioNLP, pages 47–55, 2018.
13. Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In NeurIPS, pages 2787–2795, 2013.
14. Wen Zhang, Bibek Paudel, Liang Wang, Jiaoyan Chen, Hai Zhu, Wei Zhang, Abraham Bernstein, and Huajun Chen. Iteratively learning embeddings and rules for knowledge graph reasoning. In WWW, pages 2366–2377, 2019.
15. George A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39–41, 1995.
16. Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. An attention-based collaboration framework for multi-view network representation learning. In CIKM, pages 1767–1776, 2017.
17. Qingheng Zhang, Zequn Sun, Wei Hu, Muhao Chen, Lingbing Guo, and Yuzhong Qu. Multi-view Knowledge Graph Embedding for Entity Alignment. In IJCAI, 2019.
18. Miguel Ángel Rodríguez-García, Georgios V. Gkoutos, Paul N. Schofield, and Robert Hoehndorf. Integrating phenotype ontologies with PhenomeNET. Journal of Biomedical Semantics, 8(1):58, 2017.
19. Michelle Cheatham and Pascal Hitzler. String Similarity Metrics for Ontology Alignment. In ISWC, pages 294–309, 2013.
20. Yuanzhe Zhang, Xuepeng Wang, Siwei Lai, Shizhu He, Kang Liu, Jun Zhao, and Xueqiang Lv. Ontology Matching with Word Embeddings. In CCL, pages 34–45, 2014.
21. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed Representations of Words and Phrases and their Compositionality. In NeurIPS, pages 3111–3119, 2013.
22. Songmao Zhang and Olivier Bodenreider. Experience in Aligning Anatomical Ontologies. International Journal on Semantic Web and Information Systems, 3(2):1–26, 2007.
23. Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2D Knowledge Graph Embeddings. In AAAI, pages 1811–1818, 2018.