OTMapOnto: Optimal Transport-based Ontology Matching

Yuan An, Alex Kalinowski, and Jane Greenberg

Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA
{ya45,ajk437,jg3243}@drexel.edu

Abstract. This paper describes OTMapOnto, an optimal transport-based ontology matching system. It leverages techniques developed in computational optimal transport to match terms across different ontologies. The system starts by converting ontology elements into embedding vectors, which can incorporate linguistic, structural, and logical information. It then applies optimal transport to the problem of moving masses from the source embedding space to the target embedding space. The solution to the optimal transport problem consists of a shape-based Wasserstein distance and a coupling matrix between the embeddings of the source and target ontologies. The coupling matrix gives rise to a set of candidate matchings, which can be refined through further processing. The current version of the OTMapOnto system uses pre-trained word embeddings, such as fasttext and BioWordVec, to embed the labels of ontology elements. We report that optimal transport is a promising approach for discovering ontology matchings with higher recall when provided with good representations of ontologies¹.

Keywords: ontology matching · optimal transport · ontology embedding.

1 Presentation of the System

1.1 State, purpose, general statement

OTMapOnto is an ontology matching system that applies optimal transport to ontology embeddings for discovering matchings. The system starts by representing an ontology as a set of numerical vectors/embeddings, each of which can encode the semantic and structural information of an ontology element. If both the source and target ontologies are embedded in the same vector space, we can calculate the geometric distances between pairs of source and target ontology elements.
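As a minimal illustration of this setup (not taken from the OTMapOnto code base; the embedding matrices below are random stand-ins for real ontology embeddings), pairwise geometric distances between two embedded ontologies can be computed as:

```python
import numpy as np

# Hypothetical embeddings: n source and m target ontology elements,
# all living in the same d-dimensional vector space.
rng = np.random.default_rng(42)
source = rng.normal(size=(5, 8))   # n = 5 elements, d = 8
target = rng.normal(size=(7, 8))   # m = 7 elements

# Pairwise Euclidean distances: entry (i, j) is ||source_i - target_j||.
euclid = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)

# Pairwise cosine distances: 1 - cosine similarity of the unit-normalized rows.
sn = source / np.linalg.norm(source, axis=1, keepdims=True)
tn = target / np.linalg.norm(target, axis=1, keepdims=True)
cosine = 1.0 - sn @ tn.T
```

Either distance matrix relates every source element to every target element and can serve as input to a matching strategy.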
The distances can be utilized to derive potential matchings between ontologies. For example, Kolyvakis et al. [9] report an approach that first represents ontology elements as numerical embeddings and then applies the Stable Marriage algorithm to derive candidate matchings based on the cosine similarity between embedding vectors. Other embedding-based supervised [14] and distantly supervised [5] methods have also been proposed for matching ontologies. OTMapOnto differs in that it is an unsupervised method that formulates the matching problem as a transformation problem from a source ontology to a target ontology. Taking the two ontologies as two whole systems of knowledge encoded in numerical vectors, OTMapOnto first finds an optimal way to transport masses from the source ontology to the target ontology. It then derives candidate matchings between individual elements from the coupling matrix of the optimal transport, for instance, by a subsequent Nearest Neighbor or Stable Marriage algorithm. This paper presents the first version of the system, which only makes use of linguistic information for embedding ontology elements.

¹ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1.2 Specific techniques used

Figure 1 illustrates the architecture of the OTMapOnto system. In this paper, we focus on two main components: ontology embeddings and optimal transport.

Fig. 1. The Architecture of the OTMapOnto System: the source and target ontologies are embedded, optimal transport produces a coupling matrix, candidate matchings are derived from the coupling matrix, and matching refinement yields the final matchings.

Ontology Embeddings.
We consider an ontology as a 7-tuple O = {C, R, G, L, D, S, P}, where C = {C_1, ..., C_n, e_1, ..., e_m} is a set of concepts C_i and entities e_i; R = {R_1, ..., R_r} is a set of relations corresponding to subClassOf, Object, and Datatype properties; G = (V, E) is the graph structure with a set of vertices V = {V_1, ..., V_{n+m}} and edges E = {E_1, ..., E_r}; L = {L_1, ..., L_k} is a set of logical formulas; D = {D_1, ..., D_d} is a set of textual descriptions; S = {S_1, ..., S_s} is a set of distant evidence sentences; and P is a set of literal values associated with the Datatype properties.

Various methods have been proposed for representing the individual components of an ontology as embeddings. For example, translation-based methods [4, 19, 10, 12] and graph neural networks (GNNs) [8] encode an ontology based on its graph structure. Text-enhanced methods for ontology embeddings [11, 16, 2, 17, 18] encode the lexical words of ontology elements. Logic-aware methods [15, 7] incorporate logical constraints into ontology embeddings. For the current OTMapOnto system, we apply pre-trained language models, such as fasttext [3] and BioWordVec [20], to embed the labels of the set of ontology concepts, Object, and Datatype properties, O = {C_1, ..., C_n, R_1, ..., R_r}, as a set of numerical vectors X_O = {x_i ∈ R^d, i = 1..n+r}.

Specifically, for each element in O, the system first normalizes the element's label via a sequence of standard text processing steps. The normalized label is split into individual words, which in turn are fed into a pre-trained language model to obtain their corresponding word embeddings. The embedding of the element is the average of the embeddings of the individual words in the normalized label.

Optimal Transport.
Given two sets of embeddings X = {x_i ∈ R^d, i = 1..n} and Y = {y_j ∈ R^d, j = 1..m}, where each embedding is represented as a vector x_i or y_j ∈ R^d, let µ = Σ_{i=1}^{n} p(x_i) δ_{x_i} and ν = Σ_{j=1}^{m} q(y_j) δ_{y_j} be two probability distributions defined on the two sets X and Y, respectively, with δ_x the Dirac measure at the point x. Here, p(x_i) and q(y_j) are probability weights associated with each set. Usually, we consider uniform weights, i.e., p(x_i) = 1/n for i = 1..n and q(y_j) = 1/m for j = 1..m. However, if additional information is provided, p(x_i) and q(y_j) can incorporate it as non-uniform distributions.

Optimal transport (OT) defines a distance between two distributions as the cost of an optimal plan for mass transportation. Specifically, let C = [c(x_i, y_j)]_{i,j} be a cost matrix with c(x_i, y_j) measuring a ground distance between the individual embeddings x_i and y_j. Let T = [T(x_i, y_j)]_{i,j} be a transport plan (or coupling) matrix with T(x_i, y_j) specifying how much mass will be transported from point x_i to point y_j. Let Π(µ, ν) be the set of all feasible transport plans, defined as Π(µ, ν) = {T ∈ R_+^{n×m} | T 1_m = µ, T^T 1_n = ν}, where 1_n and 1_m are all-ones vectors, and T 1_m = µ and T^T 1_n = ν are the marginal constraints on feasible plans. The optimal transport problem is to find the plan

T* = argmin_{T ∈ Π(µ,ν)} Σ_{i=1}^{n} Σ_{j=1}^{m} c(x_i, y_j) · T(x_i, y_j)   (1)

The optimal plan also gives rise to a distance measure between the two distributions, called the Wasserstein distance:

W(µ, ν) = min_{T ∈ Π(µ,ν)} ⟨C, T⟩ = min_{T ∈ Π(µ,ν)} Σ_{i=1}^{n} Σ_{j=1}^{m} c(x_i, y_j) · T(x_i, y_j)   (2)

The objective is a linear program whose O(N^3) time complexity is prohibitive for large N. The common speedup is to replace the objective with an entropy-regularized objective function; the resulting problem can be solved efficiently using Sinkhorn iterations [6].
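As a concrete sketch of the entropy-regularized computation (illustrative only: it uses random vectors in place of real label embeddings and a hand-picked regularization strength eps; production code would typically rely on a dedicated OT library), plain Sinkhorn iterations look like this:

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.5, n_iters=1000):
    """Entropy-regularized optimal transport via Sinkhorn iterations [6].

    mu: (n,) source weights; nu: (m,) target weights; C: (n, m) cost matrix.
    Returns the coupling matrix T whose marginals approximate mu and nu.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (K @ v)                 # scale rows toward marginal mu
        v = nu / (K.T @ u)               # scale columns toward marginal nu
    return u[:, None] * K * v[None, :]   # T = diag(u) K diag(v)

# Toy data: 3 source and 4 target "embeddings" with uniform weights.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(3, 5)), rng.normal(size=(4, 5))
C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # Euclidean ground distance
mu, nu = np.full(3, 1 / 3), np.full(4, 1 / 4)

T = sinkhorn(mu, nu, C)
wasserstein = float(np.sum(C * T))       # regularized transport cost <C, T>
```

The returned coupling T moves mass 1/n out of every source point and mass 1/m into every target point, and ⟨C, T⟩ approximates the Wasserstein distance of Eq. (2) up to the entropic bias.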
Deriving and Refining Ontology Matchings. Both the source and target ontologies are embedded through the same pre-trained language model. For the ground distance C = [c(x_i, y_j)]_{i,j}, i = 1..n, j = 1..m, we experimented with the following two approaches:

– Using Euclidean distances between the embeddings of pairs of source and target ontology elements.
– Using Wasserstein distances between the immediate neighborhoods of source and target ontology elements. The immediate neighborhood of an element includes its parents and children in the subClassOf hierarchy, the domain and range elements of its direct relations, and its synsets in WordNet.

The solution T = [T(x_i, y_j)]_{i,j}, i = 1..n, j = 1..m, to the optimal transport problem provides the most efficient way to transform the entire source ontology into the entire target ontology. Obviously, not every coupling in T corresponds to an ontology matching. We derive a set of candidate matchings as follows:

– Mutual Nearest Neighbor (MNN): for each x_p ∈ {x_i ∈ R^d, i = 1..n}, find y_q ∈ {y_j ∈ R^d, j = 1..m} such that T(x_p, y_q) = max{T(x_p, y_j), j = 1..m} and T(x_p, y_q) = max{T(x_i, y_q), i = 1..n}.
– Top-K Targets (TopK): for each x_p ∈ {x_i ∈ R^d, i = 1..n}, find k targets {y_q1, y_q2, ..., y_qk} ⊂ {y_j ∈ R^d, j = 1..m} such that T(x_p, y_qz) ≥ max{T(x_p, y_j), j ≠ q1..qk} for z = 1..k.

The set of candidate matchings then goes through a sequence of refinement steps, including exact label string checking, synonym verification, and context distance measurement.

1.3 Link to the system and parameters file

https://github.com/anyuanay/otmaponto_django

1.4 Link to the set of provided alignments

https://github.com/anyuanay/otmaponto_django/tree/master/results

2 Results

2.1 Anatomy

For this track, we applied two different pre-trained models, BioWordVec [20] and fasttext [3].
BioWordVec was trained on 28 million PubMed articles and 2 million MIMIC-III clinical notes. With a combination of optimal transport and MNN, we achieved 87% precision and 85% recall. By retrieving the Top-10 targets, the recall was 93%, while the precision was very low. However, the BioWordVec model is very large (26G), which made it infeasible to submit the system for evaluation. Instead, we created a running Web service with a pre-loaded fasttext model (6G). Our own evaluation results are 64% precision and 81% recall.

2.2 Conferences

For this track, we applied the fasttext model. We achieved a precision of 23% and a recall of 73%, which is higher than the recall of most of the systems in the OAEI 2020 report. By encoding only the labels of elements, OTMapOnto has shown that it is able to retrieve more matchings based on the optimal transport strategy. Improving the precision will be our future focus.

2.3 Material Sciences and Engineering

This is a new track without previous evaluation results. There are three test cases. For the first test case, OTMapOnto achieved a precision of 23% and a recall of 39%. For the second, it achieved a precision of 32% and a recall of 55%. For the third, it achieved a precision of 14% and a recall of 90%. Compared to the methods evaluated in Nasr's thesis [13], OTMapOnto achieved better recall in most of the cases.

2.4 Large Biomedical Ontologies

For this track, OTMapOnto encountered out-of-memory errors and was only able to perform matching on the two small cases. For the FMA-SNOMED small task, the precision was 38% and the recall was 67%. For the FMA-NCI small task, the precision was 45% and the recall was 84%.

2.5 Disease and Phenotype

For this track, we evaluated OTMapOnto on two tasks: the HP-MP 2017 task and the DOID-ORDO 2017 task. For the HP-MP 2017 task, OTMapOnto achieved a precision of 11% and a recall of 99.1%.
For the DOID-ORDO 2017 task, OTMapOnto achieved a precision of 16% and a recall of 99.3%. The recalls are consistently better than the results in the OAEI 2020 report.

2.6 Common Knowledge Graphs

For this track, the precision was 90% and the recall was 84%. No previous results are available. We look forward to the evaluation results in OAEI 2021.

3 General Comments

3.1 Comments on the results (strengths and weaknesses)

The coupling matrix returned by the optimal transport solver contains couplings from all n source elements to all m target elements. The current OTMapOnto derives candidate matchings from the n × m matrix mainly by retrieving mutual nearest neighbors. On one hand, it is inevitable that the set of candidate matchings still contains many spurious ones. On the other hand, it is expected that the optimal transport process can discover the majority of the accurate matchings. This has been shown in several tasks, where our results have lower precision and higher recall compared to the previous results of other systems.

3.2 Discussions on the way to improve the proposed system

In this paper, we report work demonstrating that, with only the embeddings of the labels of ontology elements, the optimal transport-based method is promising for discovering more matchings and thereby improving recall. In future work, we will develop a method for representing an ontology as a set of embedding vectors that integrates its individual components, including hyperbolic embeddings [1] for better representing the tree structures in ontologies. We also aim to improve precision by developing a method to rank the Top-K candidate matchings for each source element. The ranking method will be based on the structural and semantic context of the elements in each pair of candidate matchings. The third improvement is to modify the distributions defined on the sets of source and target ontology embeddings for optimal transport.
Instead of using uniform distributions over all points, we will consider distributions reflecting the ground distances between the two sets. For example, if the ground distances from a source point to all target points are consistently large, the mass of the source point should be very small or nil. Finally, for large-scale ontologies, we need to break the ontologies down into smaller chunks for computing the optimal transport couplings. We will first partition the embeddings into clusters and then find candidate matchings from the pairs of clusters that have shorter Wasserstein distances.

References

1. Alvarez-Melis, D., Mroueh, Y., Jaakkola, T.: Unsupervised hierarchy matching with optimal transport over hyperbolic spaces. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 108, pp. 1606–1617. PMLR (26–28 Aug 2020)
2. An, B., Chen, B., Han, X., Sun, L.: Accurate text-enhanced knowledge graph representation learning. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 745–755. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018). https://doi.org/10.18653/v1/N18-1068
3. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5 (2017)
4. Bordes, A., Usunier, N., García-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, December 5-8, 2013, Lake Tahoe, Nevada, United States. pp. 2787–2795 (2013)
5. Chen, J., Jiménez-Ruiz, E., Horrocks, I., Antonyrajah, D., Hadian, A., Lee, J.: Augmenting ontology alignment by semantic embedding and distant supervision. In: Verborgh, R., Hose, K., Paulheim, H., Champin, P.A., Maleshkova, M., Corcho, O., Ristoski, P., Alam, M. (eds.) The Semantic Web (ESWC). pp. 392–408. Springer International Publishing, Cham (2021)
6. Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems 26 (NIPS 2013). Curran Associates, Red Hook, NY, USA (2013)
7. Demeester, T., Rocktäschel, T., Riedel, S.: Lifted rule injection for relation embeddings. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 1389–1399. Association for Computational Linguistics, Austin, Texas (Nov 2016). https://doi.org/10.18653/v1/D16-1146
8. Hamilton, W.L.: Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 14(3), 1–159 (2020)
9. Kolyvakis, P., Kalousis, A., Smith, B., Kiritsis, D.: Biomedical ontology alignment: an approach based on representation learning. J Biomed Semant 9(21) (2018)
10. Lin, Y., Liu, Z., Sun, M., Liu, Y., Zhu, X.: Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. pp. 2181–2187. AAAI Press (2015)
11. Lu, F., Cong, P., Huang, X.: Utilizing textual information in knowledge graph embedding: A survey of methods and applications. IEEE Access 8, 92072–92088 (2020)
12. Ma, S., Ding, J., Jia, W., Wang, K., Guo, M.: TransT: Type-based multiple embedding representations for knowledge graph completion. In: Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Dzeroski, S. (eds.) Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Skopje, Macedonia, September 18-22, 2017, Proceedings, Part I. Lecture Notes in Computer Science, vol. 10534, pp. 717–733. Springer (2017). https://doi.org/10.1007/978-3-319-71249-9_43
13. Nasr, E.: Evaluation of Automatic Ontology Matching for Materials Sciences and Engineering. Master's thesis, Albert Ludwig University of Freiburg, Germany (2020)
14. Nkisi-Orji, I., Wiratunga, N., Massie, S., Hui, K.Y., Heaven, R.: Ontology alignment based on word embedding and random forest classification. Machine Learning and Knowledge Discovery in Databases, vol. 11051 (2019)
15. Rocktäschel, T., Singh, S., Riedel, S.: Injecting logical background knowledge into embeddings for relation extraction. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1119–1129. Association for Computational Linguistics, Denver, Colorado (May–Jun 2015). https://doi.org/10.3115/v1/N15-1118
16. Veira, N., Keng, B., Padmanabhan, K., Veneris, A.: Unsupervised embedding enhancements of knowledge graphs using textual associations. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. pp. 5218–5225. International Joint Conferences on Artificial Intelligence Organization (Jul 2019). https://doi.org/10.24963/ijcai.2019/725
17. Wang, Y., Zhang, H., Shi, G., Liu, Z., Zhou, Q.: A model of text-enhanced knowledge graph representation learning with mutual attention. IEEE Access 8, 52895–52905 (2020). https://doi.org/10.1109/ACCESS.2020.2981212
18. Wang, Z., Li, J.Z.: Text-enhanced representation learning for knowledge graph. In: IJCAI (2016)
19. Wang, Z., Zhang, J., Feng, J., Chen, Z.: Knowledge graph embedding by translating on hyperplanes. In: AAAI (2014)
20. Zhang, Y., Chen, Q., Yang, Z., Lin, H., Lu, Z.: BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data 6(52) (2019)