Combining FCA-Map with Representation Learning for
           Aligning Large Biomedical Ontologies*

              Guoxuan Li1,2, Songmao Zhang1, Jiayi Wei3 and Wenqian Ye4
   1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences,
     Beijing 100190, China
   2 University of Chinese Academy of Sciences, Beijing 100190, China
   3 University of Pennsylvania, 3451 Walnut St., Philadelphia, PA, USA
   4 New York University, 251 Mercer St., New York, NY 10012, USA
            liguoxuan18@mails.ucas.ac.cn, smzhang@math.ac.cn,
                      weijiayi@sas.upenn.edu, wy2029@nyu.edu

       Abstract. In our previous studies, we developed FCA-Map to utilize the Formal
       Concept Analysis (FCA) formalism for aligning ontologies in an incremental
       way. The approach has been shown to be effective by its performance in OAEI
       2016, 2018 and 2019. With FCA being inherently a symbolic, logical reasoning
       theory, we attempt to combine FCA-Map with representation learning tech-
       niques so as to take advantage of the semantic representation in numerical, la-
       tent space. The resultant system, called SBERTAlignment, is built based on Si-
       amese BERT and has obtained competitive results for matching large biomedi-
       cal ontologies. Both advantages and limitations are analyzed so as to further our
       study in exploring ontology similarity from diverse yet complementary perspec-
       tives.

1      Introduction
In our previous studies, we developed FCA-Map to utilize the Formal Concept Analy-
sis (FCA) formalism for aligning large and complex ontologies [1]. FCA-Map incre-
mentally constructs formal contexts for specifying the commonality across ontologies
at various levels, including lexical matching, structural validation, and structural
matching. Mappings are extracted from the derived concept lattice at each level and
then used to enable the next-level FCA construction and derivation. The purpose was
to push the envelope of FCA in exploiting the ontological knowledge, and our ap-
proach has been shown to be effective by its performance in OAEI 2016, 2018 and
2019 on anatomical, biomedical ontologies and knowledge graphs tasks [2].
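
The lexical level can be illustrated with a small sketch. It builds a token-based formal context for a few entity labels, derives the formal concepts by naive closure, and keeps cross-ontology pairs as candidate mappings. The labels, the `O1:`/`O2:` prefixes, the stopword list, all function names, and the rule of keeping only pairs with identical token sets are assumptions made for illustration; the construction in FCA-Map [1] is more elaborate.

```python
from itertools import combinations

# Hypothetical toy input: one label per entity, prefixed by its ontology.
labels = {
    "O1:heart_valve": "heart valve",
    "O2:valve_of_heart": "valve of heart",
    "O1:lung": "lung",
    "O2:left_lung": "left lung",
}
STOPWORDS = {"of", "the"}  # assumed stopword list


def build_context(labels):
    """Formal context: each object (entity) maps to its attributes (tokens)."""
    return {
        entity: frozenset(t for t in label.lower().split() if t not in STOPWORDS)
        for entity, label in labels.items()
    }


def extent(context, attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(e for e, tokens in context.items() if attrs <= tokens)


def intent(context, objs):
    """All attributes shared by every object in objs."""
    token_sets = [context[e] for e in objs]
    if not token_sets:
        # By convention, the empty object set shares all attributes.
        return frozenset().union(*context.values())
    return frozenset.intersection(*token_sets)


def formal_concepts(context):
    """Naive enumeration by closing every object subset (toy sizes only)."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared = intent(context, subset)
            concepts.add((extent(context, shared), shared))
    return concepts


def candidate_pairs(context, concepts):
    """Keep two-object extents that span both ontologies and whose objects
    have exactly the shared token set (an assumed filtering rule)."""
    pairs = set()
    for ext, shared in concepts:
        if len(ext) != 2:
            continue
        a, b = sorted(ext)
        same_tokens = context[a] == shared and context[b] == shared
        if same_tokens and a.split(":")[0] != b.split(":")[0]:
            pairs.add((a, b))
    return pairs


context = build_context(labels)
pairs = candidate_pairs(context, formal_concepts(context))
print(pairs)
```

In this toy input, "lung" and "left lung" fall into a common concept through the shared token "lung", but the pair is discarded because their token sets differ.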
   With FCA being inherently a symbolic, logical reasoning theory, we intend to
augment FCA-Map from a different perspective, and representation learning [3] is a
technique that can hardly be overlooked in today's knowledge engineering research.
Representation learning transforms a symbolic knowledge base into a numerical,
low-dimensional space, so that the correlation among entities can be revealed by
their vector representations.

* Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons
  License Attribution 4.0 International (CC BY 4.0).
The work is supported by the Natural Science Foundation of China grant 61621003.
2      Method and Result
Combining FCA-Map and the representation learning system Siamese BERT [4], our
ontology matching approach SBERTAlignment consists of three main steps. First,
training samples are constructed in multiple, diverse ways: (1) using lexical
descriptions of entities in ontologies (names, labels and synonyms) and the tokens
they share to build a lexical formal context, deriving a lexical lattice of formal
concepts, and extracting pairs of entities as positive match samples; (2) for each
lexical description of an entity, retrieving corresponding terms from external
resources such as ConceptNet, BabelNet and WikiSynonyms, thus forming pairs as
positive match samples; (3) training a word2vec model on PubMed, PMC and Wikipedia
and computing the similarity of entity embeddings so as to obtain positive match
samples; (4) using the is-a and part-of relations within ontologies to yield more
positive match samples; and (5) for negative match samples, using the disjoint-with
relations to generate conflicts between ontologies. Second, SBERTAlignment trains a
Siamese BERT model, which is well suited to similarity-related tasks, and the
resultant embeddings are compared in order to decide a one-to-one alignment
following the stable marriage rationale. Lastly, these matches, together with the
matches obtained in (1) above, are fed into a structural formal context, and those
validated by the derived structural lattice are the final mappings.
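
The one-to-one alignment step can be sketched as a Gale-Shapley stable marriage over cosine similarities. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-dimensional vectors stand in for embeddings that the trained Siamese BERT model would produce, and the entity identifiers and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def stable_marriage(src_emb, tgt_emb):
    """Gale-Shapley matching: source entities propose to target entities in
    order of decreasing similarity; a target keeps its most similar proposer."""
    sim = {
        s: {t: cosine(u, v) for t, v in tgt_emb.items()}
        for s, u in src_emb.items()
    }
    prefs = {s: sorted(tgt_emb, key=lambda t: -sim[s][t]) for s in src_emb}
    next_choice = {s: 0 for s in src_emb}
    engaged = {}  # target -> source currently holding it
    free = list(src_emb)
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs[s]):
            continue  # s has proposed to every target and stays unmatched
        t = prefs[s][next_choice[s]]
        next_choice[s] += 1
        current = engaged.get(t)
        if current is None:
            engaged[t] = s
        elif sim[s][t] > sim[current][t]:
            engaged[t] = s
            free.append(current)  # the displaced source proposes again
        else:
            free.append(s)  # rejected; s tries its next preference
    return {s: t for t, s in engaged.items()}


# Hypothetical toy embeddings; both sources prefer the same target.
src = {"fma:A": [1.0, 0.0], "fma:B": [0.9, 0.1]}
tgt = {"nci:X": [1.0, 0.0], "nci:Y": [0.0, 1.0]}
alignment = stable_marriage(src, tgt)
print(alignment)
```

Both source entities rank `nci:X` first here; the target keeps the more similar proposer, and the other source falls back to its second choice, so the result stays one-to-one.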
    We evaluated SBERTAlignment on the OAEI 2020 LargeBio small version tasks.
SBERTAlignment outperforms FCA-Map in all aspects and, when compared with the
state-of-the-art systems AML and LogMap, obtains the highest recall and F-measure
for FMA-NCI (92.3% and 93.9%) and FMA-SNOMED (83.1% and 87.4%). We also compared
it with two representation learning-based systems, DOME and MultiOM; our system
outperforms both on all tasks, except that DOME has higher precision.
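
For reference, the reported measures follow the usual definitions over sets of mappings, which can be sketched as below. The function name and the toy mappings are hypothetical and do not reproduce any figure reported above.

```python
def evaluate(predicted, reference):
    """Precision, recall and F-measure of a predicted alignment
    against a reference alignment, both given as sets of pairs."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy alignments: 2 of 3 predictions are in the reference.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
precision, recall, f_measure = evaluate(predicted, reference)
print(precision, recall, f_measure)
```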
3      Discussion
We report the preliminary yet promising results of an attempt to take advantage of
both symbolic deduction and numerical, latent semantic representation for the
purpose of matching complex domain ontologies. Of note, neither the formal
clustering in FCA nor the semantic correlation from deep training can decisively
determine the equivalence across ontologies, so comprehensive resources and methods
should be incorporated. We also note that both FCA-Map and Siamese BERT can be used
to align multiple ontologies simultaneously, making indirect alignments available.
Our approach should also be evaluated on more OAEI tracks, such as Disease and
Phenotype.

References
 1. Zhao, M., Zhang, S., Li, W., Chen, G.: Matching biomedical ontologies based on formal
    concept analysis. Journal of Biomedical Semantics, 9(1), 1–27 (2018).
 2. OAEI Homepage, http://oaei.ontologymatching.org/, last accessed 2021/08/19.
 3. Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new
    perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8),
    1798–1828 (2013).
 4. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence embeddings using siamese
    BERT-networks. In: EMNLP-IJCNLP 2019 Proceedings, pp. 3980–3990. Association for
    Computational Linguistics (2019).