<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Biomedical Ontology Alignment with BERT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuan He</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiaoyan Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denvar Antonyrajah</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ian Horrocks</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samsung Research</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Existing machine learning-based ontology alignment systems often adopt complicated feature engineering or traditional non-contextual word embeddings. However, despite their model complexity, they are often outperformed by rule-based systems. This paper proposes a novel ontology alignment system based on a contextual embedding model named BERT, aiming to sufficiently utilize the text semantics implied by ontologies. Our results on two biomedical alignment tasks demonstrate that, despite using the to-be-aligned classes alone as the input, our system outperforms the leading systems LogMap and AML.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Alignment</kwd>
        <kwd>Contextual Embeddings</kwd>
        <kwd>BERT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Ontology alignment refers to matching semantically related entities from different
ontologies, with the vision of integrating data from heterogeneous resources.
The resulting mappings usually indicate equivalence or subsumption
relationships, consequently providing a convenient means for merging two ontologies.
Moreover, through alignment with other ontologies, we can introduce additional
semantics for augmenting an individual ontology's quality assurance [
        <xref ref-type="bibr" rid="ref13 ref4 ref6">4, 6, 13</xref>
        ].
      </p>
      <p>Independent development of ontologies results in different naming schemes,
which makes alignment challenging. For example, the class named "lanugo" in
the SNOMED ontology is named "primary hair" by the Foundational Model
of Anatomy (FMA) ontology. Besides, real-world ontologies typically contain a
large number of classes, which causes scalability issues during mapping discovery
and makes it more difficult to distinguish classes with similar names (e.g., with
overlapping sub-words) but distinct meanings, especially for systems that adopt
string similarity-based lexical matching.</p>
      <p>
        Leading systems such as LogMap [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and AgreementMakerLight (AML) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
approach alignment as a sequential process, with lexical matching typically
being the first stage, followed by mapping extension and mapping repair. However,
their lexical matching parts mostly rely on the text's surface form, without
considering the semantics of words. More recent machine learning-based approaches
such as DeepAlignment [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and OntoEmma [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] adopt word embedding
techniques that project words into vectors, where pairs of words with closer
semantic meanings have a smaller Euclidean distance in the vector space.
Nevertheless, these methods rely on traditional non-contextual word embeddings,
which assign each word a single unified representation and thus cannot exploit
word-level context to help resolve ambiguity.
      </p>
      <p>
        To tackle this problem, we propose an ontology alignment system based on
BERT, a contextual word embedding model that has demonstrated its strength
(through fine-tuning) in a wide range of Natural Language Processing (NLP)
tasks such as question answering, named entity recognition, and sentiment
analysis [
        <xref ref-type="bibr" rid="ref16 ref18 ref2">2, 16, 18</xref>
        ], but has not yet been fully investigated in ontology alignment. The
fundamental challenge of applying deep learning techniques to ontology
alignment is that the number of reference mappings is often several orders of magnitude smaller
than the number of candidates (i.e., class pairs) to predict, resulting in a lack
of labelled data and an imbalance between positive and negative samples. Thus,
previous research into supervised learning schemes usually involves complicated
feature engineering and needs to address extra noise brought by silver data (i.e.,
labelled data that are automatically generated by certain heuristics) [
        <xref ref-type="bibr" rid="ref15 ref7">7, 15</xref>
        ]. In
contrast, fine-tuning the pretrained BERT on downstream tasks typically
requires only a moderate amount of training data and avoids complex hand-crafted
features. Furthermore, we also consider a critical issue in mapping prediction,
i.e., reducing the quadratic complexity of searching all possible mappings.
      </p>
      <p>
        To the best of our knowledge, we are among the first to develop a robust and
general ontology alignment system using contextual embedding. In comparison
to the previous work by Neutel et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which is a preliminary work that
employs BERT to match two domain ontologies, we establish a concrete and
flexible pipeline that fits both the unsupervised and semi-supervised settings;
we improve their mean token embedding and class token embedding models
and use them as baselines; and we optimize the mapping search and conduct more
extensive experiments to examine our approaches. We refer to our system as
BERTMap<sup>3</sup>. As shown in Figure 1, it consists of the following steps:
1. Corpora construction. We extract synonym and non-synonym pairs from
various sources, including the input ontologies, known mappings and/or external
knowledge. Such construction exploits the text semantics and avoids the need
for hand-crafted features.
2. Fine-tuning. We then choose a suitable pretrained BERT variant and
fine-tune it on our corpora to obtain a classifier, which requires only a moderate
amount of data and training resources.
3. Mapping prediction. For each class pair, we take their labels as input to
the classifier. Since one class may have multiple labels, we use the average of
the output probabilities of all the label combinations as the mapping value.
To reduce the search space while preserving recall, we use a sub-word inverted
index based on BERT's tokenizer.
<sup>3</sup> Code is available at: https://github.com/KRR-Oxford/BERTMap.
      </p>
      <p>[Figure 1: Illustration of the BERTMap system. Corpora: the intra-ontology, cross-ontology and complementary corpora are built from the ontologies, known/high-confidence mappings (optional) and complementary sources (optional). Fine-tuning: the weights of a pretrained BERT are transferred to a fine-tuned BERT binary classifier. Prediction: for a class from either ontology, candidates from the opposite ontology are retrieved with the sub-word inverted index; an optional string match is tried first and, if no match is found, the fine-tuned binary classifier is applied.]</p>
      <p>
        We evaluate BERTMap on (i) the FMA-SNOMED small fragment task of
the OAEI Large BioMed Track (LargeBio)<sup>4</sup>, and (ii) its extended version
FMA-SNOMED+, where the missing labels of SNOMED are augmented with labels
from a more recent version of SNOMED. We compare BERTMap with four
internal baselines (two lexical matching-based methods and two BERT token
embedding-based models) and two leading systems, LogMap [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and AML [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Our results demonstrate that BERTMap, despite using the to-be-aligned classes
alone as the input, outperforms all the baselines on both tasks.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>Problem Formulation</title>
        <p>An ontology is typically defined as an explicit specification of a
conceptualization. It often uses representational vocabularies to describe a domain of interest,
with the main components being entities and axioms. Note that entities include
classes, instances and properties. Ontology alignment involves tasks of matching
cross-ontology entities with equivalence, subsumption or other more complicated
relationships. In this work, we focus on equivalence alignment between classes.</p>
        <p>The ontology alignment system takes as input a pair of ontologies, $O$ and
$O'$, with class sets $C$ and $C'$, respectively. It first generates a set of scored
mappings, which are triples of the form $(c \in C, c' \in C', P(c \equiv c'))$, where
$P(c \equiv c') \in [0, 1]$ denotes the probability score (a.k.a. mapping value) that $c$
and $c'$ are equivalent. In this paper, we determine the final output by preserving
mappings with a score larger than a certain threshold $\lambda \in [0, 1]$.</p>
        <p>We further clarify some notations used in this paper. Note that a class of
an ontology typically contains a list of labels (via annotation properties such as
rdfs:label) that serve as alternative class names. We lowercase these aliases and
remove any underscores before tokenization. We denote a preprocessed label
as $\omega$ and the set of preprocessed labels of a class $c$ as $\Omega(c)$.</p>
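        <p>As a minimal sketch of the label preprocessing described above, the following Python snippet lowercases each label and handles underscores. The function names preprocess_label and label_set are hypothetical, and replacing each underscore with a space (rather than deleting it) is an assumption made here so that words stay separated.</p>
        <preformat>
```python
def preprocess_label(label):
    """Lowercase a label, turn underscores into spaces (assumed), collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())


def label_set(labels):
    """Return the set of preprocessed labels of a class."""
    return {preprocess_label(label) for label in labels}


print(label_set(["Primary_Hair", "primary hair", "Lanugo"]))
```
        </preformat>
        <p>Because the result is a set, aliases that only differ in case or underscores collapse into a single label.</p>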
        <p><sup>4</sup> http://www.cs.ox.ac.uk/isg/projects/SEALS/oaei/.</p>
        <sec id="sec-2-1-1">
          <title>BERT: Pre-training and Fine-tuning</title>
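          <p>[Figure 2: BERT architecture, with pre-training on Next Sentence Prediction (NSP) and Masked Language Modelling (Mask LM) on the left and fine-tuning on the right.]</p>
          <p>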
BERT is a language representation model built on the bidirectional transformer
encoder [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. As shown in Figure 2, its input is a sequence composed of a
special token [CLS], tokens of two sentences A and B, and a special token [SEP]
that separates A and B. Each token's initial embedding encodes its content, its
position in the sequence, and the sentence it belongs to. The model has L
successive layers of an identical architecture. Its main component is the multi-head
self-attention block, which computes a contextual hidden representation for each
token by considering the whole sequence output from the previous layer (see the
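grey arrows in Figure 2). The output of layer $l$ is denoted as:
<disp-formula><label>(1)</label><tex-math>f_{bert}(x, l) = (v_{CLS}^{(l)}, v_{1}^{(l)}, \ldots, v_{N}^{(l)}, v_{SEP}^{(l)}, v'^{(l)}_{1}, \ldots, v'^{(l)}_{N'}) \in \mathbb{R}^{(N+N'+2) \times d}</tex-math></disp-formula>
where $x$ is the input sequence, and the $v_{i}^{(l)}$ and $v'^{(l)}_{j}$ are $d$-dimensional vectors of the
corresponding tokens. The last layer ($l = L$) outputs can be used as the input of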
downstream tasks or as the token embeddings. In contrast to the traditional
non-contextual word embedding techniques such as Word2Vec [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], which
assign each token in the vocabulary only one embedding, BERT can distinguish
different occurrences of the same token. For instance, given the input sentence
"the bank robber was seen on the river bank", BERT computes different
embeddings for the two occurrences of "bank", while the traditional model yields a
single unified embedding that is biased towards the most frequent meaning of "bank"
(probably the money bank) in the training corpora.
          </p>
          <p>The framework of BERT involves pre-training and fine-tuning, where
pre-training develops a multi-purpose model that learns vast background
knowledge, and fine-tuning adjusts the parameters of the pre-trained BERT by further
training on a downstream task. On the left of Figure 2, we illustrate that BERT
is pre-trained on two tasks: Masked Language Modelling (MLM), which predicts
tokens that are randomly masked in sentences A and B, and Next Sentence
Prediction (NSP), which predicts whether sentence B follows A. On the right of Figure 2, we
present the example of fine-tuning on a downstream paraphrasing task, where
the pre-trained BERT is attached to an additional binary classification layer
that outputs the probability that A and B are synonymous. By minimizing the
cross-entropy loss on the training samples, the parameters of the pre-trained BERT
are adjusted, and the parameters of the additional layer are learnt. Pre-trained
BERT models are usually publicly available and can be re-used for fine-tuning on
various downstream tasks. In this paper, we conduct no pre-training but instead
fine-tune an existing pre-trained BERT that has learnt biomedical background
knowledge (see Section 4 for details).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>BERTMap</title>
      <sec id="sec-3-1">
        <title>Corpora Construction and Fine-tuning</title>
        <p>Real-world ontologies are typically abundant in labels that serve as aliases of
their classes. Labels of the same class or of semantically equivalent classes are
intuitively synonymous in the domain of the input ontologies. On the other
hand, non-synonymous pairs can be extracted from classes that are semantically
distinct. For convenience, we use "synonym" and "non-synonym" to describe
a synonymous and a non-synonymous label pair, respectively. The corpora of
these pairs are divided into the following three categories:
Intra-ontology corpus. For each input ontology, we regard each pair of
labels associated with the same class as synonymous. We also construct identity
synonyms to encode each label as a synonym of itself. For non-synonyms, we
consider soft non-synonyms, which are labels from two separate classes chosen at random,
and hard non-synonyms, which are labels from disjoint classes. Since class
disjointness is rarely defined in an ontology, we infer it from the structure of the
input ontology. In this paper, we simply assume that sibling classes are disjoint.
Cross-ontology corpus. The lack of annotated mappings makes it infeasible
to apply supervised learning to ontology alignment. However, we can optionally
employ a semi-supervised setting by assuming that a small portion of the mappings
has been created by human experts. For each known mapping, we extract
label pairs from its two classes as synonyms. Meanwhile, soft non-synonyms are
extracted by matching a source class to a random target class, whereas hard
non-synonyms are not available at the cross-ontology level because we have no
predefined disjointness in the mappings. Also, we do not create identity
synonyms here because they have been considered in the intra-ontology corpus.
Complementary corpus. Besides the input ontologies, we can expand the
synonym and non-synonym sets from external sources, especially other ontologies
in the relevant domain. To reduce the potential noise and the corpus size, we
can truncate the auxiliary ontology by considering only the classes whose labels
can be matched to some class of the input ontologies.</p>
        <p>The intra-ontology corpus, cross-ontology corpus and complementary corpus
are denoted as io, co and cp, respectively. io is essential to BERTMap, while co
and cp are optional. The identity synonyms are denoted as ids. For convenience,
we use + to denote the combination of different corpora/synonyms; for example,
io + ids refers to the intra-ontology corpus with identity synonyms considered,
and io + co + cp refers to including all three corpora without identity synonyms. To
let the model learn that synonymy is symmetric, we also consider appending reversed synonyms,
i.e., if $(\omega_1, \omega_2)$ is in the synonym set, $(\omega_2, \omega_1)$ is also added as a synonym. Given
a corpus setting, we obtain the corresponding synonym and non-synonym sets,
and then fine-tune a pretrained BERT on them as introduced in Section 2.2. We
evaluate various corpus settings in Section 4. Finally, since some non-synonym
pairs are extracted by random class combination, they can occasionally appear
in the synonym set; in such cases we delete the relevant non-synonym pair.</p>
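        <p>The intra-ontology corpus construction can be sketched as follows. This is only an illustration under stated assumptions, not the BERTMap implementation: the toy ontology (LABELS, PARENT) and all function names are hypothetical, sibling classes are taken to be disjoint as described above, and soft non-synonyms are drawn with a fixed random seed.</p>
        <preformat>
```python
import itertools
import random

# Toy ontology (hypothetical): class id -> preprocessed labels, class id -> parent id.
LABELS = {
    "hair": ["hair", "pilus"],
    "lanugo": ["lanugo", "primary hair"],
    "terminal_hair": ["terminal hair"],
    "nail": ["nail"],
}
PARENT = {
    "lanugo": "hair",
    "terminal_hair": "hair",
    "hair": "skin_appendage",
    "nail": "skin_appendage",
}


def synonyms(labels, identity=False, reverse=True):
    """Label pairs of the same class; optionally identity and reversed pairs."""
    pairs = set()
    for names in labels.values():
        pairs.update(itertools.combinations(names, 2))
        if identity:
            pairs.update((name, name) for name in names)
    if reverse:
        pairs.update((b, a) for a, b in list(pairs))
    return pairs


def hard_non_synonyms(labels, parent):
    """Label pairs of sibling classes, which are assumed to be disjoint."""
    pairs = set()
    for c1, c2 in itertools.combinations(labels, 2):
        if parent.get(c1) is not None and parent.get(c1) == parent.get(c2):
            pairs.update(itertools.product(labels[c1], labels[c2]))
    return pairs


def soft_non_synonyms(labels, n, seed=0):
    """Label pairs drawn from two distinct classes chosen at random."""
    rng = random.Random(seed)
    classes = sorted(labels)
    pairs = set()
    for _ in range(n):
        c1, c2 = rng.sample(classes, 2)
        pairs.add((rng.choice(labels[c1]), rng.choice(labels[c2])))
    return pairs


syn = synonyms(LABELS, identity=True)
# Drop any sampled non-synonym that also occurs in the synonym set.
non_syn = (hard_non_synonyms(LABELS, PARENT) | soft_non_synonyms(LABELS, n=5)) - syn
print(len(syn), len(non_syn))
```
        </preformat>
        <p>The same functions could be applied to an auxiliary ontology to build a complementary corpus; a cross-ontology corpus would instead pair the labels of the two classes in each known mapping.</p>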
      </sec>
      <sec id="sec-3-2">
        <title>Candidate Selection with Sub-word Inverted Index</title>
        <p>
          Given input ontologies $O$ and $O'$, and their class sets $C$ and $C'$, a naive algorithm
for computing the alignment is to look up $c' = \arg\max_{c' \in C'} P(c \equiv c')$ for every
class $c \in C$, which results in a time complexity of $O(n^2)$. To reduce that, we
use a sub-word inverted index based on BERT's WordPiece tokenizer [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The
WordPiece algorithm first initializes the vocabulary with the single characters present in the
training corpus and incrementally merges them into sub-words so that the most
likely combination is added at each iteration. We opt to use the built-in
sub-word tokenizer rather than re-train it on our corpora because it has already been
fitted to an enormous corpus (with 3.3 billion words) that covers various topics
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and in this context we consider generality to be preferable to task specificity.
        </p>
        <p>We build sub-word inverted indices for $O$ and $O'$ separately. Each entry of
an index is a sub-word, and its values are the classes whose labels contain this
sub-word after tokenization. With the indices, we can implement candidate selection
very efficiently in the following way. We first restrict the search space of each
source class $c$ to the target classes that share at least one sub-word token with $c$.
Next, we rank these target classes by a scoring metric based on inverse
document frequency (idf), and the top $k$ scored classes are chosen for subsequent mapping
prediction. For a target class $c'$, this scoring metric is computed as:
<disp-formula><tex-math>s(c, c') = \sum_{t \in T(c) \cap T(c')} idf(t) = \sum_{t \in T(c) \cap T(c')} \log_{10}\left(|C'| / |C'(t)|\right)</tex-math></disp-formula>
where $T(\cdot)$ is the set of sub-word tokens obtained by tokenizing all the labels of a class,
$C'(t)$ is the set of target classes that have token $t$ after tokenization, and $|\cdot|$
denotes set cardinality. In this way, we reduce the search space from $O(n^2)$ to
$O(kn)$, where $k$ is a constant. Compared to the traditional word-level inverted
index, our approach has the following advantages: (i) it captures various forms of
words without requiring additional processing such as stemming and consulting
a dictionary; (ii) it interprets unknown words by parsing them into consecutive
known sub-words rather than treating them as the same (unknown) token.</p>
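        <p>The candidate selection described above can be illustrated with a small Python sketch. It is not the BERTMap implementation: toy_tokenize is a hypothetical stand-in that cuts words into fixed-size chunks instead of calling BERT's WordPiece tokenizer, and the function names and the toy target ontology are assumptions made for illustration only.</p>
        <preformat>
```python
import math
from collections import defaultdict


def toy_tokenize(label, size=4):
    """Stand-in for WordPiece (assumption): cut each word into chunks of `size` characters."""
    return [word[i:i + size] for word in label.split() for i in range(0, len(word), size)]


def class_tokens(labels):
    """The set of sub-word tokens over all labels of a class."""
    return {tok for label in labels for tok in toy_tokenize(label)}


def build_index(ontology):
    """Map each sub-word token to the set of classes whose labels contain it."""
    index = defaultdict(set)
    for cls, labels in ontology.items():
        for tok in class_tokens(labels):
            index[tok].add(cls)
    return index


def select_candidates(src_labels, tgt_index, n_tgt_classes, k=200):
    """Rank target classes sharing a token with the source class by summed idf; keep the top k."""
    scores = defaultdict(float)
    for tok in class_tokens(src_labels):
        tgt_classes = tgt_index.get(tok, ())
        for cls in tgt_classes:
            scores[cls] += math.log10(n_tgt_classes / len(tgt_classes))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


target = {
    "t1": ["primary hair", "lanugo"],
    "t2": ["terminal hair"],
    "t3": ["nail"],
}
index = build_index(target)
print(select_candidates(["lanugo hair"], index, n_tgt_classes=len(target), k=2))
```
        </preformat>
        <p>Only target classes that share a token with the source class are ever scored, which is what keeps the number of classifier calls at roughly k per source class; rarer shared tokens contribute more to the score than frequent ones.</p>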
      </sec>
      <sec id="sec-3-3">
        <title>Mapping Prediction</title>
        <p>With the ranked candidate classes of a source class $c$, we first perform a
string-match check to see whether any of the candidate classes has at least one exactly
matched label (after preprocessing as described in Section 2.1) with $c$, i.e., we
search for $c'$ in rank order such that $\Omega(c) \cap \Omega(c') \neq \emptyset$, where
$\Omega(\cdot)$ is the set of preprocessed labels of a class. We assign a
mapping value of 1.0 to the first $c'$ that satisfies the condition. If we cannot find
such a candidate class, we apply the fine-tuned BERT classifier. In this case, for
each candidate class $c'$, we predict the synonym probabilities for all label pairs
$(\omega, \omega') \in \Omega(c) \times \Omega(c')$, and take the average as the mapping value between $c$
and $c'$. We return the top scored mapping for each $c$.</p>
        <p>We can generate three mapping sets from the input source and target
ontologies: (i) src2tgt, by looking for a target class $c' \in C'$ for each source class $c \in C$;
(ii) tgt2src, by looking for a source class $c \in C$ for each target class $c' \in C'$;
and (iii) combined, by merging src2tgt and tgt2src with duplicates removed.
We finally output the determined mappings by filtering out mappings whose values
are lower than a certain threshold $\lambda \in [0, 1]$.</p>
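        <p>A minimal sketch of the prediction step for one source class is given below. It assumes that the candidates are already ranked; toy_classifier is a hypothetical stand-in (token overlap) for the fine-tuned BERT classifier, and the function names are illustrative rather than taken from the BERTMap code.</p>
        <preformat>
```python
def predict_mapping(src_labels, candidates, classifier):
    """Return (target class, mapping value) for one source class.

    `candidates` is a ranked list of (target class, target labels);
    `classifier` is any callable giving a synonym probability for a label pair.
    """
    src = set(src_labels)
    # String-match check: the first ranked candidate sharing an exact label gets value 1.0.
    for tgt, tgt_labels in candidates:
        if src.intersection(tgt_labels):
            return tgt, 1.0
    # Otherwise average the classifier output over all label combinations.
    best = None
    for tgt, tgt_labels in candidates:
        probs = [classifier(a, b) for a in src_labels for b in tgt_labels]
        value = sum(probs) / len(probs)
        if best is None or value > best[1]:
            best = (tgt, value)
    return best


def filter_mappings(mappings, threshold):
    """Keep (source, target, value) triples whose value is not below the threshold."""
    return [m for m in mappings if m[2] >= threshold]


def toy_classifier(a, b):
    """Hypothetical stand-in for the fine-tuned classifier: Jaccard overlap of words."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta.intersection(tb)) / len(ta.union(tb))


candidates = [("t1", ["terminal hair"]), ("t2", ["primary hair", "fetal body hair"])]
src2tgt = [("s1",) + predict_mapping(["lanugo", "fetal hair"], candidates, toy_classifier)]
print(src2tgt)
print(filter_mappings(src2tgt, threshold=0.5))
```
        </preformat>
        <p>Running the same routine with the roles of the two ontologies swapped would give tgt2src, and the union of both lists would give the combined set.</p>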
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Datasets and Experiment Settings</title>
        <p>Datasets and Tasks. We first evaluate BERTMap on the FMA-SNOMED small
fragment task of the OAEI LargeBio Track. The input FMA and SNOMED
ontologies (segments) have 10,157 and 13,412 classes, respectively. The dataset
also includes a set of UMLS-based reference (ground truth) mappings for
evaluating the systems, with 6,026 of them marked by "=" and 2,982 marked by "?".
Mappings marked by "?" will cause logical conflicts after alignment, so they are
regarded as neither positive nor negative in evaluation. We construct the
complementary corpus for the FMA-SNOMED task by utilizing class labels from the
most recent version of the original SNOMED<sup>5</sup>. Note that the complementary
labels are used in fine-tuning but not in prediction. To examine the scenario
in which baseline systems can also use these additional labels, we consider the
extended task FMA-SNOMED+, where the input ontology SNOMED is extended
to SNOMED+ by incorporating these labels.</p>
        <p>BERTMap Settings. We set up different corpus settings for training. Recalling
the corpus notations in Section 3.1, the unsupervised and semi-supervised
learning settings are distinguished by whether co (cross-ontology corpus) is included, while
io (intra-ontology corpus) is always included. In the unsupervised learning
setting, 80% of the fine-tuning corpus is used for training and 20% for validation.
The final mapping prediction is evaluated on the full set of reference mappings.
In the semi-supervised learning setting, the training data is formed by
combining all the unsupervised fine-tuning data with co constructed from 20% of
the reference mappings. We use an additional co constructed from 10% of the
reference mappings as the validation set. We take the remaining 70% as the test
mappings for evaluating mapping prediction. Note that here validation is
different from testing because the former concerns fine-tuning while the latter concerns
mapping prediction. We also examine the impact of ids (identity synonyms) on
both tasks, and the impact of cp (complementary corpus) on FMA-SNOMED.
In implementation, we consider all the synonyms in the positive sample set, and
randomly sample 2 soft non-synonyms and 2 hard non-synonyms for each
synonym in io, and 4 soft non-synonyms for each synonym in co. We perform the
same negative sampling procedure on cp as on io because cp is also a corpus
derived from one (external) ontology. As a result, the positive-negative ratio is
consistently 1:4 for all the corpus settings. For settings that consider ids, we
sample the corresponding number of non-synonyms to keep this ratio.</p>
        <p><sup>5</sup> The version of 20210131 from https://www.nlm.nih.gov/healthit/snomedct/index.html.</p>
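        <p>The 20% / 10% / 70% partition of the reference mappings used in the semi-supervised setting can be sketched as below. The function name split_mappings, the shuffling with a fixed seed and the toy mapping identifiers are assumptions for illustration; the text does not specify how the partition is drawn.</p>
        <preformat>
```python
import random


def split_mappings(mappings, train=0.2, val=0.1, seed=42):
    """Shuffle reference mappings and split them into train/validation/test parts."""
    shuffled = list(mappings)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val], shuffled[n_train + n_val:]


reference = [("fma_%d" % i, "snomed_%d" % i) for i in range(100)]
train, val, test = split_mappings(reference)
print(len(train), len(val), len(test))  # 20 10 70
```
        </preformat>
        <p>Only the train and validation parts would feed the cross-ontology corpus; the test part is held out for evaluating mapping prediction.</p>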
        <p>
          We adopt Bio-Clinical BERT which has been pretrained on biomedical and
clinical domain corpora [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We fine-tune the BERT model for 3 epochs with a
batch size of 32, and evaluate it on the validation set every 0.1 epoch, through
which the best checkpoint (in terms of the cross-entropy loss) is selected for testing. The
maximum input length is set to 128. In prediction, the number of candidates
selected using the sub-word inverted index is set to 200. Our implementation
uses owlready2<sup>6</sup> and transformers<sup>7</sup>.
        </p>
        <p>Baselines. We compare BERTMap with the following baselines:
1. String-match. It sets the mapping value of two classes $c$ and $c'$ to 1.0 if
$\Omega(c) \cap \Omega(c') \neq \emptyset$ (i.e., the two classes share at least one preprocessed label), and to 0 otherwise.
2. Edit-similarity. Given a source class $c$, it predicts the target class $c'$ and
the corresponding mapping value by $\arg\max_{c' \in \mathrm{cand}(c)} nes(\Omega(c), \Omega(c'))$, where
$nes(\cdot, \cdot)$ refers to the maximum normalized edit similarity between the labels
of $c$ and $c'$, and $\mathrm{cand}(c)$ denotes the candidates of $c$ that are selected in the same
way as in BERTMap. Note that string-match is a special case of edit-similarity.
3. Mean-embeds and Cls-embeds. BERT outputs token embeddings $f_{bert}(x, l)$
at layer $l$ (see (1)). Mean-embeds (the mean token embedding model)
extracts the mean of all the token embeddings in $f_{bert}(x, L)$, the output of
the last layer $L$, as the embedding of a class label, and calculates the cosine
similarity of two classes as their mapping value. Note that the embeddings of
multiple labels of a class are averaged. Cls-embeds (the class token
embedding model) is the same except that it considers the class token embedding
of the last layer, i.e., $v_{CLS}^{(L)}$, as a class label's embedding. As in BERTMap,
string-match is also considered first before calculating the cosine similarity.
4. LogMap, AML and LogMapLt. LogMap and AML are two lexical matching
and reasoning based systems with leading performance in many OAEI tracks
and other tasks. Since LogMap and AML consider the neighbourhoods and
relevant logical axioms of two classes while BERTMap at the current stage
only considers the class labels, we additionally introduce LogMapLt, which
only uses the lexical matching part of LogMap, for comparison.</p>
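        <p>The edit-similarity baseline can be sketched as follows. The text does not state how the edit similarity is normalized, so this example assumes one common choice, 1 minus the Levenshtein distance divided by the length of the longer label; the function names and the toy candidates are hypothetical.</p>
        <preformat>
```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_similarity(a, b):
    """1 - distance / length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def nes(src_labels, tgt_labels):
    """Maximum normalized edit similarity over all label pairs of two classes."""
    return max(normalized_edit_similarity(a, b) for a in src_labels for b in tgt_labels)


def edit_similarity_baseline(src_labels, candidates):
    """Pick the candidate (target class, labels) with the highest nes score."""
    scored = ((tgt, nes(src_labels, labels)) for tgt, labels in candidates)
    return max(scored, key=lambda m: m[1])


candidates = [("t1", ["lanugo hair", "primary hair"]), ("t2", ["nail"])]
print(edit_similarity_baseline(["lanugo"], candidates))
```
        </preformat>
        <p>When two classes share an identical label the score is exactly 1.0, which is why string-match can be seen as a special case of this baseline.</p>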
        <p><sup>6</sup> https://owlready2.readthedocs.io/en/latest/. <sup>7</sup> https://huggingface.co/transformers/.</p>
        <sec id="sec-4-1-1">
          <title>Results</title>
          <p>We report the resulting Precision, Recall and Macro-F1 scores in Tables 1 and
2. Note that, in the semi-supervised learning setting, we measure the
performance only on the test mappings. We search for the optimal combination of
mapping set (src2tgt, tgt2src or combined) and the corresponding mapping
value threshold that leads to the best Macro-F1 score on the validation
mappings (10%) for the semi-supervised models. We select the best combination on
the full mappings for the unsupervised models due to the lack of validation
mappings. Nevertheless, we will later show that BERTMap models are robust to
mapping threshold selection, and that similar results would be obtained if a reasonable
validation set were provided. For the baselines, we also select the best mapping
set-threshold combination for each of them.</p>
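          <p>The following sketch shows one plausible way to compute Precision, Recall and F1 when some reference mappings are to be ignored, as is the case for the mappings marked by "?". Removing the ignored mappings from the predictions before scoring is an assumption about the evaluation protocol, and the function name and toy data are hypothetical.</p>
          <preformat>
```python
def evaluate(predicted, reference_eq, reference_ignored=frozenset()):
    """Precision, recall and F1 of predicted mappings against "=" references.

    Mappings in `reference_ignored` (marked "?") count as neither positive
    nor negative, so they are removed from the predictions before scoring.
    """
    predicted = set(predicted) - set(reference_ignored)
    reference_eq = set(reference_eq)
    true_pos = len(predicted.intersection(reference_eq))
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference_eq) if reference_eq else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


predicted = {("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")}
reference_eq = {("a", "x"), ("b", "y"), ("e", "v")}
reference_ignored = {("c", "z")}
print(evaluate(predicted, reference_eq, reference_ignored))
```
          </preformat>
          <p>In this toy case the ignored mapping does not lower Precision, so two of the three remaining predictions are counted as correct.</p>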
          <p>The overall results show that BERTMap attains the best F1 score (typically
with high recall) among all the systems for both tasks. On the FMA-SNOMED
task, the best unsupervised BERTMap model surpasses AML (resp. LogMap) by
2.0% (resp. 4.6%) in F1, while the best semi-supervised BERTMap model exceeds
AML (resp. LogMap) by 3.7% (resp. 6.1%). The corresponding statistics become
1.8% (resp. 1.0%) and 2.9% (resp. 2.3%) on the FMA-SNOMED+ task.</p>
          <p>The string-match and edit-similarity baselines perform much better on the
FMA-SNOMED+ task than on the FMA-SNOMED task because they rely on the
sufficiency of class labels in the input ontologies, whereas BERTMap can learn from
external resources. Edit-similarity is consistently better than string-match
because it already covers all the string-match cases, but it is still worse
than LogMap and AML. Note that we also apply string-match for the
mean-embeds and cls-embeds models before calculating the cosine similarity between
the source and target classes' embeddings. Still, their results are only slightly better
than those of the string-match baseline. This suggests that directly using pretrained
BERT to encode class embeddings and calculate their distance in vector space is
not adequate; we need fine-tuning to utilize the BERT embeddings effectively.</p>
          <p>[Tables 1 and 2: Precision, Recall and Macro-F1 results, with rows grouped into Unsupervised, Semi-supervised and Baselines.]</p>
          <p>
            Compared to LogMap's lexical matcher, LogMapLt, BERTMap performs
better by more than 50% on FMA-SNOMED and 6% on FMA-SNOMED+, implying the
potential of BERTMap to become more powerful when it is extended to
incorporate structural and logical information. For example, we can adjust the mapping
values by taking the alignment of neighbouring classes into account. We can also
apply the reasoning-based ontology repair module [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] to prune the mapping set.
          </p>
          <p>Regarding the BERTMap settings, we observe that when the input
ontologies have sufficient labels (i.e., on the FMA-SNOMED+ task), considering the
intra-ontology corpus alone already yields promising results. Also, BERTMap has
better performance when it incorporates the cross-ontology and complementary
corpora, especially on the FMA-SNOMED task, where the SNOMED
ontology is deficient in labels. Including the identity synonyms can also improve the
performance, but the gain over the settings without them is less prominent.</p>
          <p>Finally, in Figure 3, we illustrate the effect of the mapping value threshold $\lambda$
on BERTMap under two settings, i.e., io + co + ids on FMA-SNOMED and
io + ids on FMA-SNOMED+. We can observe that in all cases the highest F1
scores are achieved when $\lambda$ is very close to 1.0, and as $\lambda$ increases, Precision grows
significantly while Recall does not drop much. This suggests that BERTMap is
robust with respect to the selection of an appropriate $\lambda$.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Discussion</title>
      <p>In this study, we investigate ontology alignment using a contextual
embedding-based model, BERTMap, that exploits the ontologies' text semantics. Rather
than using a complex combination of machine learning and hand-crafted features,
we construct a straightforward BERT fine-tuning task that learns the meanings
of class labels, and we apply the resulting classifier to mapping prediction, which
leads to promising results on two biomedical ontology alignment tasks. BERTMap
is suitable for real-world applications because it supports both unsupervised
and semi-supervised modes, and can readily incorporate external materials when
the input ontologies are incomplete in class labels. As part of our future work,
we aim to develop the mapping extension and repair modules so as to make
BERTMap a full-fledged ontology alignment system.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the SIRIUS Centre for Scalable Data Access
(Research Council of Norway, project 237889), Samsung Research UK, Siemens AG,
and the EPSRC projects AnaLOG (EP/P025943/1), OASIS (EP/S032347/1),
UK FIRES (EP/S019111/1) and the AIDA project (Alan Turing Institute).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alsentzer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boag</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weng</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jindi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDermott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Publicly available clinical BERT embeddings</article-title>
          .
          <source>In: Proceedings of the 2nd Clinical Natural Language Processing Workshop</source>
          . pp.
          <fpage>72</fpage>
          –
          <lpage>78</lpage>
          (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwiatkowski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BoolQ: Exploring the surprising difficulty of natural yes/no questions</article-title>
          .
          <source>In: Proceedings of NAACL-HLT</source>
          . pp.
          <fpage>2924</fpage>
          –
          <lpage>2936</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of NAACL-HLT</source>
          . pp.
          <fpage>4171</fpage>
          –
          <lpage>4186</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stuckenschmidt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Ontology alignment evaluation initiative: Six years of experience</article-title>
          .
          <source>J. Data Semant</source>
          .
          <volume>15</volume>
          ,
          <fpage>158</fpage>
          –
          <lpage>192</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Faria</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pesquita</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmonari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Couto</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          :
          <article-title>The AgreementMakerLight ontology matching system</article-title>
          . In:
          <string-name>
            <surname>Meersman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panetto</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eder</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellahsene</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Leenheer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (eds.)
          <source>On the Move to Meaningful Internet Systems: OTM 2013 Conferences</source>
          . pp.
          <fpage>527</fpage>
          –
          <lpage>541</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Tool support for ontology design and quality assurance</article-title>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Iyer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>VeeAlign: a supervised deep learning approach to ontology alignment</article-title>
          .
          <source>In: OM@ISWC</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cuenca Grau</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          . In:
          <string-name>
            <surname>Aroyo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welty</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kagal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web – ISWC 2011</source>
          . pp.
          <fpage>273</fpage>
          –
          <lpage>288</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Evaluating mapping repair systems with large biomedical ontologies</article-title>
          .
          <source>In: Description Logics</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kolyvakis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalousis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiritsis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>DeepAlignment: Unsupervised ontology matching with refined word vectors</article-title>
          .
          <source>In: Proceedings of NAACL-HLT</source>
          . pp.
          <fpage>787</fpage>
          –
          <lpage>798</lpage>
          (Jun
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>In: ICLR</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Neutel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boer</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Towards automatic ontology alignment using BERT</article-title>
          .
          <source>In: AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Ontology matching: State of the art and future challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <fpage>158</fpage>
          –
          <lpage>176</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Vaswani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parmar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polosukhin</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Attention is all you need</article-title>
          . In:
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luxburg</surname>
            ,
            <given-names>U.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vishwanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnett</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          . vol.
          <volume>30</volume>
          . Curran Associates, Inc. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhagavatula</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilhelm</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ammar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Ontology alignment in the biomedical domain using entity definitions and context</article-title>
          .
          <source>In: Proceedings of the BioNLP 2018 workshop</source>
          . pp.
          <fpage>47</fpage>
          –
          <lpage>55</lpage>
          (Jul
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlsson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lou</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Single-/multi-source cross-lingual NER via teacher-student learning on unlabeled data in target language</article-title>
          .
          <source>In: Proceedings of ACL</source>
          . pp.
          <fpage>6505</fpage>
          –
          <lpage>6514</lpage>
          (Jul
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norouzi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macherey</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krikun</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macherey</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klingner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gouws</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kato</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazawa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurian</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patil</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riesa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudnick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Google's neural machine translation system: Bridging the gap between human and machine translation</article-title>
          .
          <source>CoRR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          :
          <article-title>SentiBERT: A transferable transformer-based architecture for compositional sentiment semantics</article-title>
          .
          <source>In: Proceedings of ACL</source>
          . pp.
          <fpage>3695</fpage>
          –
          <lpage>3706</lpage>
          (Jul
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>