An Eye on Representation Learning in Ontology
Matching
Guilherme Sousa1, Rinaldo Lima2 and Cássia Trojahn1
1 Institut de Recherche en Informatique de Toulouse, Toulouse, France
2 Universidade Federal Rural de Pernambuco, Recife, Brazil


Abstract
Representation learning has received increased attention in the last few years in several tasks, including knowledge graph completion, entity resolution, and ontology matching. This paper presents an overview of representation learning approaches applied to the ontology matching task. It proposes to classify such approaches into the following dimensions: lexical unit segmentation, training strategy, and information representation complexity. A discussion on them is presented together with their pros and cons. Perspectives for further developments are also discussed.

Keywords
Ontology Matching, Representation Learning, Embeddings




1. Introduction
The advances in machine learning have promoted the emergence of new architectures and
methods capable of learning complex patterns on different types of data. Some applications
show impressive results on NLP tasks such as question answering and summarisation [1], on
image generation [2], and even on general tasks such as game playing and image captioning
[3]. In knowledge representation, representation learning methods show significant results in
tasks such as knowledge graph completion [4] and link prediction [5]. In ontology matching, a wave of
representation learning systems has appeared in the last few years. Word embeddings were
one of the first representation learning strategies adopted [6]. These models are based on the
distributional hypothesis, which states that similar words appear in similar contexts. The well-known
Word2Vec [7] model has been used to compute the semantic similarity of ontology entities, which
can improve system performance compared to classical lexical similarity methods [8].
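As a minimal illustration of this strategy (a sketch of the general idea, not the exact procedure of [8]), the snippet below averages the word vectors of two entity labels and compares them with cosine similarity; the `vectors` table is a toy stand-in for a pre-trained Word2Vec model, and NumPy is assumed.

```python
import numpy as np

# Hypothetical pre-trained word embedding table (word -> vector);
# in practice this would come from a Word2Vec model trained on a large corpus.
vectors = {
    "heart": np.array([0.8, 0.1, 0.3]),
    "attack": np.array([0.2, 0.9, 0.4]),
    "myocardial": np.array([0.7, 0.2, 0.3]),
    "infarction": np.array([0.3, 0.8, 0.5]),
}

def label_embedding(label: str) -> np.ndarray:
    """Tokenize a label and average the embeddings of its known words."""
    tokens = [t for t in label.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related labels with no shared words still get a meaningful score,
# which classical lexical measures (e.g., edit distance) would miss.
print(cosine(label_embedding("heart attack"), label_embedding("myocardial infarction")))
```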
   Since word embeddings do not consider the ontology structure, better encoding strategies
were designed. RDF2Vec [9] is used to integrate background knowledge in ontology match-
ing systems such as ALOD2Vec [10] and [11]. It generates sentences by randomly walking
paths in the ontology and applies the Word2Vec algorithm to generate entity embeddings. Despite
its representational capabilities, RDF2Vec does not fully represent OWL constraints, nor does it
consider the word composition of labels, which harms the model's generality. To address
these problems, the OWL2Vec model [12] was designed by using a set of rules to map OWL

OM 2022: The 17th International Workshop on Ontology Matching, co-located with ISWC 2022
guilherme.santos-sousa@irit.fr (G. Sousa); rinaldo.jose@ufrpe.br (R. Lima); cassia.trojahn@irit.fr (C. Trojahn)
ORCID: 0000-0002-2896-2362 (G. Sousa); 0000-0002-1388-4824 (R. Lima); 0000-0003-2840-005X (C. Trojahn)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
predicates to RDF equivalents and considering the word composition of entity labels. Since
these models generate independent embeddings for entities in two different ontologies, post-
processing strategies like the alignment of embeddings using linear transformations [11] or
Siamese networks [13] have been used. While random walk-based strategies, as adopted in
RDF2Vec and OWL2Vec, perform well, Graph Neural Networks [14] are more flexible and
show impressive performance in generating graph embeddings [15, 16]. The Graph Attention
Network [17], for instance, can filter out irrelevant neighbors when encoding entities, leading to
more robust similarity metrics.
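To make the random-walk idea concrete, the sketch below generates walk "sentences" over a toy triple set and feeds them to gensim's Word2Vec; it is an illustration under simplified assumptions (gensim >= 4.0, a hand-made toy graph), not the actual RDF2Vec implementation.

```python
import random
from collections import defaultdict
from gensim.models import Word2Vec  # assumes gensim >= 4.0 is installed

# Toy ontology graph as (subject, predicate, object) triples.
triples = [
    ("City", "subClassOf", "PopulatedPlace"),
    ("PopulatedPlace", "subClassOf", "Place"),
    ("City", "hasProperty", "population"),
]

# Adjacency list: node -> list of (predicate, neighbour).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def random_walk(start: str, depth: int = 4) -> list[str]:
    """Produce one 'sentence' of entity/predicate tokens by walking from a start node."""
    walk, node = [start], start
    for _ in range(depth):
        if not graph[node]:
            break
        pred, nxt = random.choice(graph[node])
        walk.extend([pred, nxt])
        node = nxt
    return walk

walks = [random_walk(node) for node in list(graph) for _ in range(10)]

# Treat walks as sentences and train skip-gram Word2Vec to obtain entity embeddings.
model = Word2Vec(sentences=walks, vector_size=64, window=5, min_count=1, sg=1, epochs=50)
print(model.wv["City"][:5])
```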
   For a better representation of the terminological layer, Transformer [18] and BERT [19] language
models were introduced in ontology matching systems [20, 21, 22]. These models have achieved high
performance and a strong capacity to learn language representations in NLP tasks [23], and have thus
attracted interest for generating embeddings of entity labels. BERT is a language model based
on the Transformer architecture [18] and has outperformed Recurrent Neural Networks, as it
relies on an attention mechanism to filter out irrelevant data. The Transformer architecture also
enables the development of huge models with billions of parameters [1]. BERT is pre-trained by masking
some input tokens and by predicting the next sentence; this pre-training provides the resulting model
with a stronger notion of context, which can be exploited in different NLP tasks by fine-tuning the model
for each task. In ontology matching, transformer-based models are used to learn representations for the
ontology terminological layer (labels, comments, etc.).
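A minimal sketch of this idea is shown below, assuming the Hugging Face transformers and torch libraries and a generic bert-base-uncased checkpoint; actual matchers such as BERTMap or Fine-TOM fine-tune the model and use more elaborate pipelines.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Generic pre-trained checkpoint; matching systems usually fine-tune it first.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(label: str) -> torch.Tensor:
    """Encode an entity label and mean-pool its contextual token embeddings."""
    inputs = tokenizer(label, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0)

sim = torch.nn.functional.cosine_similarity(
    embed("myocardial infarction"), embed("heart attack"), dim=0
)
print(float(sim))
```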
   The goal of this paper is to review ontology matching systems based on representation
learning. These systems have some properties in common that enable them to be categorized
based on the following dimensions: lexical unit segmentation, training, and information com-
plexity. A discussion of these systems and their pros and cons is presented, and perspectives
for further developments are also discussed.
We start by presenting a categorization framework (§2), followed by the description of the
reviewed proposals (§3). We then discuss their impact on ontology matching (§4) together with
an analysis for further work (§5).


2. Categorisation framework
The works in the literature are categorized into three groups of common features: lexical unit
segmentation, training, and information complexity. In the next sections, these groups are
discussed in detail. The reviewed works are listed in Table 1.

2.1. Lexical unit segmentation
The lexical unit segmentation dimension categorizes the systems according to the way they represent the
terminological layer of the input ontologies: entity (no tokenization), word, and character levels. An
example of these categories is illustrated in Figure 1. Entity level refers to the systems where
no tokenization is applied to the terminological layer (id, labels, comments, etc.); these
terminological features are viewed as a unique symbol (e.g., "New_York_City"). It acts as an identifier and
has low generalization capacity, since ontologies may combine the same words in a different order to
represent the same entity. Word level refers to splitting the terminological features into
their word components (e.g., ["New", "York", "City"]). In order to generate embeddings with
                          Lexical Unit                Learning                      Information level
 Work                Entity Word Char         Ref.   Trained Pre-trained   No Context  Context Graph     BK
 Zhang [6]                      X                                 X            X
 Xiang [24]                     X                       X                                X
 Kolyvakis [25]                 X                       X         X                                      X
 Kolyvakis [26]                 X                       X         X                                      X
 Nkisi [27]                     X              X        X         X                      X
 Gromann [28]                   X                                 X            X
 Jimenez [29]                   X                       X                      X
 OntoEmma [30]                  X      X       X        X         X                                      X
 Li [31]                        X                       X                                X               X
 HISDOM [32]                    X                                 X                      X
 Tounsi [33]                    X                                 X                      X
 DOME [34]                      X                       X                                X
 Li [8]                         X                                 X            X
 DeepFCA [35]                   X      X                X         X                      X               X
 Bento [36]                            X       X        X         X                      X
 ALOD2Vec [10]         X                                X                                                X
 LogMap* [13]          X        X                       X                                        X
 DAEOM [16]                     X              X        X         X                              X
 BERTMap [20]                   X                       X         X                                      X
 OntoConnect [15]                      X                X         X                              X
 VeeAlign [37]                  X              X        X         X                              X
 AMD [38]              X                                X                                        X
 Fine-TOM [21]                  X              X        X         X            X
Table 1
The reviewed works are categorized according to Lexical Unit Segmentation, Training, and Information
level, sorted by publication year.


the same dimension, an aggregation step, such as summing or averaging the embeddings of each word,
needs to be applied. Most works in ontology matching use this approach, since many
pre-trained word embeddings are available and input ontologies from the same domain share
overlapping vocabulary. Character level is more general, as it can be applied to any domain
sharing the same alphabet. This type of approach, however, requires a more complex model to generate
relevant representations, as word boundaries need to be learned from combinations of characters.




Figure 1: An example of the label ”New York City” segmented in the Entity, Word, and Character levels.
Optional lower casing can be applied.
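A small sketch of the three segmentation levels for the label in Figure 1, with the optional lower casing applied (plain Python, toy example):

```python
label = "New_York_City"

# Entity level: the whole label is a single, opaque symbol.
entity_unit = [label]

# Word level: split on separators and lower-case the tokens.
word_units = [w.lower() for w in label.split("_")]          # ['new', 'york', 'city']

# Character level: the sequence of individual characters.
char_units = list(label.replace("_", " ").lower())          # ['n', 'e', 'w', ' ', ...]

print(entity_unit, word_units, char_units, sep="\n")
```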
2.2. Learning
Some works are supervised and need labeled data for training, while others are based on pre-
trained models or on unsupervised learning. Reference refers to those systems that
need reference alignments for training the model. The Training category classifies those
systems that need to fine-tune or learn weights for each ontology before the production of final
alignments. Pre-trained refers to the systems that contain background knowledge in the form
of pre-trained word embeddings or models.

2.3. Information complexity
This group describes how much information is considered in the matching process. The No
Context category describes the systems that do not consider the entity neighborhood. Context
characterizes the systems that consider neighborhood information without considering the
graph structure (e.g., predicates or graph edge directions). The Graph category groups the
architectures that consider the neighbors and edge features of the relations that connect them in
arbitrary depth. The Background Knowledge category groups the systems that can aggregate
information that is not present in the ontology (e.g., external dictionaries or thesauruses).


3. Works on the Information Complexity Category
We chose to describe the works grouped according to the information level category,
which is the most discriminating one. This category classifies the works according to four
main features (Table 1): i) systems based on element-level strategies (no context); ii) systems
exploiting context; iii) systems using deeper context (graphs); and iv) systems using external
background knowledge.

3.1. Element-level based systems (no context)
This category groups the works that map an entity’s terminological layer to embeddings
without considering its neighborhood. The work of [6] is claimed to be the first work in
ontology matching that used word embeddings, which were learned on Wikipedia data using
two techniques: Word2Vec [39] and Latent Semantic Analysis [40]. Another relevant work using
word embeddings is presented in [8]. The entity label is tokenized and each token is mapped
to a word embedding from a biomedical domain corpus. Despite its simplicity, this method
improved over lexical similarities like edit distance. While these works have focused on monolingual
ontologies, the system in [28] uses pre-trained word embeddings to align multilingual ontologies.
A cross similarity between the word embeddings corresponding to the compared entity labels
is used to build an intermediate vector that produces the final similarity. To deal with the Out-Of-
Vocabulary (OOV) problem, this work averages the similarities of the words that are present.
   While few works expose their implementation as modules or libraries, the MELT platform
[41] has support for the use of pre-trained transformers in a pipeline for ontology matching.
Fine-TOM [21, 22] is a system based on MELT that uses transformers and two pipelines. The first
is a training pipeline to fine-tune a pre-trained transformer, and the second is a matching
pipeline that uses the transformer to measure the similarity between entities. Despite the
capacity of transformer-based models to generate contextualized word embeddings, TOM
and Fine-TOM are classified in the No Context category, as these systems do not consider
information from the entity neighborhood in the entity embedding.
   Finally, with a focus on large ontologies, the work in [29] proposes a technique to cluster
entities using an inverted word index to reduce the required matching space. Two clustering
strategies are applied to generate the subtasks based on the inverted word index. The first is
the random splitting of the entities into clusters of the same size, and the second is the use of
the StarSpace [42] embedding model to cluster related entities while learning embeddings for
individual words.
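As an illustration of the blocking idea (not the exact procedure of [29]), the sketch below builds an inverted word index over hypothetical entity labels and only pairs entities that share at least one token, reducing the quadratic matching space.

```python
from collections import defaultdict
from itertools import product

# Hypothetical entity labels of the two input ontologies.
source = {"o1:HeartAttack": "heart attack", "o1:Lung": "lung"}
target = {"o2:MyocardialInfarction": "myocardial infarction", "o2:HeartFailure": "heart failure"}

def inverted_index(entities: dict[str, str]) -> dict[str, set[str]]:
    """Map each label token to the set of entities whose label contains it."""
    index = defaultdict(set)
    for entity, label in entities.items():
        for token in label.lower().split():
            index[token].add(entity)
    return index

src_index, tgt_index = inverted_index(source), inverted_index(target)

# Candidate pairs: only entities sharing at least one label token are compared.
candidates = set()
for token in src_index.keys() & tgt_index.keys():
    candidates.update(product(src_index[token], tgt_index[token]))

print(candidates)  # only ('o1:HeartAttack', 'o2:HeartFailure') shares a token
```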

3.2. Systems exploiting context
This category refers to the works that include information about the surroundings of each
entity, without exploring the full graph structure. The work of [27] uses a combination of
automatically learned and manually engineered features based on word embedding similarities.
Four similarity metrics combining context and entity label similarity are proposed and
used to build a feature vector. The final alignment is produced using a random forest classifier
that predicts correspondences from the generated features. In [33], word embeddings are used to
represent each entity as the average of the embeddings of its label words, and its
context as the average of the embeddings in the entity lineage. The final alignment is generated
by comparing the cosine similarity between the entity and context embeddings of the source
and target ontologies. The system can also infer the specific relation of a correspondence using
the radius measure of the contexts. Similar work, but at the character level, is proposed in
[36]. The entity embedding is generated using a CNN on the characters of the entity label and of
its parents and children. The character embeddings are pre-trained, and the last layer of the CNN is
connected to a single neuron with sigmoid activation to predict the alignment probability.
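A minimal sketch of the context-as-lineage idea discussed above (a simplified illustration with toy vectors, not the exact method of [33]): the entity is represented by the average of its label word embeddings, the context by the average of its lineage embeddings, and the two similarity scores are combined.

```python
import numpy as np

# Hypothetical word vectors; real systems use large pre-trained embeddings.
rng = np.random.default_rng(1)
word_vec = {w: rng.normal(size=50) for w in
            ["city", "town", "settlement", "place", "region"]}

def label_embedding(label: str) -> np.ndarray:
    """Average of the word embeddings of a label."""
    return np.mean([word_vec[w] for w in label.split() if w in word_vec], axis=0)

def context_embedding(lineage_labels: list[str]) -> np.ndarray:
    """Average of the embeddings of the entity's lineage (parents/children)."""
    return np.mean([label_embedding(l) for l in lineage_labels], axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Compare two entities using both their own labels and their contexts.
entity_sim = cosine(label_embedding("city"), label_embedding("town"))
context_sim = cosine(context_embedding(["settlement", "place"]),
                     context_embedding(["settlement", "region"]))
print((entity_sim + context_sim) / 2)  # simple combination of the two scores
```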
   The architecture ERSOM (Entity Representation and Structure-based Ontology Matching)
[24] uses a stacked auto-encoder to learn features from entities based on different methods
to encode classes, properties, and instances. Each entity representation aggregates context
information from its surroundings, and hidden representations are learned using the auto-
encoder. In the same vein, HISDOM [32] uses different similarity metrics for instances, names,
attributes, structure, and comments. The name similarity is a weighted sum of the edit
distance and the embedding cosine similarity of the entity labels. The structural
similarity is calculated as the weighted sum of the Jaccard similarities of the children and parent
concepts, while the comment similarity is calculated by embedding the entity comments using a CNN.
The final similarity is the weighted sum of all these similarities. Finally,
DOME [34] applies a pipeline of filters. The first one is a string similarity matcher that compares
the textual descriptions of entities. The next step is a confidence adjustment using cosine similarity
over embeddings generated by Doc2vec [43], a generalization of Word2vec. After
the instance and property alignments are produced, the class alignment is revisited using the
matched instances as context for the class similarity, since classes of matched individuals tend to
be the same.
3.3. Graph as context
The Graph-based category refers to systems that represent the ontology graph structure in
depth. The OWL2Vec [12] model is similar to RDF2Vec [9]; however, it differs in lexical
unit segmentation, as OWL2Vec combines entity and word representations while RDF2Vec only
considers entity lexical units. The LogMap* system [13] uses this embedding model and, since the
generated embeddings are independent, a Siamese network is used to learn a transformation
that projects the embeddings into the same space.
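A minimal sketch of such a projection network is shown below, assuming PyTorch, random toy embeddings, and a small set of anchor correspondences; it illustrates the general Siamese training idea, not the exact LogMap* architecture.

```python
import torch
from torch import nn

# Hypothetical anchor correspondences: embeddings of matched entities,
# produced independently for each ontology (e.g., by an OWL2Vec-style model).
dim = 64
src = torch.randn(200, dim)   # source-ontology embeddings
tgt = torch.randn(200, dim)   # embeddings of the corresponding target entities

projection = nn.Linear(dim, dim, bias=False)   # the transformation to be learned
optimizer = torch.optim.Adam(projection.parameters(), lr=1e-3)
loss_fn = nn.CosineEmbeddingLoss(margin=0.2)

for _ in range(200):
    optimizer.zero_grad()
    # Positive pairs: the projected source embedding should be close to its target.
    pos = loss_fn(projection(src), tgt, torch.ones(len(src)))
    # Negative pairs: shuffled targets should stay far from the projection.
    neg = loss_fn(projection(src), tgt[torch.randperm(len(tgt))], -torch.ones(len(src)))
    (pos + neg).backward()
    optimizer.step()

# At matching time, projected source embeddings and target embeddings
# live in the same space and can be compared with cosine similarity.
```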
   OntoConnect [15] uses a recursive neural network to encode the graph structure of the
entities in an unsupervised manner. The entity features are generated using FastText [44],
an embedding model developed by Facebook based on character n-grams. A recursive neural
network based on LSTM [45] is used to embed the graph structure of each entity. Also using
graph neural networks, DAEOM [16] (Deep Attentional Embedded Ontology Matching) divides
its architecture into an Ontology Attentional Encoder and a Mapping Selection module. The first step of the
encoder is the embedding of the terminological descriptions: a fine-tuned architecture based
on BERT [19] is used to embed the words of the textual data in the entities. Next, a Graph Attention
Network (GAT) is used to aggregate the graph structure of the entity with its terminological
layer embedding. Finally, VeeAlign [37] is a system that builds entity embeddings using the
Universal Sentence Encoder [46] and aggregates four views to generate the context embedding:
parents, children, properties, and datatype properties. Path and node attention is applied to
select the most important representations. The entity embedding is then concatenated with the
context embedding and passed through a feed-forward layer for down-sampling.
   An example of the use of translational embeddings in ontology matching [5] is AMD (Agree-
mentMakerDeep) [38]. This type of embedding has been shown to be able to model some logical
relations, such as inversions, symmetries, and some compositions that appear among ontology entities
(e.g., inverse properties). AMD uses a modified version of RotatE [47] to generate embeddings of the
ontologies and compare similarities.
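For reference, the sketch below shows a RotatE-style scoring function, where a relation is an element-wise rotation in complex space and the triple score is the negative distance (here with an L1 norm); this is an illustration of the general scoring idea from [47], not AMD's modified implementation.

```python
import numpy as np

dim = 8
rng = np.random.default_rng(0)

# Complex entity embeddings and a relation embedding with unit modulus,
# so the relation acts as an element-wise rotation in the complex plane.
head = rng.normal(size=dim) + 1j * rng.normal(size=dim)
tail = rng.normal(size=dim) + 1j * rng.normal(size=dim)
phase = rng.uniform(0, 2 * np.pi, size=dim)
relation = np.exp(1j * phase)                 # |relation_i| = 1 for every dimension

def score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """Higher (less negative) scores indicate a more plausible triple (h, r, t)."""
    return -float(np.linalg.norm(h * r - t, ord=1))

print(score(head, relation, tail))
```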

3.4. Background knowledge
This category is dedicated to the works that use external knowledge in the matching process.
Most systems use pre-trained embeddings and models as background knowledge, such as ALOD2Vec
[10], which uses embeddings encoded with RDF2Vec [9] on an external dataset. Other systems
like DeepAlignment [25] refine word embeddings using synonyms to increase their semantic
similarity: pre-trained word embeddings are refined by contrasting synonym samples from
background knowledge with negative samples from entities that are not explicitly stated as
equivalent. The BERTMap [20, 48] system also extracts synonyms from background ontologies to
fine-tune a BERT architecture, and includes a mapping repair step to increase the quality of the final
alignments. A different strategy is adopted in [26], where a Siamese CBOW [49] refines the
word embeddings using paraphrases extracted from background knowledge, and a Denoising Auto-
Encoder (DAE) [50] is used to learn entity embeddings.
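A minimal sketch of this kind of embedding refinement, assuming PyTorch and toy synonym and negative pairs (an illustration of the general contrastive idea, not the authors' exact objective): synonyms from background knowledge are pulled together while negative pairs are pushed at least a margin apart.

```python
import torch
from torch import nn

# Toy vocabulary with randomly initialised (or pre-trained) word vectors.
vocab = ["car", "automobile", "banana"]
emb = nn.Embedding(len(vocab), 32)
idx = {w: i for i, w in enumerate(vocab)}

# Synonym pairs from background knowledge and negative pairs sampled elsewhere.
synonyms = [("car", "automobile")]
negatives = [("car", "banana")]

optimizer = torch.optim.Adam(emb.parameters(), lr=1e-2)
margin = 0.5

for _ in range(100):
    optimizer.zero_grad()
    loss = torch.tensor(0.0)
    for a, b in synonyms:      # pull synonyms together
        loss = loss + (1 - torch.cosine_similarity(
            emb(torch.tensor(idx[a])), emb(torch.tensor(idx[b])), dim=0))
    for a, b in negatives:     # push negatives at least `margin` apart
        loss = loss + torch.relu(torch.cosine_similarity(
            emb(torch.tensor(idx[a])), emb(torch.tensor(idx[b])), dim=0) - margin)
    loss.backward()
    optimizer.step()
```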
   In DeepFCA [35], word embeddings and formal concept analysis [51] are adopted. A pre-
trained word embedding model is used to map the words of the terminological layer, with a character-
level embedding as a fallback when a word is not present. The entity representation is created
by averaging the word vectors, which are refined using synonyms extracted from semantic lexicons and
contrasted with negative samples from the ontology, guided by a lattice generated from
formal concept analysis.
   The system in [31] uses multi-view embedding, a strategy that combines embeddings from
different views, such as parent and child ontology concepts, to calculate similarities between
entities. Embeddings are learned using a negative sampling strategy that contrasts entity synonyms
extracted from an external ontology with random samples. Finally, OntoEmma [30], as in the
previous work, combines representations from different views to generate entity embeddings.
The first is the name view, which concatenates character-level embeddings with pre-trained word
embeddings and aggregates them using a bidirectional LSTM. The alias, definition, and context
views use the same strategy as the name view, without character embeddings, and are
enriched with background knowledge. The final alignment similarity is calculated by a neural
network that predicts the alignment probability.


4. Discussion
As introduced above, many representation learning strategies have been applied to ontology
matching. Most of the works shown in Table 1 are based on word lexical segmentation and
trained per ontology. They use pre-trained word embeddings and some of them rely on
pre-trained models (e.g., BERT), which take into account some types of contextual information.
A qualitative analysis of the selected OM systems shows that not only a good capacity for
generalization but also a higher level of flexibility can be achieved when representation learning
methods are applied. In the remainder of this section, we point out common difficulties found
when applying representation learning methods in ontology matching. In addition, some
improvements to address such difficulties are also discussed.

Lexical Unit Segmentation The use of representation learning in ontology matching has
some common problems that need to be addressed. First, the non-determinism of neural
learning-based methods leads to entities of two different ontologies that do not share
vocabulary being encoded in unrelated spaces, producing meaningless similarity scores when compared [13]. Some works try to
mitigate this issue by projecting the embeddings of the mapped ontologies into the same space
using linear transformations [11] or by learning a projection matrix with Siamese neural networks
[13]. The choice of entity lexical unit segmentation, as described in Section 2.1, emphasizes
this problem, since similar entity labels containing the same words, but in a different order, may lead
to distinct representations with a low similarity between them. Another attempt to solve this
problem is the use of word and character lexical unit segmentation [52, 53]. Moreover, word-
level lexical units are also prone to the well-known out-of-vocabulary (OOV) issue, where words not
present in the training vocabulary may decrease the robustness of the embedding representation. Thus,
the use of character embeddings to generate entity representations may be a promising research
direction to mitigate the OOV issue when combined with other types of learned representations.
Second, the performance of word embeddings in ontology matching is limited by the vocabulary
coverage of the employed embeddings, restricting matching system developers to domain-
specific word embeddings. Since the alphabet size of some languages is orders of magnitude
smaller than the vocabulary size, and even different languages may share part of their
alphabet, the use of character embeddings seems to reduce the occurrence of OOV words at the cost
of bigger models [54].
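To illustrate the projection strategy mentioned above, the following sketch learns an orthogonal linear map between two independently trained embedding spaces from a set of anchor correspondences, assuming NumPy and SciPy and random toy embeddings; it is a minimal example of the idea, not the exact procedure of [11].

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

# Hypothetical anchor correspondences: row i of each matrix is the embedding
# of a pair of entities assumed to match across the two ontologies.
rng = np.random.default_rng(0)
src_anchors = rng.normal(size=(50, 100))   # embeddings from ontology O1
tgt_anchors = rng.normal(size=(50, 100))   # embeddings of the matched entities in O2

# Solve min_R ||src_anchors @ R - tgt_anchors||_F subject to R being orthogonal.
R, _ = orthogonal_procrustes(src_anchors, tgt_anchors)

def project(src_embedding: np.ndarray) -> np.ndarray:
    """Map a source-space embedding into the target embedding space."""
    return src_embedding @ R

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# After projection, similarity scores between source and target embeddings
# become comparable, which is the post-processing step discussed above.
```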

Learning Concerning the choice of learning strategy, one difficulty of applying supervised
representation learning techniques is the low amount of reference alignments present in OM
benchmarking datasets [25], making unsupervised and self-supervised learning valuable tech-
niques to model ontology concepts. However, the major challenge when employing such
learning strategies consists in modeling a direct loss metric that reflects the similarity between
two given entities. In fact, the general notion of similarity can change among ontology domains
and is the focus of ongoing research [55]. Concerning the training dimension, recent
works have demonstrated that relying only on word embeddings is not enough to fully address
the matching problem of rejecting false positives, so the contextual information derived from
the graph structure of ontologies needs to be carefully taken into consideration [16]. Actually,
some graph embedding techniques achieve promising results using supervised learning
techniques [37, 16]. However, due to the low amount of labeled data, such em-
bedding techniques may not achieve their highest performance in ontology matching without
some adaptation [56]. One possible direction to deal with the low amount of labeled data is
based on unsupervised graph embedding methods, including translational models [5], as in
AMD [38]. Such unsupervised graph embedding methods have shown effective
performance in encoding graph structures, especially for the link prediction task in knowledge
graphs [4]. These models have the advantage of being able to learn logical relations between
entities, including inversion, symmetry, and some types of relation composition, obeying the
locality principle [57] in ontology matching. However, relying only on this type of embedding
method does not ensure acceptable performance, since it does not enforce high similarity scores
between similar entities belonging to two distinct ontologies.

Information Level The use of embeddings brings the possibility of comparing the similarity of
symbolic data, by analogy, using metrics such as Euclidean distance or cosine similarity. To
design metrics fully adapted to the ontology matching problem, the embeddings should be
organized in a latent space that enforces similar concepts to have representations positioned near
each other in the same vector space. However, some models, e.g., vanilla auto-encoders, do
not enforce this type of organization in the latent space; as a result, the embeddings learned for
similar entities are not guaranteed to be close. Improvements in generative models such as InfoGAN
[58], the Variational Auto-Encoder (VAE) [59], and Beta-VAE [60] can learn expressive, interpretable
representations and can encode background knowledge in their models. They
encode specific features in each embedding dimension, placing similar embeddings
near each other in the latent space. Therefore, in ontology matching, these different dimensions
could offer better generalization and interpretability of concepts. An interesting research
direction is the use of these representation methods for encoding ontology entities, allowing the
embedding representations to have similarity metrics suitable for the matching task. Another
direction is to exploit such existing representation learning techniques to generate
expressive alignments. While many systems rely on context and the ontology graph structure
itself, in complex alignment it is difficult to define the boundaries of the entities, as they can
be composed of multiple distinct elements. This may require either changing existing
architectures or proposing new ones to deal with the complex alignment task.


5. Conclusion and Future Work
This paper presented an overview of ontology matching systems from the perspective of repre-
sentation learning. The reviewed systems were classified according to three categories
revealing their most relevant aspects. Our analysis pointed out that not only a good capac-
ity for generalization but also a higher level of flexibility can be achieved when representation
learning methods are applied. However, there is still room for improvement, as the selected
works do not yet fully explore all the possibilities that representation learning can provide. In
this direction, new approaches should explore deep learning models with larger numbers of
parameters, as well as the integration of background knowledge and model composition.


References
 [1] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan,
     P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in
     neural information processing systems 33 (2020) 1877–1901.
 [2] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical text-conditional image
     generation with CLIP latents, 2022. URL: https://arxiv.org/abs/2204.06125.
     doi:10.48550/ARXIV.2204.06125. arXiv:2204.06125.
 [3] S. Reed, K. Zolna, E. Parisotto, S. G. Colmenarejo, A. Novikov, G. Barth-Maron, M. Gimenez,
     Y. Sulsky, J. Kay, J. T. Springenberg, et al., A generalist agent, arXiv preprint
     arXiv:2205.06175 (2022).
 [4] Z. Chen, Y. Wang, B. Zhao, J. Cheng, X. Zhao, Z. Duan, Knowledge graph completion: A
     review, IEEE Access 8 (2020) 192435–192456.
 [5] S. Ji, S. Pan, E. Cambria, P. Marttinen, S. Y. Philip, A survey on knowledge graphs:
     Representation, acquisition, and applications, IEEE Transactions on Neural Networks and
     Learning Systems 33 (2021) 494–514.
 [6] Y. Zhang, X. Wang, S. Lai, S. He, K. Liu, J. Zhao, X. Lv, Ontology matching with word
     embeddings, in: Chinese computational linguistics and natural language processing based
     on naturally annotated big data, Springer, 2014, pp. 34–45.
 [7] L. Gutiérrez, B. Keith, A systematic literature review on word embeddings, in: International
     Conference on Software Process Improvement, Springer, 2018, pp. 132–141.
 [8] G. Li, Improving biomedical ontology matching using domain-specific word embeddings,
     in: Proceedings of the 4th International Conference on Computer Science and Application
     Engineering, 2020, pp. 1–5.
 [9] P. Ristoski, J. Rosati, T. Di Noia, R. De Leone, H. Paulheim, RDF2Vec: RDF graph embed-
     dings and their applications, Semantic Web 10 (2019) 721–752.
[10] J. Portisch, M. Hladik, H. Paulheim, ALOD2Vec matcher results for OAEI 2020, in:
     Proceedings of the 15th Workshop on Ontology Matching, 2020, pp. 147–153.
[11] J. Portisch, G. Costa, K. Stefani, K. Kreplin, M. Hladik, H. Paulheim, Ontology matching
     through absolute orientation of embedding spaces, arXiv preprint arXiv:2204.04040 (2022).
[12] J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, I. Horrocks, Owl2vec*:
     Embedding of owl ontologies, Machine Learning 110 (2021) 1813–1845.
[13] J. Chen, E. Jiménez-Ruiz, I. Horrocks, D. Antonyrajah, A. Hadian, J. Lee, Augmenting ontol-
     ogy alignment by semantic embedding and distant supervision, in: R. Verborgh, K. Hose,
     H. Paulheim, P. Champin, M. Maleshkova, Ó. Corcho, P. Ristoski, M. Alam (Eds.), The
     Semantic Web - 18th International Conference, ESWC 2021, Virtual Event, June 6-10, 2021,
     Proceedings, volume 12731 of Lecture Notes in Computer Science, Springer, 2021, pp. 392–408.
     URL: https://doi.org/10.1007/978-3-030-77385-4_23. doi:10.1007/978-3-030-77385-4_23.
[14] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, S. Y. Philip, A comprehensive survey on graph
     neural networks, IEEE transactions on neural networks and learning systems 32 (2020)
     4–24.
[15] J. Chakraborty, S. K. Bansal, L. Virgili, K. Konar, B. Yaman, Ontoconnect: Unsupervised
     ontology alignment with recursive neural network, in: Proceedings of the 36th Annual
     ACM Symposium on Applied Computing, 2021, pp. 1874–1882.
[16] J. Wu, J. Lv, H. Guo, S. Ma, Daeom: A deep attentional embedding approach for biomedical
     ontology matching, Applied Sciences 10 (2020) 7909.
[17] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph attention
     networks, stat 1050 (2017) 20.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polo-
     sukhin, Attention is all you need, Advances in Neural Information Processing Systems 30
     (2017).
[19] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
     transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[20] Y. He, J. Chen, D. Antonyrajah, I. Horrocks, Bertmap: A bert-based ontology alignment
     system, arXiv preprint arXiv:2112.02682 (2021).
[21] L. Knorr, J. Portisch, Fine-TOM matcher results for OAEI 2021, in: Proceedings of the 16th
     Workshop on Ontology Matching, volume 3063, 2022, pp. 144–151.
[22] D. Kossack, N. Borg, L. Knorr, J. Portisch, TOM matcher results for OAEI 2021, in:
     Proceedings of the 16th Workshop on Ontology Matching, 2021, pp. 144–151.
[23] X. Qiu, T. Sun, Y. Xu, Y. Shao, N. Dai, X. Huang, Pre-trained models for natural language
     processing: A survey, CoRR abs/2003.08271 (2020). URL: https://arxiv.org/abs/2003.08271.
     arXiv:2003.08271.
[24] C. Xiang, T. Jiang, B. Chang, Z. Sui, Ersom: A structural ontology matching approach
     using automatically learned entity representation, in: Proceedings of the Conference on
     Empirical Methods in Natural Language Processing, 2015, pp. 2419–2429.
[25] P. Kolyvakis, A. Kalousis, D. Kiritsis, Deepalignment: Unsupervised ontology matching
     with refined word vectors, in: Proceedings of the 2018 Conference of the North American
     Chapter of the Association for Computational Linguistics: Human Language Technologies,
     2018, pp. 787–798.
[26] P. Kolyvakis, A. Kalousis, B. Smith, D. Kiritsis, Biomedical ontology alignment: an approach
     based on representation learning, Journal of Bio. Semantics 9 (2018) 1–20.
[27] I. Nkisi-Orji, N. Wiratunga, S. Massie, K.-Y. Hui, R. Heaven, Ontology alignment based
     on word embedding and random forest classification, in: Joint European Conference on
     Machine Learning and Knowledge Discovery in Databases, Springer, 2018, pp. 557–572.
[28] D. Gromann, T. Declerck, Comparing pretrained multilingual word embeddings on an
     ontology alignment task, in: Proceedings of the eleventh international conference on
     language resources and evaluation (LREC), 2018.
[29] E. Jiménez-Ruiz, A. Agibetov, M. Samwald, V. Cross, Breaking-down the ontology align-
     ment task with a lexical index and neural embeddings, arXiv preprint arXiv:1805.12402
     (2018).
[30] L. L. Wang, C. Bhagavatula, M. Neumann, K. Lo, C. Wilhelm, W. Ammar, Ontology
     alignment in the biomedical domain using entity definitions and context, arXiv preprint
     arXiv:1806.07976 (2018).
[31] W. Li, X. Duan, M. Wang, X. Zhang, G. Qi, Multi-view embedding for biomedical ontology
     matching., Proceedings of the 14th Workshop on Ontology Matching 2536 (2019) 13–24.
[32] J. Liu, Y. Tang, X. Xu, HISDOM: A Hybrid Ontology Mapping System based on Convo-
     lutional Neural Network and Dynamic Weight, in: Proceedings of the 6th IEEE/ACM
     International Conf. on Big Data Computing, Applications and Tech., 2019, pp. 67–70.
[33] M. Tounsi Dhouib, C. Faron Zucker, A. G. Tettamanzi, An ontology alignment approach
     combining word embedding and the radius measure, in: International Conference on
     Semantic Systems, Springer, Cham, 2019, pp. 191–197.
[34] S. Hertling, H. Paulheim, DOME results for OAEI 2019, in: Proceedings of the 14th
     Workshop on Ontology Matching, 2019, pp. 123–130.
[35] G. Li, Deepfca: Matching biomedical ontologies using formal concept analysis embedding
     techniques, in: Proceedings of the 4th International Conference on Medical and Health
     Informatics, 2020, pp. 259–265.
[36] A. Bento, A. Zouaq, M. Gagnon, Ontology matching using convolutional neural networks,
     in: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, pp.
     5648–5653.
[37] V. Iyer, A. Agarwal, H. Kumar, Veealign: Multifaceted context representation using dual
     attention for ontology alignment, in: Proceedings of the 2021 Conference on Empirical
     Methods in Natural Language Processing, 2021, pp. 10780–10792.
[38] Z. Wang, I. F. Cruz, AgreementMakerDeep results for OAEI 2021, in: Proceedings of the
     16th Workshop on Ontology Matching, 2021, pp. 124–130.
[39] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed representations of
     words and phrases and their compositionality, Advances in neural information processing
     systems 26 (2013).
[40] S. T. Dumais, et al., Latent semantic analysis, Annu. Rev. Inf. Sci. Technol. 38 (2004).
[41] S. Hertling, J. Portisch, H. Paulheim, Matching with Transformers in MELT, arXiv preprint
     arXiv:2109.07401 (2021).
[42] L. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, J. Weston, Starspace: Embed all the
     things!, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 32,
     2018.
[43] J. H. Lau, T. Baldwin, An empirical evaluation of doc2vec with practical insights into
     document embedding generation, arXiv preprint arXiv:1607.05368 (2016).
[44] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in
     vector space, arXiv preprint arXiv:1301.3781 (2013).
[45] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks: Lstm cells and
     network architectures, Neural computation 31 (2019) 1235–1270.
[46] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-
     Cespedes, S. Yuan, C. Tar, et al., Universal sentence encoder, arXiv preprint
     arXiv:1803.11175 (2018).
[47] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, Rotate: Knowledge graph embedding by relational
     rotation in complex space, arXiv preprint arXiv:1902.10197 (2019).
[48] Y. He, J. Chen, D. Antonyrajah, I. Horrocks, Biomedical ontology alignment with BERT,
     in: Proceedings of the 16th Workshop on Ontology Matching, 2021, pp. 1–12.
[49] T. Kenter, A. Borisov, M. De Rijke, Siamese cbow: Optimizing word embeddings for
     sentence representations, arXiv preprint arXiv:1606.04640 (2016).
[50] X. Lu, Y. Tsao, S. Matsuda, C. Hori, Speech enhancement based on deep denoising
     autoencoder., in: Interspeech, volume 2013, 2013, pp. 436–440.
[51] B. Ganter, R. Wille, Formal concept analysis: mathematical foundations, Springer Science
     & Business Media, 2012.
[52] R. Sennrich, B. Haddow, A. Birch, Neural machine translation of rare words with subword
     units, in: Proceedings of the 54th Annual Meeting of the Association for Computational
     Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, The
     Association for Computer Linguistics, 2016. URL: https://doi.org/10.18653/v1/p16-1162.
[53] A. Conneau, H. Schwenk, L. Barrault, Y. LeCun, Very deep convolutional networks for
     text classification, in: Proceedings of the 15th EACL 2017, Valencia, Spain, April 3-7, 2017,
     Volume 1: Long Papers, 2017, pp. 1107–1116. URL: https://doi.org/10.18653/v1/e17-1104.
[54] X. Wang, H. Pham, P. Arthur, G. Neubig, Multilingual neural machine translation with
     soft decoupled encoding, arXiv preprint arXiv:1902.03499 (2019).
[55] M. Kulmanov, F. Z. Smaili, X. Gao, R. Hoehndorf, Semantic similarity and machine learning
     with ontologies, Briefings Bioinform. 22 (2021). URL: https://doi.org/10.1093/bib/bbaa199.
[56] T. Zhao, X. Zhang, S. Wang, Graphsmote: Imbalanced node classification on graphs
     with graph neural networks, in: L. Lewin-Eytan, D. Carmel, E. Yom-Tov, E. Agichtein,
     E. Gabrilovich (Eds.), WSDM ’21, The Fourteenth ACM International Conference on Web
     Search and Data Mining, Virtual Event, Israel, March 8-12, 2021, ACM, 2021, pp. 833–841.
     URL: https://doi.org/10.1145/3437963.3441720. doi:10.1145/3437963.3441720.
[57] A. Solimando, E. Jiménez-Ruiz, G. Guerrini, Detecting and correcting conservativity
     principle violations in ontology-to-ontology mappings, in: International Semantic Web
     Conference, Springer, 2014, pp. 1–16.
[58] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, Infogan: Interpretable
     representation learning by information maximizing generative adversarial nets, Advances
     in Neural Information Processing Systems 29 (2016).
[59] J. An, S. Cho, Variational autoencoder based anomaly detection using reconstruction
     probability, Special Lecture on IE 2 (2015) 1–18.
[60] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, A. Lerchner,
     beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, in:
     5th International Conference on Learning Representations, ICLR 2017, Toulon, France,
     April 24-26, 2017, Conference Track Proceedings, 2017.