Domain-specific Taxonomy Enrichment based on Meta-Embeddings

Mikhail Tikhomirov1[0000−0001−7209−9335] and Natalia V. Loukachevitch1
Lomonosov Moscow State University, Moscow, Russia
tikhomirov.mm@gmail.com, louk nat@mail.ru

Abstract. In this paper we study the use of meta-embedding approaches, which combine several source embeddings, for taxonomy class prediction for new terms. We test the proposed approach in the information-security domain in the task of enriching the Ontology on Natural Sciences and Technologies (OENT). We show that autoencoder-based meta-embeddings with triplet loss achieve the best results in the task. The highest results are obtained on a combination of in-domain and out-of-domain embeddings.

Keywords: Taxonomy · Hypernym prediction · Meta-embeddings

1 Introduction

Ontologies and knowledge graphs in most domains have a taxonomy as a backbone. Relations in taxonomies usually comprise class-subclass relations between concepts, or instance-class relations connecting a specific entity representation and a concept [3,13]: relations of both types can be called IS-A relations or hypernym relations [24]. Development of an ontology in a new domain usually begins with constructing its taxonomy, which determines the ontology scope. To make it easier to build a taxonomy, various approaches have been proposed for extracting hypernym relations for new terms from texts, including specific patterns, word co-occurrences, distributional characteristics of words, and others [27]. Currently, an important component of extracting hypernym relations from texts is vector representations (embeddings) of words, which can provide additional evidence of semantic similarity between words [12,18,28], which is important for identification of the hypernym concept for a new term.
Word vectors can be calculated using various text collections and various methods, which means that different vector representations capture context in different ways, resulting in a wide variety of vector representations for the same words. From here, we can suppose that some combinations of vectors, so-called meta-embeddings [8], can improve the vector representation of words, which allows achieving better prediction of semantic similarity between words or their hypernym concepts.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

It was already shown that meta-embeddings improved performance in word analogy and similarity tasks [41,8,7]. In recent work [38], it was shown that combinations of general word embeddings calculated on large Internet text collections have a substantial impact on the performance in taxonomy enrichment of general lexical-semantic resources such as WordNet [24] and RuWordNet [20]. In this paper we show that, in the task of extracting taxonomic relations in a specific domain, meta-embeddings combining general (out-of-domain) and in-domain embeddings significantly improve the prediction of a hypernym concept from the given taxonomy for a new term. We experiment on an information-security text collection, which is used to enrich the Ontology on Natural Sciences and Technologies [11,36] in the information-security domain.

2 Related Work

Traditional methods for hypernym detection include pattern-based methods, searching for specific hypernym patterns in sentences [15,30,31], methods based on similarity of word vector representations [12,18], and also combined approaches integrating various context and similarity features of words [35,4,34]. In 2016, the taxonomy enrichment task was organized as a shared task at the SemEval workshop (task 14) [16].
In this task, the participants had to attach new words to correct hypernyms in WordNet [24] using their definitions. In 2020, a new open evaluation on taxonomy enrichment of the Russian wordnet RuWordNet [20], RUSSE'2020, was organized [27]. The task was to find correct hypernyms from an older RuWordNet version for words described in a newer RuWordNet version. In the RUSSE-2020 evaluation of predicting RuWordNet hypernym synsets for new words [27], the participants used various word embeddings (static – fastText [5], word2vec [21] – and contextualized – BERT [10]), the available RuWordNet taxonomy structure, hypernym and co-hyponym patterns, definitions of words from Wiktionary, and global search engine results [2,9,37,27].

Recent methods for hypernym extraction exploit graph-based representations of the taxonomy structure. Liu et al. [19] use node2vec embeddings of graph structures [14] for taxonomy induction. Aly et al. [1] use hyperbolic Poincaré embeddings [26] for automatic generation of taxonomies. Graph convolutional networks (GCNs) [17] are applied to the link prediction task on large knowledge bases. In [29], the authors study graph-based representation methods on the diachronic wordnets dataset, which contains several English and Russian WordNet versions and gold-standard links from words of newer versions to concepts of older versions.

Most current approaches to taxonomy enrichment are based on vector representations of new words and existing concepts in a taxonomy [28]. To improve vector representations, combinations of several source vector representations, such as vector concatenation or averaging, can be used [8]. In [41] it was shown that using singular value decomposition (SVD) over the concatenation of several source vectors can improve the results in several tasks, with the ability to control the final vector size.
Autoencoders [7], called Autoencoded Meta-Embeddings (AEME), became a further development of the idea of creating meta-embeddings. In [7], the authors proposed several algorithms (CAEME, AAEME, etc.) for combining various word vectors into one vector by encoding the initial vectors into some meta-embedding space and then decoding them back. The CAEME approach tries to reproduce the source vectors from the concatenation of the encoded representations of these vectors. In the AAEME approach each vector is mapped to a fixed-size vector and all encoded representations are averaged rather than concatenated, which restricts the vector dimension.

In [25] the authors investigated the performance of the autoencoders depending on the loss function (MSE loss, KL-divergence loss, cosine distance loss, and also their combinations). They found that there is no evident winner across tasks and that different loss functions should be chosen for different applications. In [38] the best results for enriching taxonomies of general lexical-semantic resources such as WordNet [23] and RuWordNet [20] were achieved using AAEME encoders with triplet loss, combining fastText, GloVe and word2vec embeddings in a single meta-representation. The meta-representation was further used for training a supervised model, which also included features from Wiktionary.

In the current paper we study the performance of meta-embedding approaches in domain-specific taxonomy enrichment: we experiment with assigning new terms from the information-security domain to the Ontology on Natural Sciences and Technologies (OENT).

3 OENT Ontology

We study the task of domain-specific taxonomy enrichment using the Ontology on Natural Sciences and Technologies (OENT) [11,36]. The OENT ontology [11] is presented as a semantic net of concepts and relations between them; each concept is connected with the set of words and phrases that can express this concept in documents (text entries).
All text entries of the same concept can be called a synset, similar to WordNet synsets [23]. For example, the "Mathematical analysis" concept has Russian text entries (a synset) such as matematicheskii analis, matanalis, matan. Synsets in OENT can include different parts of speech: nouns, adjectives, verbs, or adverbs. The OENT ontology is used for automatic document analysis in information-analytical systems, which includes providing conceptual search, query expansion using the ontology, knowledge-based document categorization, etc. [36].

OENT comprises large volumes of concepts and terminology from several scientific disciplines and technological domains, presented as a connected semantic network of concepts with corresponding text entries and relations between concepts [11]. The ontology was started by extracting terms from specialized text collections (web-sites, school and university textbooks) in mathematics, physics, geology, biology, and chemistry. Currently, this initial terminology is collected in the OENT subset called OENT-lite. In further projects, the available conceptual structures were elaborated to more specific levels, and the terminologies of technological domains such as the oil-and-gas industry, power energy, education policy and techniques, computer technologies, and information security were added to OENT. The full version of OENT consists of 106K concepts and 308K single- and multi-word terms, while OENT-lite consists of 37K concepts and 133K terms. In the current study we use the OENT-lite ontology and the terminology of the information-security domain to study the enrichment of the ontology with domain-specific terminology via extracting hypernym concept relations from domain-specific text collections.

4 Taxonomy Enrichment Task

The task of taxonomy enrichment consists in finding an appropriate concept from a given taxonomy for a new word, which can be considered as a hypernym or class for this word.
Similar to the RUSSE-2020 evaluation, the task is to find a direct (the closest) hypernym concept from the taxonomy [28]. To make the extraction less restricted but still quite precise, second-order hypernym concepts (hypernyms of hypernyms) are also considered correct answers. In this, we try to simulate the work of knowledge engineers, who should find the most specific concept in a taxonomy to attach a new domain term.

Taxonomy enrichment is treated as a ranking task where the correct answers should be at the top of a candidate list [28]. In contrast to the classification setting, ranking is more appropriate in conditions where the share of correct answers is much smaller than the overall number of candidates.

As a subject area for taxonomy enrichment, the information-security domain was chosen, represented by a corpus of 500 thousand texts. For this corpus, a frequency list was built so that each word occurs at least 50 times. The OENTCyber dataset for evaluating hypernym detection was constructed as follows:

1. All one-word text entries of concepts from the OENT ontology were selected that appear in the full version but are absent from the OENT-lite version;
2. From this list, only words for which the hypernyms are present in OENT-lite were taken.

As a result, a dataset of 4372 words was obtained, and the task was to predict hypernyms (OENT concepts) for a given set of words using OENT-lite and the available corpus of articles from the information-security domain. This dataset contains specific names such as "chrome", "amazon", "cisco", etc. and specific terms such as "css3", "dbscan", "dll", etc.

5 Method of Hypernym Concept Prediction

In our approach, we use word embeddings to generate a list of the taxonomy entries (words or phrases from the taxonomy) most similar to the target word according to cosine similarity. For each target word, the top 20 taxonomy entries are considered.
The number of elements for consideration was chosen experimentally [38]. For each entry in the similarity list, all corresponding concepts and their direct and second-order hypernyms are extracted from the taxonomy. They are considered candidate concepts to be hypernyms of the target word.

For candidate hypernym concepts, several features are calculated, and logistic regression is used to predict the probability of a candidate being a hypernym of the target word. The calculated features are as follows:

– the minimum, average, and maximum similarities of the target word to all words of the concept synset;
– the features based on hyponyms of a candidate concept synset:
  • we extract all hyponyms (lower classes) of the candidate concept;
  • for each word/phrase in each hyponym synset we compute its similarity to the target word;
  • we compute the minimum, average, and maximum similarity for each hyponym synset;
  • we form three vectors: a vector of minimum similarities, a vector of average similarities, and a vector of maximum similarities of the hyponym synsets;
  • for each of these vectors we compute the minimum, average, and maximum, and use the resulting 9 numbers as features;
– the minimum, average, and maximum level of the concept in the merged candidate list:
  • the level is 0 if the concept was added based on similarity to the target word;
  • the level is 1 for the immediate hypernyms of a word in the similarity list;
  • the level is 2 for the hypernyms of the hypernyms of words in the similarity list;
– the number of occurrences (n) of the concept in the merged candidate list and the quantity log2(2 + n), which serves for smoothing.

In total, 17 features were calculated. Training data were generated randomly and automatically from OENT-lite (thus, the training data do not overlap with the test data). We use two models for calculating word embeddings: fastText [6] and word2vec [22].
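As a rough sketch (not the authors' code), the candidate-generation step and the level/count features described above could look as follows; the data structures `entry_vecs`, `entry_to_concepts` and `hypernyms` are our own assumptions about how the taxonomy might be represented:

```python
import math
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def candidate_concepts(target_vec, entry_vecs, entry_to_concepts, hypernyms, top_k=20):
    """Rank taxonomy entries by cosine similarity to the target word, then
    collect their concepts together with first- and second-order hypernyms.
    Each candidate keeps the levels at which it was added: 0 for the similar
    entry's own concept, 1 for its hypernym, 2 for a hypernym of a hypernym."""
    ranked = sorted(entry_vecs,
                    key=lambda e: cosine(target_vec, entry_vecs[e]),
                    reverse=True)[:top_k]
    candidates = {}
    for entry in ranked:
        for concept in entry_to_concepts.get(entry, []):
            candidates.setdefault(concept, []).append(0)
            for h1 in hypernyms.get(concept, []):
                candidates.setdefault(h1, []).append(1)
                for h2 in hypernyms.get(h1, []):
                    candidates.setdefault(h2, []).append(2)
    return candidates

def level_count_features(levels):
    # Level- and count-based features of one candidate concept,
    # including the smoothed count log2(2 + n).
    n = len(levels)
    return {"min_level": min(levels), "avg_level": sum(levels) / n,
            "max_level": max(levels), "n": n, "log_n": math.log2(2 + n)}
```

The similarity-based synset features and the logistic regression on top of the full 17-feature vector would be built in the same spirit.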
In order to obtain vectors for words absent in the embedding models, the following procedures were carried out:

– For fastText, embeddings for out-of-vocabulary words are obtained in a natural way, by computing vectors with the model itself;
– For word2vec, embeddings are calculated by averaging the vectors of maximum prefixes of the constituent words of a multi-word expression. There is a limitation on the minimum length of a prefix word, which is 4 characters;
– For meta-embeddings, if there is no vector for a word in some model, the corresponding source vector is initialized with zeros.

6 Meta-Embeddings in Taxonomy Enrichment Task

In our work we compare simple meta-embeddings, such as concatenation of source embeddings and SVD over the concatenation, with two variants of autoencoders generating meta-embeddings: Concatenated Autoencoded Meta-Embeddings (CAEME) and Averaged Autoencoded Meta-Embeddings (AAEME), which have shown good results in previous works [7,38].

Suppose we have two source embeddings s1(w) and s2(w), their encoders E1 and E2, and their decoders D1 and D2. The meta-embedding m(w) in CAEME is constructed as the L2-normalised concatenation of the two encoded source embeddings E1(s1(w)) and E2(s2(w)):

m(w) = (E1(s1(w)) ⊕ E2(s2(w))) / ||E1(s1(w)) ⊕ E2(s2(w))||2, (1)

where ⊕ is the concatenation operation. In CAEME, the dimensionality of the meta-embedding space is the sum of the dimensions of the source embeddings. The AAEME encoder can be seen as a special case of the CAEME encoder, where the meta-embedding is computed by averaging the two encoded sources in (1) instead of concatenating them. Averaging makes it possible to avoid increasing the dimensionality of the meta-embedding. The AAEME encoder computes the meta-embedding of a word w from its two source embeddings s1(w) and s2(w) as the L2-normalised sum of the two encoded versions of the source embeddings:

m(w) = (E1(s1(w)) + E2(s2(w))) / ||E1(s1(w)) + E2(s2(w))||2. (2)

The CAEME and AAEME decoders reconstruct the source embeddings from the same meta-embedding m(w), thereby implicitly using both common and complementary information in the source embeddings. The overall objective of autoencoder training is given below; the function f can be any distance or similarity measure, such as MSE, KL-divergence, or cosine distance. The coefficients λ1 and λ2 can be used to give different emphasis to the reconstruction of the two sources:

Loss(E1, E2, D1, D2) = Σ_w (λ1 f(s1(w), ŝ1(w)) + λ2 f(s2(w), ŝ2(w))), (3)

where ŝi(w) are the decoded embeddings corresponding to si(w). Jointly learning E1, E2, D1, D2 minimises the total reconstruction error given by Equation 3. To obtain meta-embedding representations after training, only the encoders are applied: they convert the input source embeddings into a meta-representation. These meta-embedding vectors are then used as vector representations of words.

The standard loss function we used for the AEME approaches was cosine distance loss. We tried variations and combinations of MSE loss, KL-divergence loss and cosine distance loss, and the last one works best in our case.

We can impose additional restrictions on AEME models during training. One such restriction is the use of triplet loss. The triplet loss function is a loss function for machine learning algorithms in which some basic example (anchor) is compared with positive and negative examples. The goal is to make the distance from the anchor to the positive example smaller than the distance to the negative example. There is often a margin parameter that controls by how much the distance to the negative example should exceed the distance to the positive one. One of the first formulations of an approach equivalent to the triplet loss was introduced in [33] for the metric learning problem.
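As an illustration, the AAEME encoder, the reconstruction objective, and a triplet margin loss of the kind discussed here can be sketched in a few lines of numpy. This is our simplified sketch, not the paper's implementation: the random linear matrices stand in for the trained encoder and decoder networks, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = d2 = d_meta = 300  # source and meta-embedding dimensions

# Plain linear maps as stand-ins for the trained encoder/decoder networks.
E1 = rng.normal(size=(d_meta, d1)) * 0.01
E2 = rng.normal(size=(d_meta, d2)) * 0.01
D1 = rng.normal(size=(d1, d_meta)) * 0.01
D2 = rng.normal(size=(d2, d_meta)) * 0.01

def aaeme(s1, s2):
    """L2-normalised sum of the two encoded source embeddings."""
    z = E1 @ s1 + E2 @ s2
    return z / np.linalg.norm(z)

def cosine_distance(u, v):
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def reconstruction_loss(s1, s2, lam1=1.0, lam2=1.0):
    """Weighted reconstruction objective for a single word,
    with cosine distance as the measure f."""
    m = aaeme(s1, s2)
    return lam1 * cosine_distance(s1, D1 @ m) + lam2 * cosine_distance(s2, D2 @ m)

def triplet_margin_loss(m_anchor, m_pos, m_neg, margin=0.1):
    """The anchor should be closer to the positive than to the negative
    meta-embedding, by at least the given margin."""
    d_pos = np.linalg.norm(m_anchor - m_pos)
    d_neg = np.linalg.norm(m_anchor - m_neg)
    return max(d_pos - d_neg + margin, 0.0)
```

In training, the encoders and decoders would be optimised jointly over the vocabulary; at inference time only `aaeme` is applied.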
Similar loss functions have also been used in problems of image similarity [39], face recognition [32], text classification [40] and other tasks. We require a word to be closer to the words that are semantically related to it according to the taxonomy than to a randomly chosen word, with some margin:

L(wa, wp, wn) = max(||m(wa) − m(wp)|| − ||m(wa) − m(wn)|| + margin, 0), (4)

where ||.|| is a distance function, wa is the target word, and wp and wn are positive and negative words, respectively. The algorithm for calculating the triplet loss is as follows:

1. for each word present in the taxonomy, we compile a list of semantically related words, which includes synonyms, hyponyms and hypernyms;
2. at each epoch, we randomly select K positive words from this related-words set and form a set of K negative words by selecting them randomly from the vocabulary;
3. if a word is not present in the taxonomy, then we cannot form a list of related words for it; in this case, we generate positive vectors for it by adding random noise to its vector;
4. next, we combine the triplet loss with the original loss as α ∗ loss + (1 − α) ∗ triplet loss.

We use the following parameters for the triplet loss: K = 5, margin = 0.1, α = 0.005. These parameters were selected via grid search with the AAEME algorithm.

7 Experiments

The quality of the approach was evaluated using two general (external) source vector representations: fastText (Common Crawl Russian version from https://fastText.cc/docs/en/crawl-vectors.html) and word2vec (Araneum model for Russian from http://vectors.nlpl.eu/repository/). Also, different meta-embedding approaches were investigated: concatenation, SVD over the concatenation, CAEME, and AAEME (with and without triplet loss).
In addition to the two ”external” vector models word2vec and fastText, two vector models (word2vec and fastText, respectively) were trained on the informa- tion security text corpus (hereinafter ”internal”). The training parameters were as follows: window = 3, vector size = 300, epochs = 10, method = skip-gram. The performance of models trained on the domain corpus and also their combi- nation with more ”powerful” models was investigated, since 500 thousand texts is significantly less compared to text collections on which external word2vec and fastText models were trained. For evaluation of hypernym prediction, a traditional measure for ranking tasks Mean Average Precision measure is used. This measure achieves the max- imal value equal 1 when all correct answers are located in the beginning of a ranking list: PN M AP = N1 i=1 APi ; (5) 1 Pn APi = M i preci × I[yi = 1]. Another traditional metric for such tasks is Mean Reciprocal Rank (MRR), depending on positions of the first correct answers. This measure is equal to maximal value 1, when all first correct answers are located on 1st positions in ranking lists for all target words. N 1 X 1 M RR = , (6) N i=1 ranki where rank is the position of the first relevant item in the ranked list. Where N and M are the number of predicted and ground truth values, respectively, preci is the fraction of ground truth values in the predictions from 1 to i, yi is the label of the i-th answer in the ranked list of predictions, and I is the indicator function. In order to evaluate the quality of the approach in such a setting, the de- scribed methods of constructing meta-embeddings were used, and each vector model was evaluated separately. In case of using the AEME approaches for all vector models, it was necessary to determine the individual contributions of each vector model when calculating the loss function. 
The following weights were obtained experimentally: a weight of 1.0 for the internal models trained on the corpus, a weight of 5.0 for the external word2vec model, and a weight of 2.0 for the external fastText model.

The results can be seen in Table 1 (external models), Table 2 (internal models), and Table 3 (combination of external and internal models). From Tables 1 and 2, we can see that powerful external models calculated on large text collections still predict hypernym concepts much better than internal, domain-specific models. In both cases all variants of meta-embeddings predict hypernyms better than the source vectors, and the best results are achieved by encoders with triplet loss. The combination of all models (Table 3) achieves the best results in hypernym concept prediction; the best prediction results are much higher than predictions based on any of the source models. The results achieved by encoders are much better than the results of the simple approaches (concatenation and SVD).

method          MAP    MRR
fastText        0.362  0.407
word2vec        0.375  0.421
concat          0.397  0.446
SVD             0.400  0.447
CAEME           0.391  0.439
CAEME triplet   0.398  0.448
AAEME           0.404  0.453
AAEME triplet   0.412  0.464

Table 1. OENT-lite enrichment: external models

method          MAP    MRR
fastText        0.277  0.317
word2vec        0.277  0.316
concat          0.287  0.327
SVD             0.283  0.324
CAEME           0.286  0.325
CAEME triplet   0.298  0.339
AAEME           0.280  0.319
AAEME triplet   0.295  0.335

Table 2. OENT-lite enrichment: internal models

7.1 Analysis of Results

We analysed hypernym predictions for new words for which correct answers were not found among the top-10 predictions and found the following cases:

– Predicted hypernyms correspond to senses missing from the taxonomy. For example, the word "halo" is described in OENT-lite only in the sense of a Russian helicopter, but this word can also mean a computer game. Predicted hypernyms include the concept "computer program" at the first position of the candidate list and the "game" concept at the eighth position.
– A predicted hypernym concept is quite valid and conveys another aspect of the target word. For example, for the verb "to cache", the correct answer in OENT is the concept "data storing", but the first predicted hypernym concept, "computer technology", also seems correct;
– Predicted hypernyms may be too general in OENT but more specific in the predictions. For example, for the word "amazon" the correct answers are the concepts "American company", "company", "foreign company". The predicted concepts include "American software company" and "American tech company". These predicted concepts seem to be more correct;
– In many cases predictions are semantically very close to the correct answers but not correct. For example, for the word "CSS3" the correct answers are the concepts "document markup language" and "formal language". The predicted hypernym concepts are "programming language", "scripting language", "object-oriented language", "computer technologies";
– There are also numerous examples where too general hypernyms are predicted; in some cases predicted concepts are very far from reasonable answers and are difficult to explain.

method          MAP    MRR
concat          0.386  0.434
SVD             0.387  0.433
CAEME           0.385  0.434
CAEME triplet   0.408  0.456
AAEME           0.414  0.463
AAEME triplet   0.427  0.479

Table 3. OENT-lite enrichment: external + internal models

To show that higher confidence of the model correlates with better hypernym concept prediction, we plotted the dependence of correct answers on the weight of the first prediction. Figure 1 shows the proportion of a correct hypernym concept among the first 1, 3, 5, and 10 answers depending on the prediction weight. It can be clearly seen that a higher weight of a predicted hypernym leads to a higher proportion of correct answers. This means that model predictions with high predicted weights and no correct answers in the top can be considered as a source for improving hypernym class descriptions in the ontology.
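The MAP and MRR measures used in the evaluation above can be sketched as follows; this is a minimal implementation of the standard metrics, assuming binary relevance labels over each ranked candidate list, and the function names are ours:

```python
def average_precision(labels, num_gold):
    """AP for one ranked candidate list; labels[i] is 1 if the (i+1)-th
    prediction is a ground-truth hypernym, and num_gold is the total
    number of ground-truth hypernyms for the word."""
    hits, total = 0, 0.0
    for i, y in enumerate(labels, start=1):
        if y == 1:
            hits += 1
            total += hits / i  # precision at this cut-off
    return total / num_gold if num_gold else 0.0

def mean_average_precision(label_lists, gold_counts):
    # Mean of the per-word average precisions.
    return sum(average_precision(labels, m)
               for labels, m in zip(label_lists, gold_counts)) / len(label_lists)

def mean_reciprocal_rank(label_lists):
    # Mean inverse rank of the first correct answer per word.
    total = 0.0
    for labels in label_lists:
        for i, y in enumerate(labels, start=1):
            if y == 1:
                total += 1.0 / i
                break
    return total / len(label_lists)
```

For a single word with ranked labels [1, 0, 1] and two gold hypernyms, AP is (1/1 + 2/3)/2 ≈ 0.833, and MRR counts only the first hit.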
8 Conclusion

In this paper we considered the problem of adapting the OENT ontology to the specific domain of information security: for new words from an information-security text collection, a hypernym concept from the OENT ontology has to be predicted. We investigated methods for combining different word embeddings into a single meta-embedding. The meta-embedding methods included concatenation of initial embeddings, SVD over the concatenation, and two variants of autoencoders aimed at learning better word embeddings from the initial vectors.

Fig. 1. Ratio of correct predictions among the top1, top3, top5, and top10 answers depending on the first prediction weight (x-axis: LogReg scores)

We showed that the use of meta-embeddings improves the performance of the system on the considered datasets. SVD always improves the results compared to concatenation. Autoencoder-based meta-embeddings achieve the best results in all cases. It can also be seen that adding the triplet loss improves the results significantly. It has also been shown that the use of vector models trained on a specific domain, in combination with the meta-embedding approach, can improve the quality of hypernym concept prediction. It can also be seen that the quality of the approach on the specific domain is worse than on the general domain [38]. We plan to make OENT-lite and the related hypernym dataset publicly available.

Acknowledgements. The participation of M. Tikhomirov in the reported study was funded by RFBR, project number 19-37-90119. The work of Natalia Loukachevitch in the current study (preparation of data for the experiments) is supported by the Russian Science Foundation (project 20-11-20166).

References

1. Aly, R., Acharya, S., Ossa, A., Köhn, A., Biemann, C., Panchenko, A.: Every child should have parents: A taxonomy refinement algorithm based on hyperbolic term embeddings.
In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 4811–4817. Association for Computational Linguistics, Florence, Italy (2019)
2. Arefyev, N., Fedoseev, M., Kabanov, A., Zizov, V.: Word2vec not dead: predicting hypernyms of co-hyponyms is better than reading definitions. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue” (2020)
3. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American 284(5), 34–43 (2001)
4. Bernier-Colborne, G., Barriere, C.: CRIM at SemEval-2018 task 9: A hybrid approach to hypernym discovery. In: Proceedings of the 12th International Workshop on Semantic Evaluation. pp. 725–731 (2018)
5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017)
6. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017)
7. Bollegala, D., Bao, C.: Learning word meta-embeddings by autoencoding. In: Proceedings of the 27th International Conference on Computational Linguistics. pp. 1650–1661 (2018)
8. Coates, J., Bollegala, D.: Frustratingly easy meta-embedding – computing meta-embeddings by averaging source word embeddings. arXiv preprint arXiv:1804.05262 (2018)
9. Dale, D.: A simple solution for the taxonomy enrichment task: Discovering hypernyms using nearest neighbor search. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue” (2020)
10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding.
In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186. Association for Computational Linguistics (Jun 2019)
11. Dobrov, B.V., Loukachevitch, N.V.: Development of linguistic ontology on natural sciences and technology. In: LREC. pp. 1077–1082. Citeseer (2006)
12. Fu, R., Guo, J., Qin, B., Che, W., Wang, H., Liu, T.: Learning semantic hierarchies via word embeddings. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1199–1209 (2014)
13. Gómez-Pérez, A., Corcho, O.: Ontology languages for the semantic web. IEEE Intelligent Systems 17(1), 54–60 (2002)
14. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 855–864 (2016)
15. Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora. In: Coling 1992, Volume 2: The 15th International Conference on Computational Linguistics (1992)
16. Jurgens, D., Pilehvar, M.T.: SemEval-2016 task 14: Semantic taxonomy enrichment. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). pp. 1092–1102. Association for Computational Linguistics (Jun 2016)
17. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
18. Levy, O., Remus, S., Biemann, C., Dagan, I.: Do supervised distributional methods really learn lexical inference relations? In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 970–976 (2015)
19. Liu, N., Huang, X., Li, J., Hu, X.: On interpretation of network embedding via taxonomy induction.
In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 1812–1820 (2018)
20. Loukachevitch, N.V., Lashevich, G., Gerasimova, A.A., Ivanov, V.V., Dobrov, B.V.: Creating Russian wordnet by conversion. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue”. pp. 405–415 (2016)
21. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013)
22. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. pp. 3111–3119 (2013)
23. Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41 (1995)
24. Miller, G.A.: WordNet: An electronic lexical database. MIT Press (1998)
25. Neill, J.O., Bollegala, D.: Meta-embedding as auxiliary task regularization. arXiv preprint arXiv:1809.05886 (2018)
26. Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. arXiv preprint arXiv:1705.08039 (2017)
27. Nikishina, I., Logacheva, V., Panchenko, A., Loukachevitch, N.: RUSSE'2020: Findings of the first taxonomy enrichment task for the Russian language. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue” (2020)
28. Nikishina, I., Panchenko, A., Logacheva, V., Loukachevitch, N.: Studying taxonomy enrichment on diachronic wordnet versions. In: Proceedings of the 28th International Conference on Computational Linguistics. Association for Computational Linguistics, Barcelona, Spain (December 2020)
29.
Nikishina, I., Panchenko, A., Logacheva, V., Loukachevitch, N.: Evaluation of taxonomy enrichment on diachronic wordnet versions. In: Proceedings of the 11th Global WordNet Conference GWC-2021 (2021)
30. Roller, S., Kiela, D., Nickel, M.: Hearst patterns revisited: Automatic hypernym detection from large text corpora. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). pp. 358–363 (2018)
31. Sabirova, K., Lukanin, A.: Automatic extraction of hypernyms and hyponyms from Russian texts. In: AIST (Supplement). pp. 35–40 (2014)
32. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 815–823 (2015)
33. Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. Advances in Neural Information Processing Systems 16, 41–48 (2004)
34. Shwartz, V., Dagan, I.: Path-based vs. distributional information in recognizing lexical semantic relations. COLING 2016, p. 24 (2016)
35. Snow, R., Jurafsky, D., Ng, A.Y.: Semantic taxonomy induction from heterogenous evidence. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. pp. 801–808 (2006)
36. Tikhomirov, M., Loukachevitch, N., Dobrov, B.: Methods for assessing theme adherence in student thesis. In: International Conference on Text, Speech, and Dialogue. pp. 69–81. Springer (2019)
37. Tikhomirov, M., Loukachevitch, N., Ekaterina, P.: Combined approach to hypernym detection for thesaurus enrichment. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue” (2020)
38. Tikhomirov, M., Loukachevitch, N.: Meta-embeddings in taxonomy enrichment task. pp. 681–692 (2021)
39.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1386–1393 (2014)
40. Wei, J., Huang, C., Vosoughi, S., Cheng, Y., Xu, S.: Few-shot text classification with triplet networks, data augmentation, and curriculum learning. arXiv preprint arXiv:2103.07552 (2021)
41. Yin, W., Schütze, H.: Learning word meta-embeddings. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1351–1360 (2016)