Search Query Extension Semantics

Olga Ataeva1[0000-0003-0367-5575], Vladimir Serebryakov2[0000-0003-1423-621X], Natalia Tuchkova3[0000-0001-6518-5817]

1,2,3 Dorodnicyn Computing Center FRC CSC of RAS, Vavilov str., 40, 11933, Moscow, Russia
1 oli@ultimeta.ru, 2 serebr@ultimeta.ru, 3 natalia_tuchkova@mail.ru

Abstract. The problems of extracting the most complete information from a semantic library by taking related documents into account are considered. Expert knowledge embedded in the subject area can be made available when the user obtains additional information from linked documents. A feature of the approach is the use of a shallow neural network algorithm to expand search queries in mathematical subject areas, where expert knowledge is available together with a significant scientific background of users. This problem can be solved by means of semantic analysis in the knowledge space using machine learning algorithms. The paper investigates the construction of a vector representation of documents based on paragraphs, applied to the data array of the digital semantic library LibMeta. Each piece of text is labelled; both the whole document and its separate parts can be labelled. Since the problem being solved was enriching user queries with synonyms, when building a search model in conjunction with word2vec algorithms, an "indexing first, then training" approach was used to cover more information and give more accurate results.

Keywords: Search Model, Word2vec, Synonyms, Query, Query Extension.

1 Introduction

The history of research into the problem of expanding a query for the most complete coverage of information is quite long [1-8]. The problem itself is directly related to the understanding of the subject of the search, that is, to the level of competence of the user and the ability of the information retrieval system to use expert knowledge.
Ideally, the use of query enhancement and refinement functionality assumes the presence of an actual data and knowledge base and the ability to reformulate the original query in order to improve the search result. Many approaches in this area have been developed with the advent of artificial intelligence algorithms and the corresponding programming tools [9]. The first expert system using a query refinement technique, Dendral [10, 11], was developed in 1965 for the analysis of chemical compounds. Another system, based on medical expertise, was MYCIN [12], presented to the scientific community in 1972. During a dialogue, MYCIN offered options for the diagnosis and further examination of the patient. Using about 500 inference rules, MYCIN performed at about the same level of competence as blood infection specialists and better than general practitioners.
The next stage of introducing artificial intelligence into knowledge systems is due to the use of neural network algorithms [13]. Although the ideas of creating mathematical models based on the functioning of biological neural networks have been developing since 1943 [14], their practical implementation gained popularity only with the accumulation of digitized data, that is, already in the 21st century. Some researchers have noted this as a new era for the "partially forgotten" field of artificial intelligence. Search algorithms began to learn [15] from accumulated queries, storing the most frequent of them together with the corresponding answers. All this contributed to an increase in the reaction speed of search services and to the development of targeted suggestions and user tips.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
More complex links and structures are embedded in scientific libraries, which is dictated by the logic of subject areas and requires more careful processing of links to provide users with advanced query capabilities [16]. One such subject area is mathematics. It is of interest to study and replenish the mathematical encyclopedia and to identify unaccounted-for semantic relationships between concepts and formulas.
This work is devoted to the use of shallow neural network algorithms [17] to expand search queries in mathematical subject areas based on the LibMeta library [18], presented in the form of an ontology, and is a continuation of the authors' research in this direction [19-24]. The description of the subject area is terminologically limited to the terms of the mathematical encyclopedia [25]. The corpus of texts consists of many mathematical articles, partially supplied with codes of the thematic classifiers MSC (https://msc2020.org/) and UDC (https://teacode.com/online/udc/), which correspond to a certain structure.
The LibMeta resources include a thesaurus on ordinary differential equations (ODE) and dictionaries for the special functions of the equations of mathematical physics. All dictionaries are semantically linked to the mathematical encyclopedia [25]. These resources are used to analyze semantic relationships.
This paper presents a search model (Section 2); outlines a technique based on algorithms for the vector representation of texts [26-29] (Section 3); shows the application of the search model to add synonyms to a search query (Section 4); and gives examples of search query extension (Section 5), which demonstrate the application of the model to improve search results, provide estimates of the completeness and accuracy of the algorithm, and show the process of ranking documents.
2 Search Model

The construction of the search model in LibMeta is based on three main key points, namely:
- converting documents to a searchable format;
- presenting queries in a format that allows expressing the user's information needs;
- assessing the compliance of a document with a query.

In our case, to prepare the documents, the full texts were preprocessed to remove the publisher's markup and to highlight the main parts of the text. Then a full-text document index was created, which allows data to be loaded and stored efficiently and provides quick access to it. Queries are written in natural language and can be enriched with synonyms by the system. The assessment of the compliance of a document with a query is subjective and depends on the method used.
One of the most commonly used document and query representation models is the vector space model [26-29]. In this model, as in models based on artificial neural networks, both the query and the document are represented by vectors, and the distance between them is measured, which estimates the degree of closeness of the document and the query.
In vector notation, each word is associated with a weight, which can be calculated in different ways. One of the most commonly used algorithms is TF-IDF [30], the main idea of which is that the more often a word appears in one document, the more important it is, and, at the same time, the more common a word is across the corpus of documents, the less important it is. Another common model is the probabilistic model, which is based on an estimate of the likelihood that a document is relevant to a particular query. One of the popular scoring algorithms in this model is Okapi BM25 [30, 31]. The main problem of any search model is to provide relevant results in relation to the user's information needs: from query analysis to ranking of search results.
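The TF-IDF idea described above can be illustrated with a short sketch (a minimal plain-Python illustration; the toy corpus and the simple tf * log(N/df) weighting variant are our own assumptions, not the LibMeta implementation, which uses a full-text index):

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF weights for each document in a tokenized corpus.

    tf  = term frequency within the document (normalized by length)
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

corpus = [
    ["cauchy", "problem", "equation"],
    ["boundary", "problem", "equation"],
    ["cauchy", "inequality", "estimate"],
]
w = tf_idf(corpus)
# "problem" occurs in 2 of 3 documents while "inequality" occurs in only 1,
# so the rarer term gets the larger weight within its document.
```

Okapi BM25 refines this scheme with term-frequency saturation and document-length normalization, but the underlying intuition is the same.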
This work is devoted to options for resolving this problem. One of the modern approaches is to use neural networks for text processing, since text is an example of data that can be parsed into smaller structures such as paragraphs, sentences, words, etc., depending on the text. This approach to text processing makes it possible to capture the semantics of the text, since closely related words or fragments of text occur in the same context and lie side by side in the vector space. The search model used in this work is based on the vector representation of words and documents built using the word2vec [27-29] neural network algorithm [17].
Integration of a neural network and an index can be done in the following ways:
- first training on the corpus of texts, then indexing the texts and using both jointly in the search;
- indexing first, then training on the indexed data and using both jointly in the search;
- first training, then extraction/creation of useful resources by the trained network, and then indexing of all resources, both new and original.

Fig. 1. Joint use of a search engine and a neural network model built on the basis of an index using word2vec algorithms.

Since we were solving the problem of enriching user queries with synonyms, in the LibMeta system we used the "indexing first, then training" approach to provide, on the one hand, more numerous and more accurate results based on extended queries. On the other hand, using the extended version of word2vec in conjunction with the LibMeta search engine, it becomes possible to give users smarter recommendations based on the documents found. This joint use of the index, the search engine, and the neural network allows for relevant models and ranking functions that adapt well to the underlying data. The version of the model built on the LibMeta search index using word2vec algorithms will hereinafter be abbreviated as wsgMath.
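The "indexing first, then training" option can be sketched as follows (a toy inverted index in plain Python; in LibMeta the index is a full-text search index, and the training step is word2vec, which is only stubbed out here as the derivation of a training corpus):

```python
from collections import defaultdict

def build_index(docs):
    """Step 1: index the raw texts (toy inverted index: term -> doc ids)."""
    index = defaultdict(set)
    tokenized = []
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()  # stand-in for real tokenization
        tokenized.append(tokens)
        for tok in tokens:
            index[tok].add(doc_id)
    return index, tokenized

docs = [
    "Cauchy problem for the heat equation",
    "Boundary value problem on the line",
]
index, tokenized = build_index(docs)

# Step 2: the training corpus for word2vec is taken from the already
# indexed (tokenized, normalized) data, so the model's vocabulary is
# guaranteed to match the terms the search engine can actually find.
training_corpus = tokenized
```

The point of this ordering is exactly the vocabulary guarantee in the comment: every synonym the model later proposes is a term that exists in the index, so an extended query can never ask for a word the engine does not know.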
Figure 1 schematically illustrates the operation of a search based on a neural network, which receives a query string as input and returns synonyms for the query using the model built by word2vec. Alternatively, the vector representation of a document can be submitted as input; using the constructed model, the system then gives recommendations in the form of a list of documents similar to it.

3 Vector Representation of Documents

Studies show [26-29] that vector representations of text are well suited for taking into account the semantics of words, but the meaning and deep semantics of text documents depend not only on the meaning of individual words. For this purpose, one needs to study the semantics of phrases and longer text fragments. For convenience, we will use the term "paragraph" to denote not only a paragraph as such, but also fragments of a paragraph or several phrases from the text. As applied to our field and the specifics of the structure of a mathematical text, these can also be theorems, lemmas, etc. Note that the term "important" fragment will also be used. In scientific texts, this is an abstract, introduction, conclusion, theorem, etc. This term is introduced because the specified elements of a scientific publication will be used as defining ones for documents belonging to a certain subject area.
The content for this research is the resources of the LibMeta digital library [32], where, along with the accumulated original thesauri and dictionaries (for special functions, ordinary differential equations, mixed equations of mathematical physics), a mathematical library is integrated [25].
Therefore, to construct wsgMath taking into account the context of paragraphs, we used a version of the word2vec algorithm extended to documents, doc2vec [29, 33]. For this, during training, one more component is added to the vector.
Thus, when training the "vectors of the words w", the "document vector d" is also trained, and upon completion of training we obtain a vector representation of the document. As a result of processing the original content, a presentation of documents as a set of "related contents" was obtained. "Related content" is a semantically similar article related to articles from the mathematical encyclopedia and the thesauri.
The procedure for highlighting such content is used to offer the user semantically related documents. It is essential that, without applying the algorithm for highlighting related content, such documents would not be displayed in the search results for a query, since they may not contain keywords from the query, or may not be directly related to a certain subject area in other terms.
The peculiarity of common search models, such as the vector space model with TF-IDF, is that they take into account only individual terms. This approach does not always lead to optimal results because contextual information is discarded. The context of a word is understood as the N words in the text before the word for which the vector is constructed and the N words after this word. In contrast to the TF-IDF model, the individual elements of the vector are not interpretable; instead, the distance between the vectors is investigated, which is interpreted as the semantic proximity of words. Based on the vector representation, the proximity of texts is measured. Using the search index and the vector document representation together leverages the ability of these representations to capture the semantics of text when building search models that are well adapted to the data.
The main metrics for measuring the proximity of texts are the cosine distance and the Euclidean distance, which are used to capture semantically similar words, sentences, paragraphs, etc.
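The window-based notion of context (N words before and after a word) and the cosine metric can be illustrated with a co-occurrence sketch (a deliberately simplified stand-in for word2vec: real embeddings are trained rather than counted, and the window size N = 2 and the toy sentences here are arbitrary assumptions):

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Count, for every word, the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

sentences = [
    ["solve", "the", "cauchy", "problem"],
    ["solve", "the", "boundary", "problem"],
    ["prove", "an", "integral", "inequality"],
]
vecs = context_vectors(sentences)
# "cauchy" and "boundary" occur in identical contexts {solve, the, problem},
# so their cosine similarity is high; "cauchy" and "integral" share no context.
```

Words that occur in the same contexts end up with similar vectors, which is exactly the property the paper relies on when treating vector proximity as semantic proximity.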
4 Revealing Synonyms

The analysis of mathematical texts is conventionally divided into the analysis of the mathematical text as a whole, the analysis of formulas as a "separate language" for the representation of mathematical knowledge, and the establishment of semantic links between the text and the formulas. In what follows, only the analysis of the mathematical text as a whole is considered.
To extract synonyms for query terms from the constructed model, lexical and grammatical templates were used, which are one of the recognized methods for extracting links from text [34-38]. Based on the idea of using such patterns, we investigated the task of extracting synonyms of concepts and extracting/constructing simple patterns from them to identify relationships.
The implementation of the model consists in the application of an iterative research algorithm, which will be called iraWsgMath below. We list its main stages:

- allocation of synonyms of terms

As an example, we will use the query "Cauchy problem". (For the convenience of the reader, the examples have been translated into English, but the work was done for texts in Russian; the term under consideration is "задача Коши".) The query consists of two words, "problem" and "Cauchy", each of which has its own synonyms, which are presented in Table 1.

Table 1. Synonyms for each word of the query "Cauchy problem" (задача Коши)

problem (задача)          Cauchy (Коши)       Cauchy problem (задача Коши)
equation (уравнение)      Riemann (Риман)     to define (определять)
inequality (неравенство)  boundary (краевой)  boundary (краевой)
boundary (краевой)

The third column presents the query context as one unit, Cauchy problem (задача Коши). Extracting its synonyms, it is clear that the list contains the adjective boundary, which also occurs among the synonyms of the individual words in the first two columns.
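The allocation of synonyms illustrated in Table 1 can be sketched as a nearest-neighbour lookup over word vectors (the vector values, the 0.6 cutoff, and the tiny vocabulary below are illustrative assumptions; in wsgMath the vectors come from the trained word2vec model and the named-entity list from the encyclopedia's dictionary of persons):

```python
import math

# Toy "pretrained" vectors (illustrative values, not trained embeddings).
VECTORS = {
    "problem":    [0.9, 0.1, 0.2],
    "equation":   [0.8, 0.2, 0.3],
    "inequality": [0.7, 0.3, 0.2],
    "cauchy":     [0.1, 0.9, 0.1],
    "riemann":    [0.2, 0.8, 0.2],
}
NAMED_ENTITIES = {"cauchy", "riemann"}  # persons from the encyclopedia

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def synonyms(word, threshold=0.6):
    """Nearest neighbours of `word` above the proximity threshold,
    skipping named entities (as done for 'Cauchy' in the text)."""
    return sorted(
        (other for other in VECTORS
         if other != word
         and other not in NAMED_ENTITIES
         and cos(VECTORS[word], VECTORS[other]) >= threshold),
        key=lambda o: -cos(VECTORS[word], VECTORS[o]),
    )
```

With these toy vectors, `synonyms("problem")` yields `["equation", "inequality"]`, mirroring the filtering described for the "Cauchy" named entity in the text.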
In this case, the term "Cauchy problem" itself has the following synonyms: "Cauchy equation", "Cauchy inequality", which were determined on the basis of high proximity estimates for pairs of synonyms; for example, for the pair (problem, equation) the proximity estimate is 0.84. Note that when constructing synonymous terms, synonyms of the word Cauchy were not used, since it was identified as the named entity Cauchy based on a dictionary that includes a list of the persons mentioned in the mathematical encyclopedia. At the same time, we note that Riemann did appear among the synonyms of Cauchy.

- determination of classes of synonyms by parts of speech

Consider a link extraction pattern based on a simple adjective pattern, which most often indicates generic links. The original term is a generic concept, and the combination corresponding to the pattern is a specific concept [36, 37, 38]. Each word was considered separately, the synonyms were filtered by parts of speech, and a possible synonym (candidate) of the term was formed from them. After that, a phrase was formed for the term and its synonyms based on the selected templates.
Based on these synonyms, the following phrases were formed in accordance with the pattern "adjective + term": [Cauchy boundary value problem, Cauchy boundary equation, Cauchy boundary inequality] (in Russian, the word "краевой" was used).
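The "adjective + term" step above can be sketched as a simple candidate generator (the adjective list and the expected phrase shapes are toy assumptions; the actual iraWsgMath templates also use part-of-speech filtering, which is stubbed out here):

```python
def adjective_pattern_candidates(term, adjectives):
    """Generate 'adjective + term' candidates for a multiword term:
    the adjective either prefixes the whole term or is inserted
    after the head word."""
    head, _, tail = term.partition(" ")
    out = []
    for adj in adjectives:
        out.append(f"{adj} {term}")            # e.g. "boundary cauchy problem"
        if tail:
            out.append(f"{head} {adj} {tail}")  # e.g. "cauchy boundary problem"
    return out

# "boundary" is the shared adjective found among the synonym columns.
cands = adjective_pattern_candidates("cauchy problem", ["boundary"])
```

In the paper the generated candidates are then validated against the thesaurus, so that only combinations that denote real specific concepts survive.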
When compiling an extended query, it was also proposed to use the adjective boundary as a synonym; therefore additional queries were used: [Cauchy boundary value problem, Cauchy boundary equation, Cauchy boundary inequality] (in Russian, the word "краевой" was used).

- selection of patterns of "capture" of links

A verb pattern was considered in order to analyze and construct more complex relationship patterns in the verb thesaurus. Using this pattern to fill in the links requires a separate analysis and is beyond the scope of the word2vec-based algorithm considered in this article. In the process of training the model, several verbs were defined to identify patterns using the considered algorithm. When analyzing context-sensitive synonyms, the list of emerging verbs was rather limited, which is not surprising given the specifics of the subject area. The list of these verbs is limited to: apply, use, apply, base, prove, consider, consider, define, depend, be, embody. Verbal nouns formed from the listed verbs are also often used: application, use, application, basis, definition.

- improving the "quality of terms" and checking them

To improve the search for extended domain terms matching the templates for multiword terms in the thesaurus, possible spellings of terms were considered; for example, for "ordinary differential equation" the possible variants are "ODE", "ordinary DE", etc. All possible spellings were explored as separate terms. Since there are few such terms in the studied subset, they did not have a significant effect on the results.
Validation of the model and the links it retrieves was performed on the basis of the ODE thesaurus. The problem of synonyms and their extraction using word2vec together with a search index is covered in more detail in [24].

5 Examples

The combined use of the full-text index [40, 41] and the search model wsgMath makes it possible to extend the original query with synonyms.
Extending queries with synonyms without wsgMath requires pre-compiled synonym dictionaries. One can use resources such as WordNet (https://wordnet.princeton.edu/) or RuWordNet (https://ruwordnet.ru/ru), but the main problem is that synonyms from pre-compiled dictionaries are not tied to the data being indexed, and their use does not improve the results.
Figure 2 shows the main steps of forming the wsgMath model for generating query synonyms in LibMeta content. The query string coming from the full-text search interface goes through the Analyzer. The Analyzer is a functional part of the model where the basic operations for interacting with the wsgMath model are performed. All operations described point by point in the previous section belong to its main functionality. The Analyzer splits a string into words, analyzes them, and transforms them. Synonyms for the words are extracted from the wsgMath model and filtered, and an extended query is formed, with the help of which the corresponding documents are extracted from the full-text index.

Fig. 2. Joint use of a search engine and a neural network model built on the basis of an index using word2vec algorithms to generate an extended query with synonyms.

A user's information need is defined as a chain of queries that leads the user to the information he needs. Each subsequent query in this chain is a refinement of the previous one.
A real information request, as a rule, consists of an initial query and refinements. Let us consider an example in which the primary query leads to excessive information noise and the refinement allows a more pertinent answer to be obtained, and compare the search results with and without the wsgMath model. For comparison, statistical characteristics (denoted score) obtained using the TF-IDF algorithm are calculated. The example below demonstrates three lists (Lists 1-3) with different scores depending on how the query is expanded.
For example, when searching for the test query "Cauchy problem", the user enters the refining query "Cauchy boundary value problem" and finds the information of interest. The search index contains 3654 scientific articles, of which only 637 mention the "Cauchy problem". Of these, 59 were selected for the user, since the query words were found in significant parts of the document (title and abstract). With this approach, the document of interest to the user is in 18th place. Part of the list is shown below; the "score" value shows how well a document matches the query and is calculated on the basis of statistical characteristics such as TF-IDF.

List 1:
1. The Cauchy problem for the system of equations of the theory of elasticity and thermoelasticity in space
   score = 0.65376675
2. The Cauchy problem for the system of thermoelasticity equations in space
   score = 0.64415324
...
18. On the Well-Posedness of a Boundary Value Problem on the Line for Three Analytic Functions
   score = 0.5233538

With the refining query "Cauchy boundary value problem", the list of results looks as shown below, and the document of interest moves to the fifth position, while the number of documents satisfying the query text is reduced to 338, of which only 20 are recommended to the user.

List 2:
1. Projection procedures for non-local improvement of linearly controlled processes
   score = 0.8902895
2. On one method of constructing parametric synthesis for a linear-quadratic optimal control problem
   score = 0.8708762
...
5. On the Well-Posedness of a Boundary Value Problem on the Line for Three Analytic Functions
   score = 0.85024154

Let us now consider the situation when the query "Cauchy problem" is extended by synonyms and transformed, using the wsgMath model, into the form "boundary", "problem or equation or inequality Cauchy" (in Russian: "краевая или граничная", "задача или уравнение или неравенство Коши").
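The transformation of a query into such an OR-group form can be sketched as follows (the boolean query syntax shown is a generic form assumed for illustration, not necessarily LibMeta's actual query language):

```python
def extend_query(words, synonyms):
    """Replace each query word by an OR-group of the word and its synonyms."""
    groups = []
    for w in words:
        variants = [w] + synonyms.get(w, [])
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)

q = extend_query(
    ["problem", "cauchy"],
    {"problem": ["equation", "inequality"]},  # synonyms from the wsgMath model
)
# q == "(problem OR equation OR inequality) AND (cauchy)"
```

A document matches such a query if it contains at least one variant from every group, which is exactly the semantics discussed next.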
In an extended query, the synonyms listed in parentheses are connected by the logical operation OR; the presence of at least one of them is required. This approach slightly increases the completeness of the answer, and the accuracy also increases; therefore, the degree of satisfaction of the user's need increases. The list of results obtained is displayed below, and the sought document is in second place. The number of documents corresponding to the query is 395, and the user receives the desired answer already in the first positions, while the size of the result set returned by the system is 65.

List 3:
1. On a positive radially symmetric solution of the Dirichlet problem for one nonlinear equation and a numerical method for obtaining it
   score = 0.9809638
2. On the Well-Posedness of a Boundary Value Problem on the Line for Three Analytic Functions
   score = 0.9587569
3. On a positive radially symmetric solution of the Dirichlet problem for one nonlinear equation and a numerical method for obtaining it
   score = 0.9512307

This example illustrates the effect of this approach already at the level of extending queries with synonyms based on indexed documents. With this approach, all suggested synonyms are found in the search engine index, and the query extension is guaranteed to offer answers to the user's queries.
The use of the extended version of word2vec (called doc2vec or paragraph2vec in different sources) [29, 33] makes it possible to introduce an additional element, such as a label for a text fragment or for the entire document, and, based on the vectors of these labels, to select similar documents not only by the exact match of keywords or terms but on the basis of the context of individual fragments or of the entire document. As an illustration, Fig. 3 shows the main steps of this approach. This feature is used to return documents that are close in meaning, which do not appear in the search results but may be of interest to the user.

Fig. 3.
Joint use of a search engine and a neural network model built on the basis of an index using word2vec algorithms to generate an extended query with synonyms and to refine search results based on a selection of similar documents.

Let us take a closer look at the process of ranking documents based on the wsgMath model when searching for similar documents. When a document enters the system, its current vector representation is retrieved, a search is performed, and the labels of the nearest documents are returned, those whose cosine similarity exceeds a certain threshold, determined experimentally as 0.6. Below is the result of this procedure for the document that was the sought-for one in the previous example. Nine documents were found as the closest to it, with cosine similarity exceeding 0.6.

List 4:
1. Some classes of singular integral equations solvable in closed form
   cosineSimilarity = 0.8136491179466248
2. Riemann's boundary value problem for a half-plane with a coefficient exponentially decreasing at infinity
   cosineSimilarity = 0.8028532266616821
3. Algorithm for constructing a quasiregular asymptotic representation of the solution of singularly perturbed linear multipoint boundary value problems with fast and slow variables
   cosineSimilarity = 0.7246567010879517
4. Solution in closed form of an integral equation of convolution type in the hyperelliptic case
   cosineSimilarity = 0.6468908786773682
5. On biorthogonal systems generated by some involutive operators
   cosineSimilarity = 0.6454607248306274
6. On linear periodic systems in the plane having matrices of the required form
   cosineSimilarity = 0.6165973544120789
7. On integral equations for the Riemann function
   cosineSimilarity = 0.6134763956069946
8. Gakhov's equation for an exterior mixed inverse boundary value problem with respect to a parameter ...
   cosineSimilarity = 0.6059825420379639
9. On a nonlinear integral equation of the first kind
   cosineSimilarity = 0.6017340421676636

Fig. 4.
Joint use of a search engine and a neural network model built on the basis of an index using word2vec algorithms together with attribute search.

Fig. 4 adds steps that involve attribute search and shows how it interacts with the previously described search components. Attribute-based search delineates the boundaries within which documents are searched (by author, by year, etc.); a transition to full-text search can then be performed within them, and/or its results can also be refined on the basis of document similarity.

6 Conclusion

A vector representation of documents is proposed to expand the search query and to increase the coverage of information on demand. It is shown that the quality of an answer to a query is improved by taking into account semantically close text fragments.
The model proposed in this work was tested on primary data, namely, arrays of articles not systematized by subject matter. Note that the technology of processing and thematic classification of primary data using machine learning methods has been tested. This technology can be used for the subject classification of the texts of scientific articles in Russian and for the comparison of the selected subjects with the English-language classification by comparing the MSC and UDC classifiers.
Integration of neural networks and search indexes makes it possible to give users smarter results based on the identified relations among documents.
The considered search model can also be used for the thematic processing both of the primary texts of scientific articles and of texts already systematized and provided with keywords and links to classifiers. In the second case, this can help to identify interdisciplinary research, as well as erroneous assignments of the subject area, since not only secondary documents but also the texts of the articles (primary documents) are taken as the basis for the thematic analysis.

Acknowledgement.
The work is presented in the framework of the implementation of the theme of the state assignment "Mathematical methods of data analysis and forecasting" of FRC CSC of RAS and was partially supported by grant #20-07-00324 of the Russian Foundation for Basic Research.

References

1. Furnas, G.W., Landauer, T.K., Gomez, L.M., Dumais, S.T.: The vocabulary problem in human-system communication. Commun. ACM 30(11), 964-971 (1987).
2. Biswas, G., Bezdek, J., Oakman, R.L.: A knowledge-based approach to online document retrieval system design. In: Proc. ACM SIGART Int. Symp. Methodol. Intell. Syst., pp. 112-120 (1986).
3. Voorhees, E.M.: Query expansion using lexical-semantic relations. In: 17th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Dublin, Ireland (1994).
4. Buckley, C., Salton, G., Allan, J., Singhal, A.: Automatic query expansion using SMART: TREC 3. Presented at the 3rd Text Retrieval Conference (TREC) (1995).
5. Efthimiadis, E.N.: Query expansion. Annu. Rev. Inf. Sci. Technol. 31(5), 121-187 (1996).
6. Guarino, N.: OntoSeek: Content-Based Access to the Web. IEEE Intelligent Systems, May-June, pp. 70-80 (1999).
7. Bhogal, J., MacFarlane, A., Smith, P.: A review of ontology based query expansion. Inf. Process. Manage. 43(4), 866-886 (2007).
8. Qiu, Y., Frei, H.: Concept based query expansion. In: SIGIR '93, Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburgh, Pennsylvania, USA, June 27 - July 01, 1993, pp. 160-169. ACM, New York, NY, USA (1993). https://doi.org/10.1145/160688.160713.
9. Berk, A.A.: LISP: the Language of Artificial Intelligence, pp. 1-25. Van Nostrand Reinhold Company, New York (1985).
10. Lindsay, R.K., Buchanan, B.G., Feigenbaum, E.A., Lederberg, J.: DENDRAL: A Case Study of the First Expert System for Scientific Hypothesis Formation. Artificial Intelligence 61(2), 209-261 (1993).
11. Lederberg, J.: An Instrumentation Crisis in Biology.
Stanford University Medical School, Palo Alto (1963).
12. Copeland, B.J.: "MYCIN". Encyclopedia Britannica, 21 Nov. 2018, https://www.britannica.com/technology/MYCIN, last accessed 2021/07/27.
13. Gurney, K.: An Introduction to Neural Networks. CRC Press, London and New York (1997).
14. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115-133 (1943). https://doi.org/10.1007/BF02478259.
15. MachineLearning.ru, http://www.machinelearning.ru/, last accessed 2021/07/27.
16. Gavrilova, T.A., Horoshevskij, V.F.: Knowledge Bases of Intelligent Systems [Bazy znanij intellektualnyh sistem]. Piter, SPb (2000).
17. Aggarwal, C.C.: Machine Learning with Shallow Neural Networks. In: Neural Networks and Deep Learning. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94463-0_2.
18. Serebryakov, V.A., Ataeva, O.M.: Ontology based approach to modeling of the subject domain "Mathematics" in the digital library. Lobachevskii Journal of Mathematics 42(8), 1920-1934 (2021).
19. Ataeva, O., Serebryakov, V., Tuchkova, N.: Ontological Approach: Knowledge Representation and Knowledge Extraction. Lobachevskii Journal of Mathematics 41(10), 1938-1948 (2020). https://doi.org/10.1134/S1995080220100030.
20. Ataeva, O.M., Serebryakov, V.A., Tuchkova, N.P.: Mathematical Physics Branches: Identifying Mixed Type Equations. Lobachevskii Journal of Mathematics 40(7), 876-886 (2019). https://doi.org/10.1134/S1995080219070047.
21. Ataeva, O.M., Serebryakov, V.A., Tuchkova, N.P.: Mathematical Physics Problems: Thesaurus and Ontology. In: Selected Papers of the XXI International Conference on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2019), Kazan, Russia, October 15-18, Vol-2523, pp. 158-168 (2019). http://ceur-ws.org/Vol-2523/paper16.pdf.
22. Muromskij, A.A., Tuchkova, N.P.: Representation of mathematical concepts in an ontology of scientific knowledge [Predstavlenie matematicheskih ponyatij v ontologii nauchnyh znanij]. Ontologiya proektirovaniya
9(1)(31), 50-69 (2019). https://doi.org/10.18287/2223-9537-2019-9-1-50-69.
23. Ataeva, O.M., Serebryakov, V.A., Tuchkova, N.P.: Query Expansion Method Application for Searching in Mathematical Subject Domains, pp. 38-48 (2020). http://ceur-ws.org/Vol-2543/rpaper04.pdf, last accessed 2021/04/27.
24. Ataeva, O.M., Serebryakov, V.A., Tuchkova, N.P.: Using Applied Ontology to Saturate Semantic Relations. Lobachevskii Journal of Mathematics 42(8), 1776-1785 (2021).
25. Vinogradov, I.M.: Mathematical Encyclopedia, Vols. 1-5. Soviet Encyclopedia, Moscow (1982).
26. Gonçalves, A., Zhu, J., Song, D., Uren, V., Pacheco, R.: LRD: Latent Relation Discovery for Vector Space Expansion and Information Retrieval. Technical Report KMI-06-09. In: Advances in Web-Age Information Management, 7th International Conference, WAIM 2006, Hong Kong, China, June 17-19, 2006, Proceedings (2006). https://doi.org/10.1007/11775300_11.
27. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations in Vector Space. In: Proceedings of Workshop at ICLR (2013).
28. Mikolov, T., Yih, W.T., Zweig, G.: Linguistic Regularities in Continuous Space Word Representations. In: Proceedings of NAACL HLT (2013).
29. Le, Q., Mikolov, T.: Distributed Representations of Sentences and Documents. In: International Conference on Machine Learning, pp. 1188-1196 (2014).
30. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008).
31. Robertson, S., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval 3(4), 333-389 (2009). https://doi.org/10.1561/1500000019.
32. Ataeva, O.M., Serebryakov, V.A.: Ontology of the digital semantic library LibMeta [Ontologiya cifrovoj semanticheskoj biblioteki LibMeta]. Informatics and Applications 12(1), 2-10 (2018).
33. Lu, Y., Zhai, Y., Luo, J., Chen, Y.: MLPV: Text Representation of Scientific Papers Based on Structural Information and Doc2vec. American Journal of Information Science and Technology
3(3), 62-71 (2019). https://doi.org/10.11648/j.ajist.20190303.12.
34. Bullinaria, J.A., Levy, J.P.: Extracting Semantic Representations from Word Co-occurrence Statistics: A Computational Study. Behavior Research Methods 39, 510-526 (2007).
35. Klaussner, C., Zhekova, D.: Lexico-syntactic patterns for automatic ontology building. In: Proceedings of the Second Student Research Workshop associated with RANLP, pp. 109-114 (2011).
36. Raza, M.A., Mokhtar, R., Ahmad, N., Pasha, M., Pasha, U.: A Taxonomy and Survey of Semantic Approaches for Query Expansion. IEEE Access 7, 17823-17833 (2019). https://doi.org/10.1109/ACCESS.2019.2894679.
37. Wang, C., Cao, L., Zhou, B.: Medical synonym extraction with concept space models. https://arxiv.org/abs/1506.00528 (2015), last accessed 2021/07/27.
38. Mitchell, J., Lapata, M.: Vector-based Models of Semantic Composition (2008).
39. Polozov, I.K., Volkova, I.A.: Applying word2vec technology to shifter extraction task. International Research Journal 4-1(94) (2020).
40. Mäkinen, V.: Compact suffix array, a space-efficient full-text index. Fundamenta Informaticae 56(1-2), 191-210 (2003).
41. Mäkinen, V., Navarro, G.: Compressed full-text indexes. ACM Computing Surveys 39(1), 1-79 (2007). https://doi.org/10.1145/1216370.1216372.