<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Research and development of linguo-statistical methods for forming a portrait of a subject area</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Oleg V. Zolotarev ANO HE «Russian New University»</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The project addresses the fundamental scientific problem of semantic modeling. Within this framework, a methodology is developed for the automated identification of translation links (translation correspondences), as well as hierarchical, synonymous and associative links, from Internet texts, and for the construction of multilingual associative hierarchical portraits of a subject area (MAHPSA), in particular for autonomous uninhabited underwater vehicles (UUV). Taking multilingual and heterogeneous resources into account gives a more complete picture of what is happening in the subject area and makes it possible to identify where ideas originate, how fast and in which directions they spread, and which documents are significant and which directions are promising. The solution is based on an integrated approach that combines methods of statistics, corpus linguistics and distributional semantics. It is implemented in a technology that involves developing linguo-statistical mechanisms for forming a multilingual associative hierarchical portrait of a subject area: a dictionary of significant terms of the subject area whose elements are organized into synonymous series (synsets), including translation correspondences, together with associative and hierarchical relationships.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The growing volume of information on the Internet significantly
complicates information search. Semantic search and the comparison of
multilingual documents make it possible to find new trends and ideas,
which significantly reduces the cost of developing and popularizing new
areas of science. Using a multilingual associative hierarchical portrait
of a subject area (MAHPSA) allows texts to be compared not only by
matching the phrases they contain, but also by matching the objects and
processes they describe. MAHPSA makes it possible to determine the
semantic similarity of documents even when they have no words in common.
It also makes it possible to calculate integrated statistics for a
multilingual collection and to determine significant documents and
promising areas without translating the documents into a single language.
This is important for the automatic processing of large numbers of
documents (Big Data). Constructing MAHPSA will make it possible not only
to compare documents and search for new ideas, but also to solve other
problems associated with the rapid analysis of large amounts of
information.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Technique of automatic formation of a multilingual associative-hierarchical portrait of a subject area</title>
      <p>
        The proposed method for forming a multilingual associative-hierarchical portrait of a subject area iteratively expands an initial multilingual dictionary of significant phrases into a hierarchy of multilingual synonymous series (synsets). The method can be stated as the following algorithm:
      </p>
      <p>1) Compiling a collection of multilingual texts by means of a directed keyword search in databases of scientific documents (for example, Dimensions);</p>
      <p>2) Text processing by means of the Pullenti program: tokenization and metatokenization;</p>
      <p>3) Automatic generation of glossaries of terms and megalemmas; expert quality control of the generated dictionaries;</p>
      <p>4) Automatic selection of topics using thematic modeling methods, formation of a dictionary of subject areas, selection of a set of keywords for the subject areas, expert control, topic correction;</p>
      <p>5) Formation of a dictionary of key terms mapped to topics;</p>
      <p>6) Compilation of frequency dictionaries of domain terms (using statistical methods);</p>
      <p>7) Compilation of frequency dictionaries of subject domain megalemmas;</p>
      <p>8) Building multilingual synsets by combining BabelNet resources and a megalemma dictionary;</p>
      <p>9) Building SVPs using a neural network model (a combination of Word2Vec with multilingual recurrent neural networks, RNN) for texts that have undergone preprocessing;</p>
      <p>10) Performing hierarchical clustering using Word2Vec and RNN, taking into account the hierarchical relationships of synsets;</p>
      <p>
        11) Constructing an ordered list of candidates for hierarchical relationships from the associative connections of the neural network model; hierarchical relations are viewed and corrected using the Keywen Knowledge Architect resource [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology for calculating integral statistics based on MAHPSA</title>
      <p>MAHPSA is created automatically on the basis of
statistical analysis of large volumes of texts from the
Internet. The hierarchical connections that make up the
MAHPSA form a hierarchy and classifier that facilitate the
search and navigation in the multilingual subject area of
the UUV.</p>
      <p>The proposed methodology also includes the integration of various
MAHPSAs with multilingual linguistic resources (WordNet, Wikipedia,
BabelNet, etc.) to obtain the largest possible multilingual ontology with
relevant knowledge and improved coverage of terminology in the subject
areas under consideration. The combined (integral) ontology contains a
hierarchy of synonymous series (synsets) of multilingual terms, including
Russian ones, and serves as the basis for constructing a single
multilingual vector space in which the semantic proximity of multilingual
texts, synsets and terms can be evaluated, similarly to the NASARI and
MUFFIN methods. The translation correspondences between the multilingual
synsets of MAHPSA are built using Word2Vec technology. The integral
ontology makes it possible to calculate integrated multilingual statistics
and trends in the use of terms and ideas, which in turn makes it possible
to predict how ideas spread between languages and to determine promising
directions. A measure of the semantic proximity of multilingual documents
makes it possible to identify implicit links between documents and to
determine significant documents, which is necessary for collecting
high-quality information from the open Internet and building large,
relevant multilingual text corpora for the subject area. Thus, increasing
the size and quality of the integral ontology yields a better similarity
measure and a better subject corpus, and extracting knowledge from that
corpus in turn further increases the size and quality of the integral
ontology.</p>
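      <p>The idea of evaluating semantic proximity in a single multilingual vector space can be illustrated with a minimal sketch. It assumes cosine similarity as the proximity measure and averaging of term vectors as the document representation; neither choice is stated in the methodology, and the three-dimensional vectors and the function names are hypothetical and serve only as an illustration.</p>
      <preformat>
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def document_vector(term_vectors, terms):
    """Average the vectors of those terms that exist in the shared space."""
    known = [term_vectors[t] for t in terms if t in term_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy vectors; a real space would come from a trained model.
space = {
    "underwater vehicle": [0.9, 0.1, 0.0],
    "подводный аппарат": [0.8, 0.2, 0.1],
    "computer graphics": [0.0, 0.1, 0.9],
}

same_concept = cosine_similarity(space["underwater vehicle"],
                                 space["подводный аппарат"])
other_concept = cosine_similarity(space["underwater vehicle"],
                                  space["computer graphics"])
```
</preformat>
      <p>In this toy space the English and Russian terms for the same concept receive a much higher similarity than the unrelated pair, although they share no words, which is the property the methodology relies on.</p>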
      <p>The methodology includes not only the identification
of significant documents, but also the identification of
trends and the identification of promising areas for the
development of science.</p>
      <p>To develop the first version of the integrated statistics methodology based on MAHPSA, it is necessary to do the following:</p>
      <p>1) Conduct morphological, syntactic and partially semantic analysis of the text;</p>
      <p>2) Select typed objects (named entities);</p>
      <p>3) Identify formal elements for the representation of concepts;</p>
      <p>4) Develop a structure and software for storing a multilingual collection of documents;</p>
      <p>5) Create dictionaries for storing structured information;</p>
      <p>6) Develop neural network algorithms for calculating integrated statistics based on MAHPSA.</p>
      <p>A first version of the program has been developed for detecting
implicit interlingual links and assessing the semantic similarity of
phrases in different languages.</p>
      <p>
        Text processing is carried out using the program
PullEnti [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This product has won computational linguistics competitions
held as part of the Dialogue conference.
      </p>
      <p>Pullenti is a linguistic processor developed at the
Institute of Informatics Problems. It is continually being refined and
performs morphological, syntactic and partially semantic analysis of
text, identifying typed objects (named entities).</p>
      <p>Pullenti SDK includes the following main blocks:</p>
      <p>
        1) Tokenization: breaking the text down into words (tokens), with adjustments (Fig. 1 [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">2-12</xref>
        ]);
      </p>
      <p>
        2) Morphological analysis: determining the parts of speech of tokens (a POS tagger, Part of Speech, which returns all possible variants for a word form regardless of its surrounding context). The supported languages are Russian, Ukrainian and English. The block also provides normalization (reduction of a word form to the desired case, gender and number), processing of unknown and new words, and an error-correction mode (Fig. 2 [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">2-12</xref>
        ]);
      </p>
      <p>
        3) Selection of named entities [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] (NER, Named Entity Recognition): a set of so-called analyzers that find entities of the corresponding type (persons, organizations, geographical objects, etc.) in sequences of tokens (Fig. 3 [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">2-12</xref>
        ]);
      </p>
      <p>
        4) A set of tools for working with numerical data, noun and verb groups, brackets and quotation marks, dictionaries of terms and abbreviations, various checks (for example, equivalence of strings in Latin and Cyrillic letters) and other useful features that arose while solving practical problems (Fig. 4 [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">2-12</xref>
        ]);
      </p>
      <p>5) Derivative dictionary: a dictionary of so-called derivative groups (sets of words with the same root but different parts of speech, where one group may contain words in different languages), the government model of a group (what can follow a group), synonymy, etc.;</p>
      <p>
        6) Semantic representation: tokens are structured as a graph with semantic connections in order to solve more complex problems related to meaning [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)</p>
      <p>Fig. 2. Morphological analysis</p>
      <p>The linguistic processor was modified specifically for this
project so that implicit links in documents can be identified more
accurately.</p>
      <p>The concept of a token (the Token base class) is at the
heart of the Pullenti SDK model. Each token refers to a
contiguous fragment of the source text (BeginChar and
EndChar positions). First, the text is divided into a
sequence of text tokens (TextToken); during processing these are
then merged into metatokens (MetaToken). A metatoken is a token
that has "absorbed" a contiguous sequence of other tokens.</p>
      <p>Metatokens represent, for example, the places where named
entities occur in the text (ReferentToken). Metatokens can also
represent various numerical data (numbers written out in words) and
noun groups (in the example, NounPhraseToken is an inherited class).
The elements obtained and used during the analysis are
metatokens.</p>
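      <p>The token model described above can be sketched as follows. This is a simplified illustration and not the Pullenti SDK API: the class names mirror those mentioned in the text, while the inclusive end position, the whitespace tokenizer and the merge helper are assumptions introduced for the example.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    """Base token: a contiguous fragment of the source text."""
    begin_char: int  # position of the first character (BeginChar)
    end_char: int    # position of the last character, assumed inclusive (EndChar)

    def text(self, source: str) -> str:
        return source[self.begin_char:self.end_char + 1]


@dataclass
class TextToken(Token):
    """A single word produced by the initial tokenization."""


@dataclass
class MetaToken(Token):
    """A token that has absorbed a contiguous sequence of other tokens."""
    children: List[Token] = field(default_factory=list)


def tokenize(source: str) -> List[TextToken]:
    """Whitespace tokenizer, a stand-in for real tokenization."""
    tokens, start = [], None
    for i, ch in enumerate(source):
        if ch.isspace():
            if start is not None:
                tokens.append(TextToken(start, i - 1))
                start = None
        elif start is None:
            start = i
    if start is not None:
        tokens.append(TextToken(start, len(source) - 1))
    return tokens


def merge(tokens: List[Token]) -> MetaToken:
    """Merge adjacent tokens into one metatoken spanning all of them."""
    if not tokens:
        raise ValueError("cannot merge an empty token sequence")
    return MetaToken(tokens[0].begin_char, tokens[-1].end_char, list(tokens))


source = "autonomous underwater vehicle docks"
words = tokenize(source)
noun_group = merge(words[0:3])  # plays the role of a noun-phrase metatoken
```
</preformat>
      <p>Because a metatoken is itself a token, metatokens can be merged again, which is what allows longer units to be assembled from shorter ones.</p>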
      <p>The concept of PullEnti metatokens served as the basis
for building dictionaries of megalemmas, each of which
can consist of several tokens or metatokens. The
megalemma is the basis for comparing meaningful phrases
from different languages; that is, the concept of a megalemma
is broader than that of a metatoken, since it
additionally includes identifying connections between
different languages.</p>
      <p>
        Megalemma dictionaries are constructed using the
method for determining the proximity of terms [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This method makes it possible to form megalemmas on the
basis of statistical patterns of term occurrence as part of
forming an associative-hierarchical portrait of a subject area.
      </p>
      <p>Thematic dictionaries of megalemmas are formed by
subject areas and serve as the basis for the classification of
texts. Megalemma dictionaries are also used to represent
knowledge in ontologies and automatically supplement
them with relevant vocabulary.</p>
      <p>
        The synset was chosen as the formal element for representing
concepts. It is the basis of knowledge representation in systems
such as WordNet, BabelNet and others, and is a well-established and
generally accepted concept [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Synsets can be chained together (megalemmas
include synsets). Megalemmas are thus represented as chains of
synsets. The concept of a synset is inherently oriented toward
multilingualism.
      </p>
      <p>The work was carried out in two subject areas:
“computer graphics and visualization” and “autonomous
uninhabited underwater vehicles”.</p>
      <p>
        Algorithms for the semantic analysis of information
have been developed [
        <xref ref-type="bibr" rid="ref10 ref11 ref15 ref2 ref3 ref4 ref5 ref6 ref7 ref8 ref9">2-11, 15</xref>
        ]. Prototypes of software
components for semantic analysis of textual information
have been developed too.
      </p>
      <p>Implicit links are searched for using the megalemma
dictionary. First, the text is processed with the PullEnti
program: words are normalized, named entities are selected
(NER, named entity recognition), and dictionaries of tokens and
metatokens are formed for the text. Next, a thematic analysis of the
text is carried out using megalemma dictionaries. In these
dictionaries, as already mentioned, each megalemma is associated
with a specific document and a specific subject area. This makes it
possible to classify texts by subject area and to analyze documents
statistically for the presence of implicit references. The
publication dates of the documents are used to determine which
document is the source of a megalemma and which document refers
to it.</p>
      <p>
        To control the quality of the automatic detection of
implicit links, methods of collective intelligence and
crowdsourcing were used [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It was proposed to check the quality of implicit-link
detection using an expert approach.
      </p>
      <p>The probability of a positive decision is determined by
the mathematical model:</p>
      <p>= / = = = ( − )</p>
      <p>This formula gives the probability K0 of a positive decision
by a group of M experts, where GR is the probability of a correct
solution for one expert. The analysis of expert estimates showed a
rather high level of detection of implicit links and of determination
of the semantic similarity of phrases and documents.</p>
      <p>
        Software for storing a multilingual collection of documents
was developed. A software implementation of thematic modeling
methods using dictionaries of megalemmas in subject areas has been
developed [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>As a result of processing collections of documents,
dictionaries of terms and dictionaries of megalemmas are
built. Statistics are collected on the use of terms and
megalemmas in each article.</p>
      <p>BabelNet is an integration resource based on the
following resources: WordNet, Wikipedia, OmegaWiki, Wiktionary, Wikidata,
Wikiquote, VerbNet, Microsoft Terminology, GeoNames, ImageNet, FrameNet,
WN Map, Open Multilingual WordNet, WoNeF, Albanet, Arabic WordNet (AWN v2),
BulTreeBank WordNet (BTB-WN), Chinese Open WordNet, Chinese WordNet
(Taiwan), DanNet, Greek WordNet, Princeton WordNet, Persian WordNet,
FinnWordNet, WOLF (WordNet Libre du Français), Hebrew WordNet, Croatian
WordNet, IceWordNet, MultiWordNet, ItalWordNet, Japanese WordNet,
Multilingual Central Repository, WordNet Bahasa, Open Dutch WordNet,
Norwegian WordNet, plWordNet, OpenWN-PT, Romanian WordNet, Lithua.</p>
      <p>
        BabelNet is fully integrated with Babelfy, a multilingual
disambiguation and entity linking system. BabelNet is also integrated
with the Wikipedia Bitaxonomy [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which is built around two hierarchies: a page hierarchy
and a category hierarchy [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Integration with BabelNet will be carried out by
analogy with the approach that BabelNet uses to integrate
with the other resources described above: automatic mapping, with
lexical gaps in languages with limited resources filled using
statistical machine translation. The result is an “encyclopedic
dictionary” that provides concepts and named entities lexicalized in
many languages and connected by a large number of semantic relations [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>Additional vocabulary and definitions are added by linking to
free resources such as WordNet, OmegaWiki, English Wiktionary, Wikidata,
FrameNet, VerbNet and others. Like WordNet, BabelNet groups words in
different languages into sets of synonyms called Babel synsets. For
each Babel synset, BabelNet provides short definitions (called glosses)
in many languages, taken from both WordNet and Wikipedia.</p>
      <sec id="sec-3-26">
        <p>
          In the future, it is planned to use the Babelscape
product [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], which allows us to analyze documents,
perform semantic markup of texts, build semantic
knowledge graphs in several languages, etc., but this issue
requires additional careful study [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>The dictionaries of terms and megalemmas proposed
within the project make it possible not only to
classify texts but also to identify implicit links between
articles.</p>
        <p>The structure of the glossary is represented by a tuple:</p>
        <p>Dterm = &lt; IDterm, Term&gt;, (1)
where Dterm is a glossary of terms, IDterm is a term
identifier in a dictionary, Term is a term.</p>
        <p>The structure of the megalemma dictionary is
represented by a tuple:</p>
        <p>Dmeg = &lt; IDmeg, MegL&gt;, (2)
where Dmeg is the megalemma dictionary, IDmeg is the
megalemma identifier in the dictionary, MegL is the
megalemma.</p>
        <p>The structure of the document dictionary is represented
by a tuple:</p>
        <p>Ddoc = &lt;IDdoc, NAMEdoc, SRCdoc, YEARdoc, NUMwrd&gt;, (3)
where Ddoc is the document dictionary, IDdoc is the
document identifier in the dictionary, NAMEdoc is the
document name, SRCdoc is the publication source,
YEARdoc is the publication year, NUMwrd is the total
number of terms in the document.</p>
        <p>The structure of the domain dictionary is represented
by a tuple:</p>
        <p>Dsa = &lt; IDsa, SA&gt;, (4)
where Dsa is the domain dictionary, IDsa is the domain
identifier in the dictionary, SA is the domain name.</p>
        <p>Whereas the Dterm dictionary is a general glossary of
terms, the dictionary of terms of a document contains the terms of
the document and the frequency of occurrence of each term in
that document. The same applies to the dictionary of
megalemmas of a document. These two dictionaries are associative
tables in the database; an associative table
implements a many-to-many relationship between entities.</p>
        <p>The structure of the dictionary of terms of the
document is represented by a tuple:</p>
        <p>Dtd = &lt; IDterm, IDdoc, Fterm&gt;, (5)
where Dtd is the dictionary of terms of the document and
Fterm is the relative frequency of the term
in the document. It is calculated as follows: first, all
insignificant words (stop words, rare words, etc.) are removed from
the document so that only terms remain; then the
number of occurrences of the term is divided by the total
number of terms in the document.</p>
        <p>The structure of the dictionary of megalemmas of the
document is represented by a tuple:</p>
        <p>Dmd = &lt; IDmeg, IDdoc, Fmeg&gt;, (6)
where Dmd is the dictionary of megalemmas of the
document and Fmeg is the relative frequency of the megalemma
in the document, calculated as the number of occurrences of the
megalemma divided by the total number of
megalemmas in the document.</p>
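        <p>The relative frequencies Fterm and Fmeg defined above can be computed with the same routine. The following is a minimal sketch under stated assumptions: the document is already tokenized and normalized, only stop words are filtered (rare-word filtering is omitted), and the term string stands in for the IDterm identifier; the function names and the sample data are hypothetical.</p>
        <preformat>
```python
from collections import Counter


def relative_frequencies(items):
    """Map each item to its count divided by the total number of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: count / total for item, count in counts.items()}


def document_terms(tokens, stop_words):
    """Drop insignificant words so that only candidate terms remain."""
    return [t for t in tokens if t not in stop_words]


# Hypothetical toy document, already tokenized and normalized.
tokens = ["the", "vehicle", "uses", "sonar", "and", "the", "vehicle", "docks"]
stop_words = {"the", "and", "uses"}

terms = document_terms(tokens, stop_words)
f_term = relative_frequencies(terms)

# Rows of Dtd for a document with the hypothetical identifier 1;
# the term itself is used here in place of IDterm.
d_td = [(term, 1, freq) for term, freq in sorted(f_term.items())]
```
</preformat>
        <p>Passing a list of megalemma occurrences instead of terms to relative_frequencies yields Fmeg in the same way.</p>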
        <p>The structure of the keyword dictionary is represented
by a tuple:</p>
        <p>Dkeywrd = &lt; IDterm, IDsa&gt;, (7)</p>
        <p>Keywords are taken from the general glossary of
terms and mapped to subject areas. This dictionary is also an
associative table.</p>
        <p>The structure of the dictionary of document correlation
with a subject area is presented below.</p>
        <p>Ddsa = &lt;IDdoc, IDsa&gt;, (8)
where Ddsa is a dictionary of subject areas of a document.
One document can belong to several subject areas.</p>
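        <p>Tuples (1)-(8) map naturally onto relational tables. The sketch below shows one possible layout using SQLite; the choice of SQLite, the column types, the key constraints and the sample rows are assumptions made for illustration, and only the table and column names are taken from the tuples.</p>
        <preformat>
```python
import sqlite3

SCHEMA = """
CREATE TABLE Dterm (IDterm INTEGER PRIMARY KEY, Term TEXT NOT NULL UNIQUE);
CREATE TABLE Dmeg  (IDmeg  INTEGER PRIMARY KEY, MegL TEXT NOT NULL UNIQUE);
CREATE TABLE Ddoc  (IDdoc  INTEGER PRIMARY KEY, NAMEdoc TEXT, SRCdoc TEXT,
                    YEARdoc INTEGER, NUMwrd INTEGER);
CREATE TABLE Dsa   (IDsa   INTEGER PRIMARY KEY, SA TEXT NOT NULL UNIQUE);
-- Associative (many-to-many) tables
CREATE TABLE Dtd (IDterm INTEGER REFERENCES Dterm, IDdoc INTEGER REFERENCES Ddoc,
                  Fterm REAL, PRIMARY KEY (IDterm, IDdoc));
CREATE TABLE Dmd (IDmeg INTEGER REFERENCES Dmeg, IDdoc INTEGER REFERENCES Ddoc,
                  Fmeg REAL, PRIMARY KEY (IDmeg, IDdoc));
CREATE TABLE Dkeywrd (IDterm INTEGER REFERENCES Dterm, IDsa INTEGER REFERENCES Dsa,
                      PRIMARY KEY (IDterm, IDsa));
CREATE TABLE Ddsa (IDdoc INTEGER REFERENCES Ddoc, IDsa INTEGER REFERENCES Dsa,
                   PRIMARY KEY (IDdoc, IDsa));
"""


def create_database(path=":memory:"):
    """Create the dictionary tables in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def subject_areas_of(conn, doc_id):
    """Return the names of all subject areas a document belongs to."""
    rows = conn.execute(
        "SELECT SA FROM Dsa JOIN Ddsa ON Dsa.IDsa = Ddsa.IDsa "
        "WHERE Ddsa.IDdoc = ? ORDER BY Dsa.IDsa", (doc_id,))
    return [row[0] for row in rows]


# Hypothetical sample rows.
conn = create_database()
conn.execute("INSERT INTO Dsa VALUES (1, 'autonomous uninhabited underwater vehicles')")
conn.execute("INSERT INTO Dsa VALUES (2, 'computer graphics and visualization')")
conn.execute("INSERT INTO Ddoc VALUES (1, 'Sample document', 'Sample source', 2019, 1200)")
conn.executemany("INSERT INTO Ddsa VALUES (?, ?)", [(1, 1), (1, 2)])
areas = subject_areas_of(conn, 1)
```
</preformat>
        <p>The composite primary keys on the associative tables prevent the same pair from being recorded twice, while still allowing one document to belong to several subject areas.</p>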
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>
        A program was developed to implement methods for
modeling topics and to identify implicit links between
documents [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The megalemma dictionary is used to
determine implicit references. The task is to determine the
source document of a megalemma and the links to it. A storage
structure and methods for constructing a multilingual collection of
synsets (synonymous series) have been developed.
      </p>
      <p>
        A neural network algorithm was developed that uses tags/tokens
(flagging) and the Word2vec method, as modified by the team of
authors and already described, to identify Russian-language terms
in texts that are similar in contextual lexical
meaning [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
      </p>
      <p>The methodology for forecasting the development of new
directions uses the ratio of the relative frequencies of occurrence
of the same megalemma calculated for adjacent years. This
approach avoids the problem of retraining neural
networks as information accumulates.</p>
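      <p>The ratio of relative frequencies over adjacent years can be sketched as follows. The function name and the sample frequencies are hypothetical, and reading a ratio above 1 as growing use of a megalemma is an assumption of this sketch rather than a rule stated in the methodology.</p>
      <preformat>
```python
def yearly_growth_ratios(freq_by_year):
    """Ratio of a megalemma's relative frequency in each year to the previous year.

    Only pairs of adjacent calendar years are compared; a pair is skipped
    when the earlier frequency is zero, since the ratio is then undefined.
    """
    years = sorted(freq_by_year)
    ratios = {}
    for prev, curr in zip(years, years[1:]):
        if curr - prev != 1:
            continue  # a gap in the data: the years are not adjacent
        if freq_by_year[prev] == 0:
            continue
        ratios[curr] = freq_by_year[curr] / freq_by_year[prev]
    return ratios


# Hypothetical relative frequencies of one megalemma per publication year.
freq = {2016: 0.002, 2017: 0.003, 2018: 0.006, 2020: 0.004}
ratios = yearly_growth_ratios(freq)
```
</preformat>
      <p>Since each ratio depends only on frequency dictionaries for two years, adding documents for a new year requires recomputing the frequencies for that year alone.</p>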
      <p>
        An analysis of clustering and thematic
modeling methods for assessing the quality and significance of texts
was carried out [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Various thematic modeling methods were
considered, including the vector space model, latent semantic
analysis, latent Dirichlet allocation, and others. These methods
are based on a probabilistic approach, in which a term or document
is associated with several topics, each with
a certain probability. The disadvantage of this
approach is that the list of topics is formed automatically.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This research is expected to yield a number of results of high scientific and applied significance:</p>
      <p>1. An up-to-date multilingual collection of scientific texts containing more than 60 thousand scientific documents and more than 6 thousand internal bibliographic references. This collection will make it possible to accurately calculate the significance of documents using the scientific citation index (SCI), based on the number of bibliographic references, as well as the context scientific citation index (CSCI), based on the number of implicit references identified through the semantic similarity of texts.</p>
      <p>2. A technique for the automatic formation of a multilingual associative-hierarchical portrait of a subject area (MAHPSA) containing a hierarchy of multilingual synonymous series (synsets). With the help of MAHPSA, a wide range of problems can be solved, including calculating the semantic similarity of texts, identifying multilingual plagiarism, and expanding queries in multilingual search.</p>
      <p>3. A methodology and algorithms for calculating integrated multilingual statistics based on MAHPSA, including the identification of significant documents, trends and promising areas. Applying the technique to a multilingual collection will reveal new concepts, trace the dynamics of their development over time, and identify promising areas for the development of the subject area. On this basis, it will be possible to build forecasts of promising areas of research.</p>
      <p>4. A methodology for integrating MAHPSA with other ontologies and linguistic resources, including BabelNet, which contains millions of multilingual synsets. As a result, the shortcomings of BabelNet related to its low coverage of Russian terms will be overcome. For the integrated resources, updated ratings of document significance will be calculated and updated forecasts of promising areas of research in the selected subject areas will be constructed.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>The reported study was funded by RFBR according to
the research projects № 18-07-00225, 18-07-00909,
18-07-01111, 19-07-00455 and 20-04-60185.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Galbraith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Thayer</surname>
          </string-name>
          ,
          <article-title>SECSH Public Key File Format, draft-ietf-secsh-publickeyfile-01</article-title>
          .txt,
          <year>March 2001</year>
          ,
          <article-title>work in progress material</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimenko</surname>
            <given-names>S.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            <given-names>K.I.</given-names>
          </string-name>
          <article-title>PullEnty system - information extraction from natural language texts and automated building of information systems</article-title>
          . In the collection:
          <article-title>Situational centers and information-analytical systems of class 4i for monitoring and security tasks (SCVRT2015-16)</article-title>
          .
          <source>Proceedings of the International Scientific Conference: in 2 volumes</source>
          .
          <year>2016</year>
          . P.
          <fpage>28</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozerenko</surname>
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          <article-title>The principles of constructing models of business processes in the subject area based on natural language text processing</article-title>
          .
          <source>Bulletin of the Russian New University. Series: Complex systems: models, analysis and control</source>
          .
          <year>2014</year>
          . No. 4. P.
          <fpage>82</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          <article-title>Methods and tools for domain modeling</article-title>
          .
          <source>In the collection: The Civilization of Knowledge: Problems and Prospects of Social Communications Proceedings of the XIII International Scientific Conference</source>
          .
          <year>2012</year>
          . P.
          <fpage>71</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Zolotareva</surname>
            <given-names>V.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yashkova</surname>
            <given-names>N.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          <article-title>Project management</article-title>
          . Educational-methodical manual / Nizhny Novgorod,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          <article-title>Formalization of knowledge about the subject area based on the analysis of natural language structures</article-title>
          .
          <source>In the collection: The civilization of knowledge: the problem of man in science of the XXI century. Proceedings of the XII International Scientific Conference</source>
          .
          <year>2011</year>
          . P.
          <fpage>78</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          <article-title>Methods of extracting knowledge from natural language texts and building business process models based on the allocation of processes, objects, their relationships and characteristics</article-title>
          .
          <source>In the collection: Proceedings of the International Scientific Conference CPT2014. Institute of Computing for Physics and Technology</source>
          .
          <year>2015</year>
          . P.
          <fpage>92</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Somin</surname>
            <given-names>N.V.</given-names>
          </string-name>
          <article-title>Extracting and processing knowledge from unstructured texts of the business sphere and social networks</article-title>
          .
          <source>In the collection: Social computing: fundamentals, development technologies, social and humanitarian effects. Materials of the Fourth International Scientific and Practical Conference</source>
          .
          <year>2015</year>
          . P.
          <fpage>364</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozerenko</surname>
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          <article-title>Analytical intelligence based on the analysis of unstructured information from various sources, including the Internet and the media</article-title>
          .
          <source>Bulletin of the Russian New University. Series: Complex systems: models, analysis and control</source>
          .
          <year>2015</year>
          . No. 1. P.
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          <article-title>New approaches in constructing the functional structure of the subject area</article-title>
          .
          <source>In the collection: Twenty Years of Post-Soviet Russia: crisis phenomena and modernization mechanisms. Materials of the XIV All-Russian Scientific and Practical Conference of the Humanitarian University: in 2 volumes</source>
          . Humanitarian University. Ekaterinburg,
          <year>2011</year>
          . P.
          <fpage>639</fpage>
          -
          <lpage>643</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimenko</surname>
            <given-names>S.V.</given-names>
          </string-name>
          <article-title>A semantic approach to the analysis of terrorist activity on the Internet based on thematic modeling methods</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Zolotarev</surname>
            <given-names>O.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharnin</surname>
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimenko</surname>
            <given-names>S.V.</given-names>
          </string-name>
          <source>Bulletin of the Russian New University. Series: Complex systems: models, analysis and control</source>
          .
          <year>2016</year>
          . No. 3. P.
          <fpage>64</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Kozerenko</surname>
            <given-names>E. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            <given-names>K. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanov</surname>
            <given-names>D. A.</given-names>
          </string-name>
          <article-title>Semantic processing of unstructured textual data based on the linguistic processor PullEnti</article-title>
          .
          <source>Informatics and applications</source>
          .
          <year>2018</year>
          . Vol.
          <volume>12</volume>
          , issue
          <issue>3</issue>
          . DOI:
          <pub-id pub-id-type="doi">10.14357/19922264180313</pub-id>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>98</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Named entity recognition with bidirectional LSTM-CNNs</article-title>
          .
          <source>arXiv preprint arXiv:1511.08308</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Peters</surname>
            <given-names>M. E.</given-names>
          </string-name>
          et al.
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>arXiv preprint arXiv:1802.05365</source>
          .
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>Simone Paolo</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>193</volume>
          :
          <fpage>217</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
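          <string-name>
            <given-names>John</given-names>
            <surname>Hebeler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Blace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Perez-Lopez</surname>
          </string-name>
          .
          <source>Semantic Web Programming</source>
          . -
          <publisher-name>John Wiley &amp; Sons</publisher-name>
          ,
          <year>2009</year>
          . - 648 p.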
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>V.I.</given-names>
            <surname>Protasov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.E.</given-names>
            <surname>Potapova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.O.</given-names>
            <surname>Mirakhmedov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.M.</given-names>
            <surname>Sharnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.B.</given-names>
            <surname>Minasyan</surname>
          </string-name>
          <article-title>Methods for finding solutions by a group actor with a low probability of error</article-title>
          .
          <source>In the collection of CPT2019. Materials of the international scientific conference of the Nizhny Novgorod State University of Architecture and Civil Engineering and the Scientific and Research Center for Information in Physics and Technique</source>
          .
          <year>2019</year>
          , Nizhny Novgorod. P.
          <fpage>284</fpage>
          -
          <lpage>291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Brickley</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guha</surname>
            <given-names>R.V.</given-names>
          </string-name>
          <article-title>RDF vocabulary description language 1.0: RDF schema W3C working draft</article-title>
          .
          <year>2002</year>
          . http://www.w3.org/TR/2002/WD-rdf-schema20020430/.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Ehrmann</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cecconi</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vannella</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>McCrae J.P.</given-names>
            ,
            <surname>Cimiano</surname>
          </string-name>
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Navigli</surname>
          </string-name>
          <string-name>
            <surname>R</surname>
          </string-name>
          .
          <article-title>Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0</article-title>
          .
          <string-name>
            <surname>- LREC</surname>
          </string-name>
          (
          <year>2014</year>
          ). -
          <fpage>2014</fpage>
          . - URL: http://wwwusers.di.uniroma1.it/~navigli/pubs/ LREC_2014_Ehrmannetal.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Flati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vannella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pasini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project</article-title>
          .
          <source>Proc. of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)</source>
          , Baltimore, USA, June 22-27,
          <year>2014</year>
          , pp.
          <fpage>945</fpage>
          -
          <lpage>955</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Ustalov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A tool for effective extraction of synsets and semantic relations from BabelNet</article-title>
          .
          <source>In Proceedings - 2017 Siberian Symposium on Data Science and Engineering, SSDSE 2017</source>
          (pp.
          <fpage>10</fpage>
          -
          <lpage>13</lpage>
          ). [8071954]
          <publisher-name>Institute of Electrical and Electronics Engineers Inc</publisher-name>
          . https://doi.org/10.1109/SSDSE.2017.8071954
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <article-title>BabelNetXplorer: a platform for multilingual lexical knowledge base access and exploration</article-title>
          ,
          <source>in: Companion Volume to the Proceedings of the 21st World Wide Web Conference</source>
          , Lyon, France, 16-20 April
          <year>2012</year>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Lau</surname>
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Newman</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karimi</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>Best Topic Word Selection for Topic Labelling</article-title>
          .
          <source>COLING'10 Proceedings of the 23rd International Conference on Computational Linguistics</source>
          . Stroudsburg, PA: Association for Computational Linguistics,
          <year>2010</year>
          . Pp.
          <fpage>605</fpage>
          -
          <lpage>613</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <source>Google Cloud Machine Learning</source>
          [CD] - https://cloud.google.com/mlengine/docs/tutorials/python-guide.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Xie</surname>
            <given-names>Pengtao</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            <given-names>Eric P.</given-names>
          </string-name>
          <article-title>Integrating document clustering and topic modeling</article-title>
          .
          <source>arXiv preprint, arXiv:1309.6874</source>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
    <bio>
      <p>Zolotarev Oleg V., Ph.D., Docent, ANO HE «Russian New University» (Moscow, Russia), E-mail: ol-zolot@yandex.ru</p>
    </bio>
  </back>
</article>