<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>COLINS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semantic Analysis of Natural Language Texts for Various Functional Style Types</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Natalia Sharonova</string-name>
          <email>nvsharonova@ukr.net</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Kyrychenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Gruzdo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Glib Tereshchenko</string-name>
          <email>hlib.tereshchenko@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radioelectronics</institution>
          ,
          <addr-line>Nauky Ave. 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Technical University "KhPI"</institution>
          ,
          <addr-line>Kyrpychova str. 2, Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>6</volume>
      <fpage>12</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>The paper reviews and analyzes existing solutions for determining the meaning of text documents. The issues and problems involved in determining the meaning of text documents are discussed, and the most widely used models and algorithms of semantic analysis are considered. The analysis showed that applying classical models of semantic analysis in practice leads to a partial loss of the meaning of the text; this is not always justified, since such models support some procedures but miss a number of features. A generalized theoretical algorithm for the semantic analysis of natural language texts was developed from the existing ones. A process for assessing the quality of decisions made when finding the degree of semantic proximity of textual works was also designed.</p>
      </abstract>
      <kwd-group>
        <kwd>Text fragment</kwd>
        <kwd>text information</kwd>
        <kwd>algorithm</kwd>
        <kwd>text analysis</kwd>
        <kwd>text meaning determination</kwd>
        <kwd>method</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>At the present stage of development of information technologies, both throughout the world and in
Ukraine, tasks related to machine search for information [1] are of relevance and special attention is
paid to determining the meaning of text documents and finding dependencies and borrowings.</p>
      <p>To date, to determine the meaning of text documents in natural language, there are two large areas
that try to solve problems related to the semantic perception of the text, namely the construction of
onto-controlled information structures [2] and the definition of the meaning of the text using semantic
analysis [3]. Among them, the most successful and effective is the direction associated with the
semantic analysis of the text. Considering this observation, this article presents a study of the direction
of determining the meaning of text documents using semantic analysis.</p>
      <p>In this paper, semantic analysis of a text is understood as a component-wise assessment of the
words or phrases that determine the main meaning of the text (its semantic core), together with the
statistical indicators inherent in texts of various kinds. Considering the functions and style of language
and speech, there are six main functional and stylistic types of texts [4]: colloquial, official-business,
socially informative, scientific, artistic texts and religious works.</p>
      <p>Attention should also be paid to the concept of "language style", since it is what allows additional
characteristics of the text to be highlighted, from which one can judge not only authorship but also
the specifics of how various types of texts are constructed. According to V. V. Vinogradov, "the
style of language is a combination of two factors – 'what is said' and 'how it is said', that is, it is a
purposeful set of linguistic means [4]. At the heart of the concept of language style is the assessment of
the relationship of the means of expression to the content expressed".</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>When solving the problem of determining the meaning of text documents, it should be remembered
that each text is characterized by certain functional and style characteristics, which determine a certain
choice of means of analysis at a particular time, and interpretations of the understanding of meaning,
which ensure its equivalence. Therefore, one of the subtasks of semantic analysis is the allocation of
functional and style characteristics at the stage of pre-processing of the text. To date, there is no single
classification of functional styles; many scientists have worked and continue to work on this problem,
since it is a very complex and ambiguous issue, both from the point of view of ethics and of legal norms.</p>
      <p>The semantics of texts differs across functional and style types and across different parts of a
document; at the same time, depending on the type of the text itself, it helps to understand the essence
of the text and to determine its meaning within a particular direction of interpretation. Therefore, in
order to determine the meaning of text documents more accurately, classical approaches and search
engines minimize the amount of "water" (information that is not essential but is only used to give the
text a certain width or volume), although some characteristics affecting the text are lost in the process.
It should also be noted that, when performing semantic analysis, stop words and repetitions are also
usually removed from the text, since, according to many authors, they complicate the process of
determining the semantics of the text. All this affects the process of semantic analysis and,
consequently, the result.</p>
      <p>The classical semantic model consists of the word itself, its definition, and examples of combining
it with other words to compose phrases and sentences. In its classical form, however, it is very difficult
to use, because it requires significant computing resources on large volumes of texts [1]. In turn, this
imposes a number of restrictions on its use in various fields.</p>
      <p>Despite the large number of papers on this subject, the most common are works describing the "bag
of words" (or n-gram bag) method [5]. Another commonly used representation method is latent
Dirichlet allocation (LDA) [3]. Most search giants use in their developments an approach called latent
semantic analysis (LSA), or latent semantic indexing (LSI).</p>
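      <p>To make the bag-of-words idea concrete, the following stdlib-only Python sketch counts tokens and compares two texts with the cosine measure. The function names and the toy whitespace tokenization are illustrative assumptions, not part of any of the cited systems.</p>

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lowercase the text and count token occurrences (a 'bag of words')."""
    return Counter(text.lower().split())

def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[w] * bag_b[w] for w in shared)
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

a = bag_of_words("semantic analysis of text documents")
b = bag_of_words("semantic analysis of natural language texts")
print(round(cosine_similarity(a, b), 3))
```

      <p>Under this representation word order is discarded entirely, which is exactly the limitation the distributed-representation methods discussed below try to address.</p>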
      <p>A relatively new paradigm in the field of semantic analysis is the use of distributed representations
for words [6] and for documents [7]. Even though such intermediate representations are less
interpretable than those of earlier methods of determining the semantics of texts, according to some
authors they work quite well in practice and make it possible to consider a number of criteria, including
the presence of "water" and a large number of stop words in the text.</p>
      <p>In [5], Mikolov and Chen demonstrate that the paragraph vector method they developed allows
semantic analysis to be performed on "large" documents of different meanings, using representations
in the form of vectors that can then be used to classify movie reviews or to extract information from
web pages. But even the authors themselves note that this technique does not cover different types of
works and in practice analyzes only small volumes of text (annotations, reviews, messages, etc.).</p>
      <p>Despite the success of individual studies, very little is known about how well a particular method
works in comparison with each other and how sensitive they are to changes in various parameters and
characteristics of texts. It is also difficult to understand how the search for texts similar in meaning
occurs, what algorithms are available for searching for texts of the same or similar subjects and whether
they take into account the functional and style types of text arrays of information. These problems
persist because the authors do not disclose their algorithms and consider their works to be proprietary
know-how; as a result, it can be said that for 80 years there has been no common solution to the
problem of determining the meaning of text documents using semantic analysis.</p>
      <p>The above circumstances determine the relevance of the task of studying existing algorithmic
approaches in the semantic analysis of texts. In turn, the solution of this problem will allow not only to
present the process of semantic analysis in the form of steps, but also to develop an approach that will
subsequently solve this problem based not only on the language itself, but also on a specific subtask for
determining the meaning of text documents.</p>
      <p>The article is aimed at analyzing the current state of the problem of determining the meaning of text
documents using semantic analysis, reviewing the most used methods and algorithms based on them,
as well as giving a meaningful formulation of the problem of determining the meaning of documents
using semantic analysis for different functional and style types.</p>
      <p>Statement of the task of the research conducted in the article. There are many algorithms suitable
for determining the meaning of text documents using semantic analysis for different functional and
style types of texts. In addition, the nomenclature of functional and style types of texts is known.</p>
      <p>It is necessary, having formulated several common characteristic criteria over the set of algorithms
suitable for determining the meaning of text documents, to formulate, based on the known algorithms,
the task of semantic analysis, taking into account the subtask of determining the functional and style
types of texts.</p>
      <p>The solution of this task involves the implementation of the following sequence of steps:
• clarification of the steps of the text processing process during semantic analysis and, based on
the refined algorithm, the formulation of the generalized algorithm.
• meaningful formulation of the task of determining the meaning of text documents using
semantic analysis for different functional and style types of texts.
• definition of criteria for assessing the quality of decisions made to determine the fact of
semantic proximity of text documents, considering paraphrasing.</p>
      <p>As a result of solving the problem, the current state of the problem of semantic analysis of natural
language texts will be reviewed, the most used algorithms for semantic analysis will be considered, and
a generalized algorithm for semantic analysis of natural language texts will be developed. The process
of determining the quality of decisions made to find the degree of semantic proximity of textual works
has also been designed.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The state of the problem of semantic analysis of natural language texts</title>
      <p>The application of existing methods and algorithms of semantic analysis depends on the application
area of the problem being solved, to which functional and style type it belongs and the existing
directions in the processing of text information.</p>
      <p>There is a wide range of search engines for determining the similarity of the text, namely [8]: utilities
for statistical analysis of the text; linguistic technologies and systems; utilities of linguistic analysis of
the text (morphology, syntax); natural language processing systems.</p>
      <p>During the study in the work [1], the class of natural language processing systems was considered,
and the methods and algorithms that underlie them were analyzed. In this class of problems, there are
two groups of methods that allow solving the problem of determining the meaning of text
documents [1]:
• methods of semantic processing of texts, which are aimed at the so-called "linguistic
transformations", namely translation into a foreign language in two directions, both into the language
of interest and back to the original; a brief retelling of texts; notes; abstract submission; annotation.
• methods of semantic text processing based on artificial intelligence. This group focuses on
"extracting knowledge" from the texts being analyzed, namely the classification of text messages,
answering questions, contextual translation and understanding of textual information based on what
has been presented. In this class of problems, in most cases, conceptual analysis of texts is used.
Within the framework of this, special attention is paid to such problems as the synthesis of
knowledge representation systems and the development of systems for semantic analysis and
machine "understanding" of texts using ontologies.</p>
      <p>In most cases, the following models of semantic text processing are used to determine the meaning
of text documents using semantic analysis: "linguistic transformations" over texts [9–11], content
analysis [12–18], context analysis [13, 15–17, 19–20], paragraph vector model [21], as well as
associative semantic analysis [22].</p>
      <p>There are many publications on performing so-called "linguistic transformations" on texts, but the
most significant contribution to the problem was made by A. Palagin [8]. According to A. Palagin [8],
the typical process of text processing in semantic analysis consists of the following steps:
1. Removal of "stop words" – words that occur in every text and carry no semantic load; first of
all, conjunctions, particles, prepositions, and many other function words.
2. Stemming – the process of finding the stem for a given source word. The stem of the word
does not necessarily coincide with its morphological root.
3. Application of the algorithm that directly builds the semantic core.</p>
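      <p>The three steps above can be sketched in Python as follows. The stop-word list and the suffix stripper are toy stand-ins introduced here as assumptions for illustration; a real system would rely on language-specific resources rather than these.</p>

```python
from collections import Counter

# Illustrative stop-word list and suffix set; assumptions for this sketch only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is"}
SUFFIXES = ("ing", "ings", "ed", "es", "s")

def stem(word):
    """Very crude stemming: strip one known suffix, keep a base of 3+ chars."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def semantic_core(text, top_n=3):
    """Steps 1-3: drop stop words, stem, take the most frequent stems."""
    tokens = [w for w in text.lower().split() if w not in STOP_WORDS]
    stems = Counter(stem(w) for w in tokens)
    return [word for word, _ in stems.most_common(top_n)]

print(semantic_core("the meaning of texts and the meanings of a text"))
```

      <p>Note that, as step 2 warns, the returned stems ("mean", "text") are not morphological roots, only shared bases.</p>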
      <p>Another important concept is the basic semantic template (BSMS) [10, 11], which makes it possible
to analyze textual information. A BSMS is a rule according to which a semantic dependency is identified
in the analyzed text. In accordance with the works of A. V. Mochalov, a BSMS consists of four main parts [10]:
1. sequences of words or indivisible semantic units for which their morphological features are
indicated (in some cases, when it is especially important for semantic analysis, the names of these
words and semantic units are given).
2. the names of the semantic relation, which should be formed if the sequence described in the
previous paragraph is found in the text.
3. a sequence of numbers that determine the positions in the sequence from clause 1, the elements
of which should be added to the queue with priority, according to which words will subsequently be
removed from the analyzed sentence fed to the semantic analyzer.
4. the number denoting the priority value, the group of semantic dependencies to which the
semantic relation belongs.</p>
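      <p>A possible sketch of this four-part structure in Python is given below. The field names and the matching logic are assumptions made for illustration and do not reproduce Mochalov's original notation.</p>

```python
from dataclasses import dataclass

@dataclass
class BSMS:
    """A basic semantic template with the four parts described above.
    Field names are illustrative, not the original notation."""
    pattern: list            # 1. sequence of words / indivisible semantic units
    relation_name: str       # 2. semantic relation to form on a match
    removal_positions: list  # 3. positions queued for removal, by priority
    priority: int            # 4. priority value / dependency group number

    def match(self, tokens):
        """Return (start index, relation name) for the first pattern hit."""
        n = len(self.pattern)
        for i in range(len(tokens) - n + 1):
            if tokens[i : i + n] == self.pattern:
                return i, self.relation_name
        return None

template = BSMS(["capital", "of"], "location-of", [1], priority=1)
print(template.match("kyiv is the capital of ukraine".split()))
```

      <p>A real analyzer would, as the text describes, also check morphological features of each unit rather than raw token equality.</p>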
      <p>According to this algorithm, sentences are received at the input, and at the output the program
provides a set of semantic relations formed from the analyzed text.</p>
      <p>After the formation of a set of indivisible semantic units, a morphological analysis of each such unit
is carried out. Then a search is made for complex language constructions such as introductory phrases,
participial and adverbial participial constructions, subordinate clauses, etc.</p>
      <p>Next, matches of each BSMS from one set against the other set are searched for sequentially, and the
morphological characteristics both of the BSMS and of all sentences are taken into account.</p>
      <p>If a match with some subsets of the set of sentences is detected, then for all such subsets and
sentences coinciding with the BSMS, semantic dependencies are formed, which are recorded in the
database if they are detected in the analyzed text for the first time. The search for BSMS in the sentence
continues until all the templates have been checked for a match.</p>
      <p>Next, the search for the BSMS among the remaining indivisible semantic units of the set in the text
is repeated. This continues until all the semantic dependencies described by the BSMS are found in the
sentence under analysis.</p>
      <p>Taking into account the above, it can be concluded that the algorithm proposed by
A. V. Mochalov for finding semantic dependencies [10, 11] with the help of BSMS is very complex
and, owing to its adaptation to the task of detecting dependencies in texts of different thematic
orientation, is resource-intensive in this form.</p>
      <p>Within the framework of the second group of determining the meaning of text documents, the
direction of content analysis should be noted. To date, there is no unambiguous definition of what
content analysis is, and there are many interpretations of it.</p>
      <p>Within the framework of this study, the following definition of content analysis was chosen: a
method for identifying and evaluating the specific characteristics of texts and other information carriers,
in which, in accordance with the objectives of the study, certain semantic units of content and forms of
information are allocated [15].</p>
      <p>Content analysis makes it possible to identify individual characteristics of information, as well as
their interrelations. In the process of extracting information, the frequency and volume of references to
these units in a certain set of texts or other information are systematically measured.</p>
      <p>There are several classifications of types of content analysis. The most famous was proposed by
R. Merton [15]:
1. Character Counting – A simple count of certain keywords.
2. Classification of symbols in relation – a balance of positive and negative statements about the
object of research. It is used to analyze the effective arrangement of symbols for propaganda, to
detect contrasting, contradictory judgments and to determine the intentions of the communicator.
3. Analysis by elements – the choice of the main and secondary parts of the text, the definition of
topics related to the main and peripheral interests of the audience.
4. Thematic analysis – identification of obvious and hidden topics.
5. Structural analysis – clarification of the nature of the ratio of different materials:
complementary, combined, colliding.
6. Analysis of the relationship of various materials – a combination of structural analysis with the
study of the sequence of publication of materials, the volume and time of their publication.</p>
      <p>Such a variety of types of content analysis is due to the ambiguity of the criteria, characteristics or
elements of the content to which the counting procedure is applied, since these can be individual words,
phrases, sentences, paragraphs, or whole texts [23].</p>
      <p>With the help of a frequency analysis of the lexical composition of the text, it is possible to determine
the theme of the work, to identify specific lexemes inherent in the text, to establish the style of the text,
to highlight the genres of the text, etc. Depending on the frequency of significant words in the text, one
can judge the semantic dominants of the text, and by the most frequent words it is possible to determine
the semantic accents of the text.</p>
      <p>The generalized algorithm of content analysis based on the scheme from [19] is as follows:
Step 1. Preparation of the research program
1. Formation of the goal, objectives, hypothesis of the study.
2. Selection of the time interval, selection of the source of analysis, selection of categories and
indicators of evaluation and compilation of the dictionary.
3. Systematization of information on the problematic feature of determining the sample size.
Step 2. Collection of information
1. Correlation of categories and subcategories of content analysis with specific content in the text.
2. Encoding information, recording the frequency and volume of mentions of categories and
subcategories of content analysis.</p>
      <p>Step 3. Analysis of the results obtained
1. Statistical processing of the obtained quantitative data.
2. Interpretation and visualization of the obtained data in accordance with the stated tasks.</p>
      <p>It should be noted that a simple frequency count is not an exhaustive or sufficient criterion in
determining the meaning of text documents. Also, another problem is the comparison of texts of
different lengths, since it is necessary to compare not simple frequencies, but conditional ones, i.e.,
shares that constitute a certain category in the first and second text.</p>
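      <p>A minimal sketch of such a conditional (share-based) comparison is given below; the category dictionary, texts and function name are illustrative assumptions.</p>

```python
from collections import Counter

def category_shares(tokens, categories):
    """Relative frequency (share) of each category's keywords in a text,
    so that texts of different lengths become comparable."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        name: sum(counts[w] for w in words) / total
        for name, words in categories.items()
    }

categories = {"economy": {"market", "price"}, "politics": {"election"}}
short_text = "market price market".split()                    # 3 tokens
long_text = ("market price market " + "filler " * 7).split()  # 10 tokens

print(category_shares(short_text, categories)["economy"])  # 3/3 = 1.0
print(category_shares(long_text, categories)["economy"])   # 3/10 = 0.3
```

      <p>Both texts contain the same three "economy" mentions, but only the shares reveal how differently the category dominates each text.</p>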
      <p>Algorithm of content analysis according to [24]:
1. Compilation of a table of content analysis (category + unit of analysis).
2. Calculation of the frequency of use of keywords in messages (encoding matrix).
3. Calculation of the share of keyword usage in messages (encoding matrix).
4. Context consideration.
• Opinion of arbitrators (coders).
• Consideration of the topic (a certain combination of words or concepts, embodied, for example,
in a phrase).
5. Is the mention given in a positive or negative sense?
• The Q-sort method.
6. Janis’s formula.</p>
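      <p>Step 6 can be illustrated with one commonly cited form of Janis's coefficient of imbalance; the exact formulation used in the algorithm should be checked against [24], so the formula below is stated as an assumption.</p>

```python
def janis_coefficient(favorable, unfavorable, relevant, total):
    """Janis coefficient of imbalance, in one commonly cited form
    (an assumption here): positive when favorable mentions dominate,
    negative when unfavorable ones do, 0.0 when balanced.
    f, n = favorable / unfavorable units; r = relevant units; t = all units."""
    f, n, r, t = favorable, unfavorable, relevant, total
    if r == 0 or t == 0 or f == n:
        return 0.0
    if f > n:
        return (f * f - f * n) / (r * t)
    return (f * n - n * n) / (r * t)

print(janis_coefficient(favorable=6, unfavorable=2, relevant=8, total=40))
```

      <p>With 6 favorable and 2 unfavorable mentions out of 8 relevant units in a 40-unit text, the coefficient is (36 − 12) / 320 = 0.075.</p>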
      <p>According to this algorithm, the same category can be expressed by different units of analysis: a word,
a topic, a paragraph, a judgment, a character, a social situation, or the message as a whole (a book, a
letter, a speech, a newspaper). It can therefore be concluded that a significant problem is the choice of
the assessment itself: whether it should be expressed in words, in sentences, in percentage content, etc.</p>
      <p>An integral part of content analysis is contextual analysis. The main essence of contextual analysis
is that not the entire text is analyzed, but only a certain sample of the object of analysis from it, which
is the context of the use of a certain characteristic [20]. It should be noted that there are many ways to
set the context. Contextual analysis makes it possible to evaluate not only individual categories of the
analyzed text but also their interrelations.</p>
      <p>It is possible to introduce logical relations of jointness, contradiction, subordination, etc. on a set of
vectors [25]. Thus, a certain logical model of the subject area referred to in the text is set, or a model of
the cognitive map inherent in the author of the text.</p>
      <p>Context analysis algorithm [26]:
1. Determine the totality of sources or messages to be studied using a set of specified criteria that
each message must meet.
2. Formation of a sample set of messages.
3. Identification of units of analysis (words or topics).
4. Allocation of units of account, which may coincide with semantic units or be of a specific
nature.
5. The counting procedure itself. It is generally similar to the standard methods of classification by
the selected groupings.
6. Interpretation of the obtained results in accordance with the goals and objectives of a particular
study.</p>
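      <p>Step 5 presupposes sampling the local context of each unit of analysis; this can be sketched as follows, where the window size and whitespace tokenization are illustrative assumptions.</p>

```python
def contexts(tokens, target, window=2):
    """Collect the local context of each occurrence of `target`:
    up to `window` tokens on each side, the essence of contextual sampling."""
    found = []
    for i, token in enumerate(tokens):
        if token == target:
            found.append(tokens[max(0, i - window) : i + window + 1])
    return found

text = "the analysis of text and the analysis of meaning".split()
print(contexts(text, "analysis", window=1))
```

      <p>The counting procedure would then classify these context windows by the selected groupings instead of counting the bare target word.</p>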
      <p>The main problem is that it is necessary to anticipate not only the mentions that may occur, but also
the elements of their contextual use, while a detailed system of rules for assessing each case of use
should be developed.</p>
      <p>Another model of semantic text processing is the paragraph vector model, described in [21], whose
authors called it "Distributed Memory". This model, when determining the meaning of
text documents, adds a memory vector to a standard language model aimed at determining the topic of
the document. In general, the algorithm of paragraph vectors for semantic text processing is as follows:
1. a dictionary is created
2. the corpus is read, and the occurrence of each word in the corpus is calculated
3. an array of words is sorted by frequency and rare words are removed
4. a Huffman tree is built – to encode the dictionary to reduce the computational and temporal
complexity of the algorithm
5. the so-called sub-sentence – the basic element of the corpus: a sentence, a paragraph, or an
article – is read from the corpus, after which the most frequent words are removed from the analysis
(subsampling)
6. the sub-sentence is traversed and the maximum distance between the current and the predicted
word in the sentence is calculated
7. a feed-forward neural network with a hierarchical softmax activation function and / or
negative sampling is used
8. the vector representation of the words is calculated. The vector representation is based on
contextual proximity: words encountered in similar contexts in the text will have close vector
coordinates.</p>
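      <p>Steps 1–3 of this pipeline can be sketched as follows; the later steps (Huffman coding, subsampling, and the neural network itself) are omitted, and the function name and thresholds are illustrative assumptions rather than any implementation's actual API.</p>

```python
from collections import Counter

def build_vocabulary(corpus, min_count=2):
    """Steps 1-3: count occurrences over the corpus, sort by frequency,
    and drop rare words occurring fewer than `min_count` times."""
    counts = Counter(w for sentence in corpus for w in sentence)
    kept = [(w, c) for w, c in counts.most_common() if c >= min_count]
    return dict(kept)

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
vocab = build_vocabulary(corpus, min_count=2)
print(sorted(vocab))
```

      <p>In a full implementation this frequency-sorted vocabulary is what the Huffman tree of step 4 is built over.</p>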
      <p>The paragraph vector is concatenated or averaged with local context word vectors to predict the
next word.</p>
      <p>It should be noted that in the process of analyzing a paragraph vector, you can further simplify if
you do not use the local context in the prediction task. With this approach, the parameters of the
classifier and word vectors are not used, and the backpropagation algorithm is used to adjust the
parameters of paragraph vectors.</p>
      <p>In practice, the paragraph vector model is effective for the task of finding semantic similarity for
large chunks of text. The article [21] demonstrates by example that the paragraph vector model is
superior to LDA in finding semantic similarity in Wikipedia articles at different vector sizes; notably,
vector operations can be performed on article vectors just as on word vectors.</p>
      <p>Associative-semantic analysis [22] operates on a set of lexemes denoting concepts united by
cause-and-effect relationships and at the same time having identical schemes in the semantic structure
that stand in a special kind of relationship. In accordance with the conclusion of the authors of [22],
the process can be divided into three stages:
• Transition from words and phrases of sentences to the corresponding semantic meanings –
concepts of ontology.
1. Search, among the set of possible alternative concepts, for the meaning of a word that is
semantically the closest to the meanings of the neighbouring words from the local environment of this word.
2. Finding the degree of proximity as the shortest path between concepts in the ontology
network; calculation of the distance in the ontology between the assumed meaning of the word and
the conceptual meanings of the words of its immediate environment.
3. Definition, in the semantic network of the ontological knowledge base, of the concept
corresponding to the correct meaning of the word or phrase in the text.
• Assembling semantic frames of text sentences.
1. The choice of the type of slot to fill with the meaning of the concept of a word depends on the
syntactic position of this word in the grammatical structure of the sentence.
2. Filling in the slots of the frame structure of the sentence, analyzing the sentence parse tree and
the syntactic positions of words and phrases for each concept using semantic-syntactic tables of
modal-role cases.
3. Construction of a semantic frame of the current input text sentence.
• Unification of semantic structures of text sentences into a single semantic network of text.
Combining two structures into one network is performed on the principle of combining semantically
identical vertices, that is, if there are vertices in the structures G1 and G2 that refer to one semantic
concept, these vertices are combined into one. At the output of the system, a semantic network of
input text is generated, which contains in the vertices the concepts of the text connected by arcs of
semantic relations.</p>
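      <p>The unification principle of the third stage, merging vertices that refer to one semantic concept, can be sketched on a toy adjacency-set representation of the networks G1 and G2; the graph encoding is an assumption for illustration.</p>

```python
def unify(g1, g2):
    """Merge two semantic networks given as {concept: set_of_neighbours}.
    Vertices naming the same concept are merged; their edge sets are united."""
    merged = {}
    for graph in (g1, g2):
        for concept, neighbours in graph.items():
            merged.setdefault(concept, set()).update(neighbours)
    return merged

g1 = {"text": {"meaning"}, "meaning": {"text"}}
g2 = {"text": {"style"}, "style": {"text"}}
print(sorted(unify(g1, g2)["text"]))
```

      <p>The shared vertex "text" is collapsed into one node that keeps the arcs from both structures, exactly the merging rule described above.</p>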
      <p>The use of the considered models in practice in most cases leads to a partial loss of the meaning of
the text, although it makes it possible to perform semantic processing of texts and to group documents
by formal features more quickly. Therefore, special attention must be paid to the methods of extracting
data from natural language texts, as they allow preliminary preparation of texts during semantic
analysis and help to understand why accuracy decreases in the process of semantic analysis of a
document.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Setting the task of determining the meaning of text documents using semantic analysis for different functional and style types of texts</title>
      <p>Based on the considered concepts and existing solutions, a meaningful formulation of the task of
determining the meaning of text documents using semantic analysis for different functional and style
types of texts can be formulated.</p>
      <p>The initial data of the task is a set of texts of different functional and style types and of different
volumes.</p>
      <p>While solving the problem, it is necessary to obtain data on the degree of semantic proximity of
textual works, considering possible facts of borrowing or paraphrasing.</p>
      <p>Because the results of the problem will be used to determine the meaning of text documents when
calculating the degree of semantic proximity of text work, taking into account possible facts of
borrowing or paraphrasing, it is advisable to introduce the following requirements:
• the search should be implemented for texts of different functional and style types.
• the search should be implemented for different volumes of texts.
• it should be possible to choose a search tool.
• the search should consider possible errors and typos in the analyzed text.
• for each word from the text, the specification of a set of synonyms should be provided.
• the set of all documents whose degree of semantic proximity is high must be found.
• an analysis of the results should be provided.
• there should be a mode for re-calculating the degree of semantic proximity with a change in the
order of the components of the attribute group and a change in the functional and style types of texts.</p>
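      <p>The synonym requirement in the list above can be illustrated with a minimal normalization step; the dictionary and function name are assumptions for illustration, not part of the formulated method.</p>

```python
def normalize(tokens, synonyms):
    """Map every token to a canonical form using a synonym dictionary,
    so that paraphrases using synonyms still compare as equal."""
    return [synonyms.get(t, t) for t in tokens]

synonyms = {"large": "big", "huge": "big"}  # illustrative dictionary
a = normalize("a large document".split(), synonyms)
b = normalize("a huge document".split(), synonyms)
print(a == b)
```

      <p>After normalization, any token-level proximity measure sees the two paraphrased fragments as identical, which is the behaviour the requirement asks for.</p>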
      <p>The specificity of this task involves the development of a special method that will make it possible
to consider all the above requirements as fully as possible. The solution of the problem is complicated
by the fact that today there is no single classification of functional and style types of texts, but such a
classification will allow us to identify additional characteristics of the text, thanks to which it is possible
to judge not only the authorship, but also the specifics of the construction of various types of texts that
also differ in volume. It is also necessary to keep repetitions, rather than removing them from the text,
since they affect the process of semantic analysis and, consequently, the result.</p>
      <p>These features make it possible to develop a specialized methodology for determining the meaning
of text documents using semantic analysis for different functional and style types of texts, which will
make it possible to find more accurately the semantic proximity of documents whose types differ at
first glance.</p>
      <p>A generalized algorithm for semantic analysis of natural language texts of various functional and
style types can be represented as:
1. selection of the source of analysis, selection of evaluation indicators;
2. systematization of information by the selected feature relative to the volume of the database of
analyzed segments;
3. definition of the functional and style type of the analyzed text (style, substyle, genre style, genre
substyle) and of the characteristics of functional and style types for the text being analyzed;
4. checking whether there are significant differences between groups for the analyzed text in terms
of the selected functions or variables;
5. assignment of the document to one of the specified groups of functional and style types;
6. determination of the direction of semantic analysis;
7. definition of stylistic techniques and other stylistically marked means of language and text;</p>
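The assignment of a document to one of the groups of functional and style types can be sketched as a simple marker-based scoring. The groups and marker lexicons below are purely illustrative assumptions, not a classification proposed in this work:

```python
# Illustrative marker lexicons for two functional-style groups (hypothetical).
STYLE_MARKERS = {
    "scientific": {"therefore", "hypothesis", "method", "results", "analysis"},
    "colloquial": {"hey", "gonna", "stuff", "really", "okay"},
}

def style_scores(text: str) -> dict:
    """Score each style group by the share of its marker words found in the text."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return {
        style: len(words & markers) / len(markers)
        for style, markers in STYLE_MARKERS.items()
    }

def assign_style(text: str) -> str:
    """Assign the document to the group with the highest marker score."""
    scores = style_scores(text)
    return max(scores, key=scores.get)

text = "The analysis of results confirms the hypothesis, therefore the method works."
print(assign_style(text))  # scientific
```

In a real system the marker lexicons would be replaced by the full set of characteristics of styles, substyles, and genres, and the score comparison by a proper statistical test between the groups.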
      <p>8. running of the algorithm that directly constitutes the semantic core (e.g., latent semantic
analysis, paragraph vectors, etc.):
8.1. search for the meaning of a word, among the variety of possible alternative concepts, for the one
semantically closest to the meanings of the neighboring words from the local environment of this word;
8.2. definition of dependencies between words;
8.3. finding the degree of proximity between elements (sentences, phrases in the text, words);
8.4. assembly of semantic frames of text sentences;
8.5. construction of a semantic frame of the current sentence of the input text;
8.6. unification of the semantic structures of text sentences into a single system of the analyzed text;</p>
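As a minimal sketch of one possible semantic core (latent semantic analysis, one of the options named above), the following code builds a term-by-sentence count matrix, projects it into a low-dimensional space with a truncated SVD, and compares sentences by cosine similarity there. The toy sentences are illustrative, and NumPy availability is assumed:

```python
import numpy as np

sentences = [
    "the cat sat on the mat",
    "a cat rested on a mat",
    "stock prices rose sharply today amid trading",
]

# Build a term-by-sentence count matrix.
vocab = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(sentences)))
for j, s in enumerate(sentences):
    for w in s.split():
        A[index[w], j] += 1

# Latent semantic analysis: a truncated SVD projects sentences into a
# low-dimensional "semantic" space.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs = (np.diag(S[:k]) @ Vt[:k]).T  # sentence vectors in k dimensions

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two cat/mat sentences end up far closer than the unrelated one.
print(cos(docs[0], docs[1]) > cos(docs[0], docs[2]))  # True
```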
      <p>9. analysis by elements (selection of the main and secondary parts of the text, determination of
topics related to the main and peripheral interests of the audience);
10. thematic analysis (identification of obvious and hidden topics);
11. analysis of the relationship of various materials (combination of structural analysis with the
study of the sequence of publication of materials, the volume and time of their publication):
11.1. correlation of categories and subcategories of content analysis with specific content in the text;
11.2. finding matches of each basic semantic pattern from one set to another set;
11.3. encoding of information, recording the frequency and volume of mentions of categories and
subcategories;
12. output of the values of the sets of semantic relations formed from the analyzed text;
13. finding all possible semantic dependencies in the other texts with which the original one is
compared.</p>
      <p>It should be noted that this is a theoretical algorithm derived from the totality of the existing
algorithms considered in the section "State of the problem of semantic analysis of natural language
texts" of this article. It therefore requires verification, for which a programmatic implementation is
necessary. Most likely, these steps will be refined and modified during testing, so this can be considered
the first version of the algorithm for semantic analysis of natural language texts. It is also necessary to
determine how the degree of semantic proximity of texts will be evaluated.</p>
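Steps 11.1-11.3 (coding the frequency and volume of category mentions) can be sketched as follows; the categories and indicator words are hypothetical:

```python
# Hypothetical content-analysis categories mapped to indicator words.
CATEGORIES = {
    "economy": {"price", "market", "trade"},
    "weather": {"rain", "storm", "sunny"},
}

def code_text(text: str) -> dict:
    """Record the frequency (mention count) and volume (share of all words)
    of each category in the text."""
    words = text.lower().split()
    total = len(words)
    coded = {}
    for category, indicators in CATEGORIES.items():
        hits = sum(1 for w in words if w in indicators)
        coded[category] = {"frequency": hits, "volume": hits / total}
    return coded

report = code_text("market price rose while rain hit the market")
print(report["economy"]["frequency"])  # 3
```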
    </sec>
    <sec id="sec-4">
      <title>4. Determining the quality of decisions made to find the degree of semantic proximity of texts</title>
      <p>Assessment of the degree of semantic proximity of texts. The task of determining the degree of
semantic proximity of texts is to experimentally study and assess the adequacy of the semantic analysis
model, as well as to decide on the proximity of texts depending on their functional and style types.</p>
      <p>Let the overall decision-making task S be defined formally [27] by the quartet
S = (Θ, X, D, L),
(1)
where Θ is the set of characteristics of the decision-maker (DM), i.e., the set of possible situations;
X = {x1, x2, …, xn} is the set of possible results of observations; D = {d} is the set of possible
solutions; L: Θ × D → R is a bounded actual loss function L(θ, d); and P is some statistical regularity
on Θ, that is, a non-empty family of finitely additive probability measures on 2^Θ, or a given class of
functional regularities closed in the corresponding topology.</p>
      <p>It is required to choose a solution d ∈ D that would minimize the losses under the unknown θ ∈ Θ.
Note that in our case of solving unstructured problems, it is possible to extract knowledge with the help
of experts, to learn from scenario examples, and to form functional patterns on Θ.</p>
      <p>This allows us to define the decision-making scheme (DMS) Z as an ordered triple
Z = (X, P, D),
(2)
in which the decision support system (DSS) forms the permissible variants of solutions from the class
D, and the statistical regularity P serves as the basis of the algorithm for deriving the desired degree of
semantic proximity of the texts.</p>
      <p>Formula (2) can therefore be clarified by specifying the class P(θ) of regularities characterizing the
situations {θ} ∈ Θ:
Z = (X, P(θ), D).
(3)
It is necessary to choose a sequence of solutions {d} ∈ D so that the average losses (the risk R(S)) are
minimal for the given class of patterns P(θ).</p>
      <p>So, in general, the adequacy and effectiveness of the proposed method for assessing the degree of
semantic proximity of texts will be evaluated using the minimax risk criterion R. The optimal solution
is a solution d** for which the condition
R(S, d**) = min_{d ∈ D} max_{θ ∈ Θ} R(θ, d)
is met.</p>
      <p>In practice, the risk R is estimated by the error that the found decision rule makes on control
situations {xj, j ∈ J} ⊆ X. The error value is characterized by the ratio of the number of decisions
incorrectly made by the rule to the total number of decisions presented. This assessment technique will
be used to conduct an experiment on determining the degree of semantic proximity of texts. The
experimental assessment of the adequacy and effectiveness of the developed algorithm for semantic
analysis of natural language texts of various functional and style types will be conducted on specially
developed software.</p>
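The estimation of risk as the share of incorrectly made decisions on control situations, combined with the minimax criterion, can be sketched as follows; the control data, threshold-based decision rules, and candidate thresholds are illustrative assumptions:

```python
# Control situations: (proximity score, correct decision) pairs, grouped by
# situation type (hypothetical data).
CONTROL = {
    "paraphrase": [(0.9, "similar"), (0.8, "similar"), (0.7, "similar")],
    "unrelated":  [(0.2, "distinct"), (0.3, "distinct"), (0.6, "distinct")],
}

def make_rule(threshold: float):
    """Decision rule: a proximity score above the threshold means 'similar'."""
    return lambda score: "similar" if score >= threshold else "distinct"

def risk(rule, cases) -> float:
    """Empirical risk: share of incorrectly made decisions on the control cases."""
    wrong = sum(1 for score, truth in cases if rule(score) != truth)
    return wrong / len(cases)

# Minimax criterion: pick the candidate rule whose worst-case risk over the
# situation groups is smallest.
candidates = [0.25, 0.5, 0.65]
best = min(candidates,
           key=lambda t: max(risk(make_rule(t), c) for c in CONTROL.values()))
print(best)  # 0.65
```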
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In the course of this work:
• an analysis of the current state of the problem of determining the meaning of text documents
was carried out;
• the importance of semantic analysis was determined;
• an overview of the most used algorithms of semantic analysis was carried out;
• the steps of the text processing process in semantic analysis were specified;
• a generalized algorithm for determining the meaning of text documents using semantic analysis
for different functional and style types of texts was developed;
• a meaningful formulation of the task of determining the meaning of text documents for different
functional and style types of texts was carried out;
• the criteria for assessing the quality of decisions made to determine the fact of semantic
proximity of texts, taking into account paraphrasing, were determined.</p>
      <p>As a conclusion to the work as a whole, it can be said that semantic analysis has high practical
applicability in determining the meaning of text documents.</p>
      <p>It should be noted that today there is no single classification of functional styles, but the solution of
this subtask will make it possible to operate with the characteristics of texts of various styles and thereby
increase the accuracy of determining the meaning of text documents. Therefore, it is necessary to pay
attention to its solution in the future.</p>
      <p>The results obtained will make it possible to continue work on the analysis of texts for the presence
of textual borrowings and borrowings of ideas in them, as well as on determining the authorship of a
text, taking into account its paraphrasing. At the moment, work is underway to create a system for
assessing the degree of semantic proximity of texts; the developed algorithm is used within the
framework of this system.</p>
    </sec>
    <sec id="sec-6">
      <title>6. References</title>
      <p>[1] G. Tereshchenko, I. Gruzdo. Overview and Analysis of Existing Decisions of Determining the
Meaning of Text Documents, 2018 International Scientific-Practical Conference Problems of
Infocommunications. Science and Technology PIC S&amp;T'2018, 2018, IEEE, Kharkiv, Ukraine,
pp. 645-653. doi: 10.1109/INFOCOMMST.2018.8632014.
[2] O. Ye. Stryzhak, V. V. Gorborukov, O. V. Franchuk and M. A. Popova. “Ontologiya zadachi
vyboru ta yiyi zastosuvannya pry analizi limnologichnyx system [Ontology of the problem of
choice and application in the analysis of limnological systems].” Ekologichna bezpeka ta
pryrodokorystuvannya: Zb. nauk. pracz M-vo osvity i nauky Ukrayiny, Kiyiv. nacz. un-t bud-va
i arxit., NAN Ukrayiny, In-t telekomunikacij i global. inform. prostoru; redkol.: O. S. Voloshkina,
O.M. Trofymchuk (golov. red.) [ta in.]. Kyev, Ukraine: no.15, pp.172-183, 2014.
[3] D. Blei, M. Jordan and A. Ng, Latent Dirichlet Allocation, Journal of Machine Learning Research,
vol. 3, no. 1, pp. 993-1022, 2003.
[4] S.M. Shcherbina, Scientific style in the system of stylistic differentiation of modern literary
languages, Scientific Journal of the National Pedagogical University named after M.P.
Drahomanov. Series 9: Current trends in language development, 2017, Vol. 16, pp. 261-268. URL:
http://enpuir.npu.edu.ua/handle/123456789/20001.
[5] T. Mikolov, K. Chen, G. Corrado and J. Dean, Efficient estimation of word representations in
vector space, Arxiv.org, 2013. URL: https://arxiv.org/abs/1301.3781.
[6] Q. V. Le and T. Mikolov, Distributed representations of sentences and documents, International
Conference on Machine Learning, Arxiv.org, 2014. URL: https://arxiv.org/abs/1405.4053.
[7] Z. Harris, Distributional structure, Word, vol. 10, no. 2-3, 1954, pp. 146-162.
[8] I. V. Gruzdo. Problemi analiza estestvenno-yazikovix tekstov dlya obnaruzheniya plagiata v
uchebnix rabotax. [Problems of the analysis of natural language texts for detecting plagiarism in
educational work]. Radioelektronni i komp'yuterni systemy, no.1 (49), 2011, pp.130-138.
[9] K. Palagin, S. Kryvy, V. Velichko and N. Petrenko, K analizu estestvenno-jazykovyh ob’ektov
[To the analysis of natural language objects], International Book Series. Supplement to the
International Journal Information Technologies &amp; Knowledge, ITHEA, Sofia, Bulgaria vol. 3, no.
9, 2009, pp. 36-43.
[10] A. V. Mochalov, Algoritm semanticheskogo analiza teksta, osnovannyj na bazovyh
semanticheskih shablonah s udaleniem [Algorithm for the semantic analysis of text, based on the
basic semantic templates with deletion]. Nauchno-tehnicheskij vestnik informacionnyh
tehnologij, mehaniki i optiki, no.5, 2014, pp.126-132.
[11] V. A. Kuznetsov, V. A. Mochalov and A. V. Mochalova, Ontological-semantic text analysis and
the question answering system using data from ontology, in ICACT Transactions on Advanced
Communications Technology (TACT), vol. 4, issue 4, July 2015, pp. 651-658.
[12] V. Shalak. Jelementy matematicheskih metodov komp'juternogo kontent-analiza tekstov
[Elements of mathematical methods of computer content-analysis of texts], Vaal.ru, URL:
http://www.vaal.ru/show.php?id=146.
[13] Pingali, K., Bilardi, G. (2015). A Graphical Model for Context-Free Grammar Parsing. In: Franke,
B. (eds) Compiler Construction. CC 2015. Lecture Notes in Computer Science , vol 9031. Springer,
Berlin, Heidelberg. doi: 10.1007/978-3-662-46663-6_1.
[14] N.V. Kostenko, V.F. Ivanov. Experience of content analysis: Models and practices, Monograph.
Free Press Center, Kyiv, 2003.
[15] Technologies of document analysis in sociological research. THEME 5. Formalized analysis of
documents (content analysis), 2018. URL:
https://bookonlime.ru/lecture/tema-5formalizovannyy-analiz-dokumentov-kontent-analiz.
[16] O.V. Ivanov, Quantitative content analysis: the problem of context, Bulletin of V.N. Karazin
Kharkiv National University. Series: Sociological research of modern society: methodology,
theory, methods, vol. 30, Kharkiv National University named after V.N. Karazina, Kharkiv, 2012,
no. 999, pp. 95-99.
[17] Q. Yang, A novel recommendation system based on semantics and context awareness, Computing,
2018, vol. 100, no. 8, pp. 809-823.
[18] Fei Long and Hongju Cheng. Improved Personalized Recommendation Algorithm Based on
Context-Aware in Mobile Computing Environment. Wirel. Commun. Mob. Comput. Vol.
2020:110. doi: 10.1155/2020/8857576.
[19] R. I. Loktev and S.M. Zuev Kontent-analiz kak metod issledovanija osobennostej
zhiznedejatel'nosti postojannogo naselenija pribrezhnyh naselennyh punktov JaNAO [Content
analysis as a method of studying the features of vital activity of a permanent population of coastal
settlements of the Yamal-Nenets Autonomous Okrug]. Arktika i Sever, no.26, pp.126-135, 2017.
[20] V. Maran, A. Machado, G. M. Machado, I. Augustin, and J. P. M. de Oliveira, “Domain content
querying using ontology-based context-awareness in information systems”, Data &amp; Knowledge
Engineering, vol. 115, 2018, pp. 152–173.
[21] A. Dai, C. Olah and Q. Le, "Document Embedding with Paragraph Vectors", Arxiv.org, 2018.
URL: https://arxiv.org/abs/1507.07998.
[22] G. Zav'jalov, D. Mal'cev, Ju. Solodovnikova, R. Jaremchuk, Kontent-analiz [Content Analysis].
URL: http://www.myshared.ru/slide/909850/.
[23] I. Shubin, S. Solonska, S. Snisar, V. Slavhorodskyi, V. Skovorodnikova, Efficiency Evaluation for
Radar Signal Processing on the Basis of Spectral-Semantic Model. Proceedings of the 15th
International Conference on Advanced Trends in Radioelectronics, Telecommunications and
Computer Engineering, TCSET 2020, 2020, pp. 171–174.
[24] A. A. Marchenko, A. A. Nikonenko, Kontekstnyj semanticheskij analiz teksta. Sistema tekstovogo
monitoringa i kachestvennogo ocenivanija fokusnogo obekta [Contextual semantic text analysis.
The system of text monitoring and qualitative assessment of the focal object]. Shtuchnij іntelekt.
2008. No 3. pp. 808-813. URL:
http://dspace.nbuv.gov.ua/bitstream/handle/123456789/7155/02Marchenko.pdf?sequence=1.
[25] K. Smelyakov, S. Smelyakov, A. Chupryna. Advances in Spatio-Temporal Segmentation of Visual
Data. Chapter 1. Adaptive Edge Detection Models and Algorithms. Series Studies in
Computational Intelligence (SCI), Vol. 876. Publisher Springer, Cham, 2020. pp. 1-51. doi:
10.1007/978-3-030-35480-0.
[26] Kontent-analiz [Content Analysis]. Wikipedia. URL:
https://ru.wikipedia.org/wiki/%D0%9A%D0%BE%D0%BD%D1%82%D0%B5%D0%BD%D1
%82-%D0%B0%D0%BD%D0%B0%D0%BB%D0%B8%D0%B7.
[27] B. M. Konorev, V. S. Harchenko, G. N. Chertkov, Instrumental'naja sistema dlja podderzhki
jekspertizy i nezavisimoj verifikacii kriticheskogo PO: principy postroenija i primenenija [Tool
system to support the examination and independent verification of critical software: principles of
construction and application]. Informacionnye tehnologii i bezopasnost'. Kyiv. NANU, 2003. №4.
pp. 85-91.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>